ZFS offers a new compression method in the latest version: lz4. It claims to be better than lzjb. Since lzjb is pretty good already, I am curious to find out how good will lz4 be comparing to lzjb. According to my tests, lz4 is performing better than lzjb in terms of spacing saving and I/O, but not too much.
lz4 VS lzjb: Space Saving
Long story short, here is what I did. I set up two servers with brand new ZFS settings. One server is set to lzjb, and the other server is set to lz4. Both servers have exact the same hardware, software, operating system etc. Both of them are loaded with the same set of data (10TB) via rsync and rsyncd. These data are real data that I use everyday, which includes office documents (PDF, Word, Excel), photos(Raw and JPEG), music(mp3), video(mpeg), zip, source codes, binary applications, database (MySQL, Redis), webserver files (PHP, HTML, CSS), etc. Notice that some of these files are already compressed (such as jpeg, zip etc), and the file types are not evenly distributed (e.g., 40% of the files are jpeg, 25% are docs, 10% are zip, 5% are something else). I want to make a test in a more realistic scenario.
Also, all ZFS settings are set BEFORE loading the data. This will ensure that all data on the same server share the same compression settings. Below is the summary of the used space. Keep in mind that the space saving test is independent to the hardware, such as CPU type, memory, speed of the disks, etc. Just like running the same command to compress the same set of files in two systems. We expect the result will have the same size. The only difference will be the time spent on compressing the data.
#ZFS with lzjb df Filesystem 512-blocks Used Avail Capacity Mounted on storage/data 23007987368 22996620466 11366902 100% /storage/data #Used space: 22996620466 blocks = 10965.6 GB
#ZFS with lz4 df Filesystem 512-blocks Used Avail Capacity Mounted on storage/data 31527470077 22942284913 8585185164 73% /storage/data #Used space: 22942284913 blocks = 10939 GB
I found that for 10TB of data, the server with lzjb uses 26GB more space than lz4, which translate to 0.23% (26GB out of 10TB). The difference is quite small and not too significant. However, when your ZFS is nearly full or you are too busy to upgrade your system, the extra saving may matter a lot.
lz4 VS lzjb: Time Saving
I am also curious to know how much time I will save by switching from lzjb to lz4. Therefore I did another simple test. First, I generated a 1GB of random data and wrote to the lzjb system. After that, I destroyed the zpool and rebuilt the zpool with lz4. This will eliminate other factors such as hardware, network traffic etc, because both tests were done within the same system.
#ZFS with lzjb time dd if=/dev/random of=/storage/data/file.out bs=1M count=1000 1000+0 records in 1000+0 records out 1048576000 bytes transferred in 23.725955 secs (44195313 bytes/sec) real 0m24.144s user 0m0.024s sys 0m18.326s
#ZFS with lz4 time dd if=/dev/random of=/storage/data/file.out bs=1M count=1000 1000+0 records in 1000+0 records out 1048576000 bytes transferred in 22.589257 secs (46419234 bytes/sec) real 0m22.802s user 0m0.016s sys 0m18.273s
P.S. The /dev/random in FreeBSD is nothing like the one in Linux. It is very fast.
By just upgrading from lzjb to lz4, the time is reduced from 24.144 to 22.802 seconds, which translate to 5.5% (1.342s out of 24.144s) improvement.
This result also agrees with my overall experience: The improvement is small and not noticeable. In fact, I barely notice any performance improvement after switching from lzjb to lz4, which primarily includes transferring the files between Windows and FreeBSD-based ZFS system via Samba on a gigabit network. The I/O speed are about the same. However, if you are talking about a busy server with lots of traffic, a 5% improvement will be something.
Anyway, this is what I recommend. If lz4 is available in your ZFS version, use it. It can’t be wrong.
sudo zfs set compression=lz4 myzpool