How to Stress Test a ZFS System

If you have set up a ZFS system, you may want to stress test it before putting it into a production environment. There are many different ways to stress test a system. The most common way is to fill the entire pool using dd, but I think scrubbing the entire pool is the best.

In case you are not familiar with scrubbing: it is a built-in ZFS operation for verifying data integrity. The system goes through every block in the pool, verifying checksums and checking parity. Scrubbing the entire pool generates a lot of I/O traffic.

First, make sure that your ZFS pool is filled with some data. Then start the scrub (replace mypool with the name of your pool; mine is called storage):

sudo zpool scrub mypool

Afterward, simply run the following command to check the status:

sudo zpool status -v
  pool: storage
 state: ONLINE
  scan: scrub in progress since Sun Jan 26 19:51:03 2014
        36.6G scanned out of 14.4T at 128M/s, 32h38m to go
        0 repaired, 0.25% done

Depending on the size of your pool, it may take a few hours to a few days to finish the entire process.
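If you would rather not re-run the status command by hand, a quick shell loop like the following prints the progress periodically until the scrub finishes. This is only a minimal sketch; the pool name and the five-minute interval are just examples:

#Print the scrub progress every 5 minutes until it is done (pool name is an example)
while sudo zpool status storage | grep -q "scrub in progress"; do
    sudo zpool status storage | grep -E "scanned|repaired"
    sleep 300
done
echo "Scrub finished"
sudo zpool status storage | grep scrub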

So how is scrubbing related to stability? Let’s take a look at the following example. Recently I set up a ZFS system based on 6 hard drives. During the initial setup, everything was fine; there were no errors while loading the data. However, after I scrubbed the system, something bad happened. During the process, the system disconnected two hard drives, which made the entire pool unreadable (a RAIDZ1 vdev can only tolerate one failed disk). I felt very lucky that this didn’t happen in a production environment. Here is the result:

sudo zpool status
  pool: storage
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:

        NAME                      STATE     READ WRITE CKSUM
        storage                   UNAVAIL      0     0     0
          raidz1-0                UNAVAIL      0     0     0
            ada1                  ONLINE       0     0     0
            ada4                  ONLINE       0     0     0
            ada2                  ONLINE       0     0     0
            ada3                  ONLINE       0     0     0
            9977105323546742323   UNAVAIL      0     0     0  was /dev/ada1
            12612291712221009835  UNAVAIL      0     0     0  was /dev/ada0

After some investigation, I found that the error had nothing to do with the hard drives themselves (bad sectors, bad cables, etc.). It turned out to be caused by bad memory. See? You never know which component in your ZFS system is bad until you stress test it.

Happy stress-testing your ZFS system.

–Derrick

ZFS Compression: lz4 VS lzjb

ZFS offers a new compression method in the latest version: lz4. It claims to be better than lzjb. Since lzjb is already pretty good, I was curious to find out how much better lz4 would be compared to lzjb. According to my tests, lz4 performs better than lzjb in terms of both space saving and I/O, but not by much.

lz4 VS lzjb: Space Saving

Long story short, here is what I did. I set up two servers with brand-new ZFS pools. One server is set to lzjb, and the other is set to lz4. Both servers have exactly the same hardware, software, operating system, etc. Both are loaded with the same set of data (10TB) via rsync and rsyncd. These are real data that I use every day, including office documents (PDF, Word, Excel), photos (RAW and JPEG), music (mp3), video (mpeg), zip archives, source code, binary applications, databases (MySQL, Redis), web server files (PHP, HTML, CSS), etc. Notice that some of these files are already compressed (such as JPEG and zip), and the file types are not evenly distributed (e.g., 40% of the files are JPEG, 25% are documents, 10% are zip, 5% are something else). I wanted to run the test in a realistic scenario.
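For reference, the only difference between the two servers was the compression property, set right after creating the dataset and before any data was loaded. The dataset name below is from my setup; adjust it to yours:

#Server 1
sudo zfs set compression=lzjb storage/data

#Server 2
sudo zfs set compression=lz4 storage/data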

As noted, all ZFS settings were applied BEFORE loading the data. This ensures that all data on each server shares the same compression setting. Below is the summary of the used space. Keep in mind that the space-saving test is independent of the hardware, such as CPU type, memory, or disk speed. It is just like running the same command to compress the same set of files on two systems: we expect the results to be the same size, and the only difference will be the time spent compressing the data.

#ZFS with lzjb
df
Filesystem    512-blocks        Used     Avail Capacity  Mounted on
storage/data 23007987368 22996620466  11366902   100%    /storage/data

#Used space: 22996620466 blocks = 10965.6 GB
#ZFS with lz4
df
Filesystem    512-blocks        Used      Avail Capacity  Mounted on
storage/data 31527470077 22942284913 8585185164    73%    /storage/data

#Used space: 22942284913 blocks = 10939 GB

I found that for 10TB of data, the server with lzjb uses 26GB more space than the one with lz4, which translates to about 0.23% (26GB out of 10TB). The difference is quite small and not very significant. However, when your ZFS pool is nearly full or you are too busy to upgrade your system, the extra saving may matter a lot.
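By the way, if you want to check the saving on your own pool, ZFS reports the achieved compression ratio directly as a dataset property (the dataset name below is just an example):

sudo zfs get compression,compressratio storage/data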

lz4 VS lzjb: Time Saving

I was also curious to know how much time I would save by switching from lzjb to lz4, so I ran another simple test. First, I generated 1GB of random data and wrote it to the lzjb system. After that, I destroyed the zpool, rebuilt it with lz4, and repeated the test. Because both tests were done on the same system, this eliminates other factors such as hardware and network traffic.

#ZFS with lzjb
time dd if=/dev/random of=/storage/data/file.out bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 23.725955 secs (44195313 bytes/sec)

real    0m24.144s
user    0m0.024s
sys     0m18.326s
#ZFS with lz4
time dd if=/dev/random of=/storage/data/file.out bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 22.589257 secs (46419234 bytes/sec)

real    0m22.802s
user    0m0.016s
sys     0m18.273s

P.S. The /dev/random in FreeBSD is nothing like the one in Linux; it does not block and is very fast.

By just switching from lzjb to lz4, the time is reduced from 24.144 to 22.802 seconds, which translates to a 5.5% improvement (1.342s out of 24.144s).

This result also agrees with my overall experience: the improvement is small and barely noticeable. In fact, I hardly notice any performance difference after switching from lzjb to lz4 in my main workload, which is transferring files between Windows and a FreeBSD-based ZFS system via Samba on a gigabit network. The I/O speed is about the same. However, on a busy server with lots of traffic, a 5% improvement is worth something.

Anyway, here is my recommendation: if lz4 is available in your ZFS version, use it. You can’t go wrong.

sudo zfs set compression=lz4 myzpool
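Note that compression only applies to data written after the property is set; blocks that are already on disk stay as they are until they are rewritten. To confirm the new setting took effect (the pool name below is the same placeholder as above):

sudo zfs get compression myzpool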

–Derrick

ZFS Performance Boost/Improvement: How I push the I/O speed to 126MBps

This article is part of my main ZFS tutorial: How to improve ZFS performance. That article covers everything you need. If you already have the basic knowledge, or you just want to know how I push the I/O speed to 120+ MBps, you can skip that article and read this one.

Long story short, here is the result of my iostat:

sudo zpool iostat -v

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
storage     7.85T  12.1T      1  1.08K  2.60K   126M  <--- 126 MBps!
  raidz2    5.51T  8.99T      1    753  2.20K  86.9M
    ada0        -      -      0    142    102  14.5M
    ada1        -      -      0    143    102  14.5M
    ada3        -      -      0    146    307  14.5M
    ada4        -      -      0    145    409  14.5M
    ada5        -      -      0    144    204  14.5M
    ada6        -      -      0    143    204  14.5M
    ada7        -      -      0    172    409  14.5M
    ada8        -      -      0    171    511  14.5M
  raidz1    2.34T  3.10T      0    349    409  39.5M
    da0         -      -      0    169    204  13.2M
    da1         -      -      0    176      0  13.2M
    da2         -      -      0    168    102  13.2M
    da3         -      -      0    173    102  13.2M
----------  -----  -----  -----  -----  -----  -----

I measured the speed while transferring 10TB of data from one FreeBSD-based ZFS server to another FreeBSD-based ZFS server over a consumer-level gigabit network. Basically, every component I used is consumer-level. The hard drives are standard PATA and SATA hard drives (not even fast SSDs). The data I transferred is real data (rather than all zeros or ones generated through dd).
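If you want to watch the throughput live rather than take a one-off snapshot, zpool iostat accepts an interval in seconds (the pool name and the 5-second interval below are just examples):

sudo zpool iostat -v storage 5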

First of all, here are the settings I used. Both server and client have similar configurations.

sudo zpool history
History for 'storage':
2014-01-22.20:28:59 zpool create -f storage raidz2 /dev/ada0 /dev/ada1 /dev/ada3 /dev/ada4 /dev/ada5 /dev/ada6 /dev/ada7 /dev/ada8 raidz /dev/da0 /dev/da1 /dev/da2 /dev/da3
2014-01-22.20:29:07 zfs create storage/data
2014-01-22.20:29:15 zfs set compression=lz4 storage
2014-01-22.20:29:19 zfs set compression=lz4 storage/data
2014-01-22.20:30:19 zfs set atime=off storage
#where the ada* and da* devices are nothing more than standard SATA drives:
#ZFS Drives
#8 x 2TB standard SATA hard drives connected to the motherboard
#4 x 1.5TB standard SATA hard drives connected to a PCI-e RAID card

da0: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da1: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da2: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da3: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 512bytes)
ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada3: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada4: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada4: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada5: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada5: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada6: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada6: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada7: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada7: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada8: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada8: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)


#System Drive
#A PATA 200GB drive from 2005
ada2: maxtor STM3200820A 3.AAE ATA-7 device
ada2: 100.000MB/s transfers (UDMA5, PIO 8192bytes)
ada2: 190782MB (390721968 512 byte sectors: 16H 63S/T 16383C)

Here are the corresponding hardware components:

#CPU
hw.machine: amd64
hw.model: Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz
hw.ncpu: 8
hw.machine_arch: amd64


#Memory - 3 x 2GB
real memory  = 6442450944 (6144 MB)
avail memory = 6144352256 (5859 MB)

#Network - Gigabit network on the motherboard / PCI-e
#If your gigabit network card is PCI, that will be your bottleneck.
re0: realtek 8168/8111 B/C/CP/D/DP/E/F PCIe Gigabit Ethernet

Other than that, I didn’t specify any special settings in my OS. Everything else is default:

#uname -a
FreeBSD 9.2-RELEASE-p3 FreeBSD 9.2-RELEASE-p3 #0: Sat Jan 11 03:25:02 UTC 2014     [email protected]:/usr/obj/usr/src/sys/GENERIC  amd64

I transferred the 10TB of data using rsync and rsyncd. First of all, I set up rsync as a daemon on the server. Keep in mind that running rsync as a daemon (service) is not the same as plain rsync. If you want to know how I set up rsyncd, please visit How to Improve rsync Performance.
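For the impatient, a minimal rsyncd.conf on the server looks roughly like the sketch below. Treat it as an assumption-laden example (the module name and path match my rsync example further down; the linked article covers the settings I actually used):

#/usr/local/etc/rsyncd.conf on FreeBSD (on many Linux distributions it lives in /etc/rsyncd.conf)
[storage]
    path = /storage/data
    read only = yes
    uid = root
    gid = wheel

Then start the daemon:

rsync --daemon --config=/usr/local/etc/rsyncd.conf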

After you have set up rsyncd on the server, run the following on the client:

rsync -av server_ip::rsync_share_name /my_directory/

#Example
rsync -av 192.168.1.101::storage /mydata/

Notice that I didn’t enable the compression option. That’s because my files are already mostly compressed (jpeg, zip, tar, 7z, etc.). If your file types are different, you may want to enable compression, i.e.,

rsync -avz server_name::rsync_share_name /my_directory/

Give it a try and see whether it improves the I/O speed.

Here are a few things I have learned that improve ZFS performance.

Keep the ZFS structure small and simple. For example, keep the number of disks in each vdev small. Previously, I set up one giant pool with 14 disks in a single RAIDZ vdev. That was a bad idea; there is a practical limit to how many disks you should put in a single RAIDZ vdev. That’s why I now split my disks into two groups:

#Group 1 - RAIDZ2
#8 x 2TB standard SATA hard drives connected to the motherboard

#Group 2 - RAIDZ
#4 x 1.5TB standard SATA hard drives connected to a PCI-e RAID card

zpool create -f storage raidz2 /dev/ada0 /dev/ada1 /dev/ada3 /dev/ada4 /dev/ada5 /dev/ada6 /dev/ada7 /dev/ada8 raidz /dev/da0 /dev/da1 /dev/da2 /dev/da3

Keep the ZFS settings clean. Enable only what you need and leave the rest disabled. In my case, I enable the best compression (lz4) and turn off access-time updates:

2014-01-22.20:29:15 zfs set compression=lz4 storage
2014-01-22.20:30:19 zfs set atime=off storage

Keep the ZFS clean. I see better performance on a brand-new ZFS pool than on one that is a few years old, even when both contain the same data. Unfortunately, there is no easy way to clean up a ZFS pool. The only way is to destroy the current pool, create a new one, and bring the data back. Typically, it takes about 2 days to transfer 10TB of data over a gigabit LAN, so it is really not too bad to spend about 5 days rebuilding your ZFS. A rough outline of the rebuild is below.
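This outline is only a sketch under my own setup: it assumes a second machine reachable at 192.168.1.102 with a writable rsync module named backup and enough free space, and it reuses the pool layout from the zpool history shown earlier. Double-check that you have a good copy of everything before destroying the pool.

#1. Copy the data off to the other server
rsync -av /storage/data/ 192.168.1.102::backup/

#2. Destroy and recreate the pool with the same layout
sudo zpool destroy storage
sudo zpool create -f storage raidz2 /dev/ada0 /dev/ada1 /dev/ada3 /dev/ada4 /dev/ada5 /dev/ada6 /dev/ada7 /dev/ada8 raidz /dev/da0 /dev/da1 /dev/da2 /dev/da3
sudo zfs create storage/data
sudo zfs set compression=lz4 storage
sudo zfs set atime=off storage

#3. Copy the data back
rsync -av 192.168.1.102::backup/ /storage/data/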

That’s it. Again, if you want to learn more tricks, please visit How to improve ZFS performance for more information.

–Derrick

How to Remote Desktop To Linux From Windows Using XRDP

If you are looking for ways to remote desktop to Linux from Windows, and you are sick of VNC, you are in the right place.

I have been looking for a solution to do something very simple: I want to remotely access the desktop of my Linux servers from a Windows machine. Of course, it must be at a usable level.

I’ve tried different applications before, including XMing, XManager (X11 forwarding), VNC, 2X, and NoMachine, ranging from open source to commercial. Unfortunately, they offer nothing close to the Windows-to-Windows Remote Desktop experience.

Why is XMing bad?

XMing allows you to run a remote application (e.g., Firefox) on your desktop. It works great as long as you keep the session open. Once you close your session, there is no way to go back to it. For example, suppose I am browsing a website using Firefox on the remote server via XMing. If I close XMing (without closing the browser), there is no way for me to get back to the page where I left off. Of course, in reality it doesn’t make much sense to run Firefox on a remote server; I am really talking about software that takes days to run, such as R, Matlab, etc.

Also, XMing works at the application level, not at the desktop level.

Why is VNC bad?

VNC is nearly perfect. It offers something similar to Windows Remote Desktop, i.e., access at the desktop level. If I am working on something, I can get my previous session back even after my connection is closed. There are only two disadvantages: it is slow, and it is not designed for multiple users.

VNC uses a stone-age approach to deliver the remote desktop experience: screen capture. When something changes on the remote server, it takes a screenshot and sends it to the client. You can imagine how slow that is over the Internet. Newer implementations use a better algorithm that works closer to the graphics layer and only sends the modified areas to the client instead of the whole screen, but it is still far from usable. Microsoft Remote Desktop just blows VNC far, far away.

Another thing I don’t like about VNC is that it works at the machine/server level, not the user level. VNC uses its own authorization: when you set up a VNC service, you define a password, and that’s it. In other words, if I open a desktop on the server as derrick, everybody who enters the correct password will be able to see my desktop content via VNC. Unless I go to the server physically and switch to a different user, there is no way to protect the content of my desktop.

The good thing about VNC is that it is cross-platform. It works on OS X, Windows, Linux etc. Wait a minute. Java is cross-platform too. Do you want to use it?

Why is XManager bad?

XManager is better than VNC. It uses a special protocol such that speed is not an issue. However, it suffers from the same session problem as XMing. If I remote desktop to the server using XManager and later close my session, there is no way for me to get back to my previous session. In other words, I lose all opened applications and their contents.

And no, the sessions are still running silently on the server; they are just orphaned. XManager doesn’t let you reconnect to an old session when you establish a new connection.

Solution: xrdp

I am so happy that I finally found something useful and usable: xrdp. Here is a step-by-step tutorial on how to install it on Fedora/CentOS Linux. The instructions are similar for Debian/Ubuntu Linux. The setup should take no more than 5 minutes.

#Install xrdp
sudo yum install -y xrdp

If your Linux is too old, or your default repository does not have xrdp available, you may want to run the following:

#Don't forget to use the right (i686 or x86_64) architecture.

sudo rpm -Uvh  http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
sudo rpm -Uvh  http://rpms.famillecollet.com/enterprise/remi-release-6.rpm
sudo yum update -y

sudo yum install -y xrdp

Next, we want to start the xrdp service:

sudo /etc/rc.d/init.d/xrdp start

You may want to enable this service during boot:

sudo chkconfig  xrdp on
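On newer Fedora/CentOS releases that use systemd instead of the old init scripts, the equivalent of the two steps above would be something like this (an assumption on my part; check the service name on your distribution):

sudo systemctl start xrdp
sudo systemctl enable xrdp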

Don’t forget to open the port 3389 in your firewall:

sudo nano /etc/sysconfig/iptables

-A INPUT -m state --state NEW -m tcp -p tcp --dport 3389 -j ACCEPT

Don’t forget to restart the firewall:

sudo service iptables stop
sudo service iptables start

Now, run Remote Desktop on Windows and enter the IP address of the server to see what happens.
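On the Windows side, the built-in Remote Desktop client (mstsc) is all you need. You can launch it from the Start menu or from a command prompt, for example (the IP address below is a placeholder for your Linux server):

mstsc /v:192.168.1.101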

–Derrick

[FreeBSD] mount: /dev/da0p2: R/W mount of / denied. Filesystem is not clean – run fsck. Forced mount will invalidate journal contents: Operation not permitted

When I tried to mount a hard drive on FreeBSD today, I got the following error message.

sudo mount -t ufs /dev/da0p2 /mnt/
mount: /dev/da0p2: R/W mount of / denied. Filesystem is not clean - run fsck. Forced mount will invalidate journal contents: Operation not permitted

Here is how to work around it, by mounting the filesystem read-only:

sudo mount -r -t ufs /dev/da0p2 /mnt/

where ufs is the native FreeBSD file system type.
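The -r flag mounts the filesystem read-only, which is usually enough for pulling files off the disk. If you need read-write access, do what the error message suggests and run fsck first (this can take a while on a large disk); a quick sketch:

sudo fsck -y -t ufs /dev/da0p2
sudo mount -t ufs /dev/da0p2 /mnt/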

That’s it.

–Derrick
