How to Expand ZFS

Today I am going to share how I expanded my ZFS storage. I have a large ZFS pool made of 8x2TB (RAIDZ2) and 4x1.5TB (RAIDZ1) disks, with about 15TB of usable space in total. Since I was running out of space, I decided to upgrade the pool by replacing the disks one at a time. Here is the pool layout:

#zpool status
        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada8    ONLINE       0     0     0
          raidz1-1  ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0


#df
storage/data     14T     13T    1.3T    91%    /storage/

What I decided to do was replace the 1.5TB disks (i.e., da[0-3]) with 3TB disks, one at a time. Basically, here are the steps you will typically find on the web:

  1. Power down the server.
  2. Physically swap one 1.5TB disk for a 3TB disk.
  3. Power on the server.
  4. Tell ZFS to replace the disk (zpool replace).
  5. Let ZFS resilver the new disk.
  6. Hope the resilver process does not return any errors.
  7. Repeat the above steps until all disks are replaced.
  8. Turn on the autoexpand option and enjoy the extra space.
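
The steps above can be sketched as a small script that only prints the commands for review instead of running them. This is a minimal sketch, not a tested procedure: print_replace_plan is a helper name I made up, the pool and device names come from the status output above, and it assumes each new disk shows up under the same device name as the disk it replaced.

```shell
#!/bin/sh
# Print (do not run) the commands for a one-disk-at-a-time replacement.
# print_replace_plan is a hypothetical helper, not part of ZFS.
# Usage: print_replace_plan POOL DISK [DISK...]
print_replace_plan() {
    pool=$1
    shift
    for disk in "$@"; do
        echo "# --- $disk: power down, swap the physical disk, power on ---"
        echo "zpool replace $pool $disk"
        echo "zpool status -v $pool"
    done
    # Expansion happens only after every disk in the vdev has been replaced.
    echo "zpool set autoexpand=on $pool"
    for disk in "$@"; do
        echo "zpool online -e $pool $disk"
    done
}

print_replace_plan storage da0 da1 da2 da3
```

Printing the plan first makes it easy to double-check the pool and device names before typing anything destructive. Each zpool status step stands for "wait here until the resilver has finished without errors".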

Here are the corresponding commands. After physically swapping the first disk and powering the server back on, you should see the following:

#zpool status
        NAME        STATE     READ WRITE CKSUM
        storage     DEGRADED     0     0     0
          raidz2-0  ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada8    ONLINE       0     0     0
          raidz1-1  DEGRADED     0     0     0
            da0     UNAVAIL      0     0     0  cannot open
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0

However, the pool will continue to work, because the RAIDZ1 vdev can tolerate one missing disk. Next, I told ZFS to replace the old 1.5TB disk with the new 3TB disk:

#sudo zpool replace storage da0
#zpool status
        NAME        STATE     READ WRITE CKSUM
        storage     DEGRADED     0     0     0
          raidz2-0  ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada8    ONLINE       0     0     0
          raidz1-1  DEGRADED     0     0     0
            replacing-2   DEGRADED     0     0     0
              da0/old     FAULTED      0     0     0  corrupted data
              da0         ONLINE       0     0     0  (resilvering)
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0

The resilvering time depends on how much data is on the old disk and on your hardware (motherboard, SATA configuration, etc.). Resilvering a 1.5TB drive with 1.3TB of data took me about 15 hours. Personally, I recommend starting the process in the morning, so that you can check the progress and start the next disk in the evening. That way you lose less time between disks.
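
Instead of eyeballing the status every few hours, you can test the output for a running resilver. Here is a minimal sketch: resilver_running is a helper name I made up, and it simply looks for the word "resilvering" as it appears in the listing above. The exact wording may differ between ZFS versions, so treat the pattern as an assumption.

```shell
#!/bin/sh
# Succeeds if the `zpool status` text on stdin shows a device still resilvering.
# resilver_running is a hypothetical helper, not part of ZFS.
resilver_running() {
    grep -q 'resilvering'
}

# Example use (assumes a pool named "storage"):
#   if zpool status storage | resilver_running; then
#       echo "still resilvering, check again later"
#   fi
```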

After the resilvering process is done, make sure that you check the error status. If some files are reported as damaged, you need to delete those files first and then restore them from your backup. Here is an example of the error:

#sudo zpool status -v

errors: Permanent errors have been detected in the following files: 

/storage/data/aaa
/storage/data/bbb
/storage/data/ccc
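
If the list is long, it helps to pull out just the file names so you know what to restore. This is a minimal sketch that assumes the output looks like the example above; list_damaged_files is a helper name I made up.

```shell
#!/bin/sh
# Read `zpool status -v` text on stdin and print the paths listed after the
# "Permanent errors" line, one per line.
# list_damaged_files is a hypothetical helper, not part of ZFS.
list_damaged_files() {
    awk '
        /Permanent errors have been detected/ { found = 1; next }
        found && NF > 0 { sub(/^[ \t]+/, ""); print }
    '
}

# Example use (assumes a pool named "storage"):
#   sudo zpool status -v storage | list_damaged_files > damaged.txt
```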

Remember, you need to delete the file first, then put the file back from your backup. If you simply replace/overwrite the file, it will not clear the error. Check the status again. If the error is still there, you may want to scrub the pool so that ZFS verifies all the data again:

sudo zpool scrub storage

Or you can try to clear the error counters. If a device still needs it, ZFS will start a resilver on its own.

sudo zpool clear storage

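I know what you are going to ask: how can ZFS lose data when I have RAIDZ and checksums enabled? My best guess is that, with the old disk pulled out, the RAIDZ1 vdev has no redundancy left during the resilver, so any read or checksum error on the remaining three disks cannot be repaired. That's why we need a backup on a different machine. Anyway, the resilver process will take another 15 hours.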

After the error is cleared, repeat the steps to replace the disks one by one. Make sure that the error is cleared after every replacement. For me, replacing four hard drives took exactly 5 days, or 120 hours in total. Yes, it is not a fun job.

So everything is completed, with no errors or anything bad. You check the pool status and expect magic to happen. Unfortunately, you will see the same amount of space available. Here is what you need to do:

#I still have 1.3TB space left.
storage/data     14T     13T    1.3T    91%    /storage/
sudo zpool set autoexpand=on storage

Locate one of the disks you have replaced. In my case, they are da0, da1, da2 and da3:

#zpool status
        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada8    ONLINE       0     0     0
          raidz1-1  ONLINE       0     0     0
            da0     ONLINE       0     0     0 --This
            da1     ONLINE       0     0     0 --This
            da2     ONLINE       0     0     0 --This
            da3     ONLINE       0     0     0 --And this
sudo zpool online -e storage da0

And check the space again…

#Now I have 5.3TB space available.
storage     18T     13T    5.3T    72%    /storage

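FYI, here is the math behind the free space calculation. I had 4 x 1.5TB in a RAIDZ1 setup. After the upgrade, I have 4 x 3TB in a RAIDZ1 setup. The increase in space is (3TB - 1.5TB) x (4 - 1) x 0.9 = about 4TB. Here (4 - 1) is the number of disks left for data after RAIDZ1 parity, and 0.9 roughly accounts for the gap between the advertised disk capacity and what the system reports.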
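
The same arithmetic as a small shell function, in case you want to plug in your own disk sizes. This is a minimal sketch: raidz_gain_tib is a helper name I made up, and it assumes the 0.9 factor above stands for converting decimal terabytes (10^12 bytes) into the binary units (2^40 bytes) that the system reports. It ignores ZFS metadata and padding overhead.

```shell
#!/bin/sh
# Usable space gained when every disk in a RAIDZ vdev is replaced by a larger one.
# Arguments: old size (TB), new size (TB), number of disks, number of parity disks.
# Prints the gain in TiB with two decimals.
# raidz_gain_tib is a hypothetical helper, not part of ZFS.
raidz_gain_tib() {
    awk -v old="$1" -v new="$2" -v n="$3" -v p="$4" 'BEGIN {
        gain_tb = (new - old) * (n - p)                  # decimal terabytes
        printf "%.2f\n", gain_tb * 1e12 / 1099511627776  # bytes -> TiB
    }'
}

raidz_gain_tib 1.5 3 4 1
```

For my RAIDZ1 vdev this gives about 4.09, which is in line with the free space going from 1.3T to 5.3T.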

Enjoy the new space!

If you have spent too many hours on the resilvering process, consider the old-school way. My old-school method is nothing new, but it is rock solid, reliable, takes less time and, most importantly, no data will be lost. Yes, you guessed it: I back up the data to another server first, then rebuild the ZFS pool on my main server and copy the data back. Typically, copying 10TB of data via the rsync daemon over a gigabit network takes about 3 days, so it isn't too bad. The only downside of this solution is the downtime, which is about 3 days in my case. If you decide to go with the ZFS replacement, the downtime will be minimized, because the ZFS pool continues to work during the resilver process.
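
To put the copy time in perspective, here is a minimal sketch that turns "so many TB in so many hours" into an average transfer rate. avg_mb_per_s is a helper name I made up, and it uses decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes).

```shell
#!/bin/sh
# Average throughput in MB/s for moving a given amount of data (decimal TB)
# in a given number of hours.
# avg_mb_per_s is a hypothetical helper name.
avg_mb_per_s() {
    awk -v tb="$1" -v hours="$2" 'BEGIN {
        printf "%.1f\n", (tb * 1e12) / (hours * 3600) / 1e6
    }'
}

avg_mb_per_s 10 72    # the rsync copy: 10TB in about 3 days
```

For my copy this works out to about 38.6 MB/s, well below the roughly 125 MB/s a gigabit link can carry in theory, so the network itself was probably not the limit.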

Further reading: How to Improve Rsync Performance

Hope my solutions help!

–Derrick
