Today I am going to share how I expanded my ZFS storage. I have a giant ZFS pool, which consists of 8×2TB (RAIDZ2) and 4×1.5TB (RAIDZ1), with a total of 15TB of usable space. Since I was running out of space, I decided to upgrade the pool by replacing the disks one at a time. Here is the pool layout:
#zpool status
NAME          STATE     READ WRITE CKSUM
storage       ONLINE       0     0     0
  raidz2-0    ONLINE       0     0     0
    ada0      ONLINE       0     0     0
    ada1      ONLINE       0     0     0
    ada3      ONLINE       0     0     0
    ada4      ONLINE       0     0     0
    ada5      ONLINE       0     0     0
    ada6      ONLINE       0     0     0
    ada7      ONLINE       0     0     0
    ada8      ONLINE       0     0     0
  raidz1-1    ONLINE       0     0     0
    da0       ONLINE       0     0     0
    da1       ONLINE       0     0     0
    da2       ONLINE       0     0     0
    da3       ONLINE       0     0     0

#df
storage/data  14T  13T  1.3T  91%  /storage/
I decided to replace the 1.5TB disks (da0 through da3) with 3TB disks, one at a time. These are the steps you will typically find on the web:
- Power down the server
- Physically swap one 1.5TB disk for a 3TB disk
- Power on the server
- Tell ZFS to replace the disk (zpool replace)
- Resilver the entire pool.
- Hope the resilver process does not return any error.
- Repeat the above steps until all disks are replaced.
- Turn on the autoexpand option and enjoy the extra space
Here are the corresponding commands. After physically swapping the first disk and powering the server back on, you should see the following:
#zpool status
NAME          STATE     READ WRITE CKSUM
storage       DEGRADED     0     0     0
  raidz2-0    ONLINE       0     0     0
    ada0      ONLINE       0     0     0
    ada1      ONLINE       0     0     0
    ada3      ONLINE       0     0     0
    ada4      ONLINE       0     0     0
    ada5      ONLINE       0     0     0
    ada6      ONLINE       0     0     0
    ada7      ONLINE       0     0     0
    ada8      ONLINE       0     0     0
  raidz1-1    DEGRADED     0     0     0
    da0       UNAVAIL      0     0     0  cannot open
    da1       ONLINE       0     0     0
    da2       ONLINE       0     0     0
    da3       ONLINE       0     0     0
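However, the pool continues to work, because a RAIDZ1 vdev tolerates one missing disk. Next, I told ZFS to replace the old 1.5TB disk with the new 3TB disk sitting in the same slot (in the commands below, mypool stands for your pool name; mine is storage):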
sudo zpool replace mypool da0
#zpool status
NAME             STATE     READ WRITE CKSUM
storage          DEGRADED     0     0     0
  raidz2-0       ONLINE       0     0     0
    ada0         ONLINE       0     0     0
    ada1         ONLINE       0     0     0
    ada3         ONLINE       0     0     0
    ada4         ONLINE       0     0     0
    ada5         ONLINE       0     0     0
    ada6         ONLINE       0     0     0
    ada7         ONLINE       0     0     0
    ada8         ONLINE       0     0     0
  raidz1-1       DEGRADED     0     0     0
    replacing-2  DEGRADED     0     0     0
      da0/old    FAULTED      0     0     0  corrupted data
      da0        ONLINE       0     0     0  (resilvering)
    da1          ONLINE       0     0     0
    da2          ONLINE       0     0     0
    da3          ONLINE       0     0     0
How long resilvering takes depends on how much data is on the old disk and on the hardware (motherboard, SATA configuration, etc.). Resilvering a 1.5TB drive (with 1.3TB of data) took me about 15 hours. Personally, I recommend starting the process in the morning, so that you can check the progress and start the next one in the evening. That way no time is wasted between disks.
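Since each resilver runs for many hours, it can help to let a script watch the pool instead of checking by hand. The following is only a minimal sketch: the function names and the 10-minute polling interval are illustrative assumptions, and it assumes that zpool status prints either "resilver in progress" or "(resilvering)" while the resilver is running, as in the output above.

```shell
# Succeeds (exit 0) if the zpool status text on stdin shows an active resilver.
resilver_in_progress() {
    grep -Eq 'resilver in progress|\(resilvering\)'
}

# Hypothetical helper: block until the given pool has finished resilvering,
# then print the verbose status so that any errors can be reviewed.
wait_for_resilver() {
    pool=$1
    interval=${2:-600}   # seconds between checks (assumed default: 10 minutes)
    while zpool status "$pool" | resilver_in_progress; do
        sleep "$interval"
    done
    zpool status -v "$pool"
}
```

After starting a replacement, something like `wait_for_resilver mypool` would return once the resilver is over and show the final status in one go.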
After the resilvering process is done, make sure that you check the error status. If some files are reported as damaged, you need to delete those files first and then restore them from your backup. Here is an example of the error:
#sudo zpool status -v
errors: Permanent errors have been detected in the following files:
/storage/data/aaa
/storage/data/bbb
/storage/data/ccc
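If the list of damaged files is long, it may be easier to extract it with a script. This is a rough sketch under a few assumptions: the function names are made up, the damaged entries are printed as absolute paths after the "Permanent errors" line (entries that are not plain paths are skipped), and the backup is assumed to mirror the original absolute paths under one backup directory.

```shell
# Print the damaged file paths found in `zpool status -v` output on stdin.
list_damaged_files() {
    awk '
        /Permanent errors have been detected/ { in_list = 1; next }
        in_list {
            sub(/^[ \t]+/, "")       # trim leading indentation
            if ($0 ~ /^\//) print    # keep only absolute paths
        }
    '
}

# Hypothetical helper: delete a damaged file, then copy it back from a backup
# tree that mirrors the absolute path (e.g. /backup/storage/data/aaa).
restore_file() {
    file=$1
    backup_root=$2
    if [ ! -f "$backup_root$file" ]; then
        echo "no backup found for $file" >&2
        return 1
    fi
    # Delete first, then restore: overwriting in place is not enough.
    rm -f -- "$file" && cp -p -- "$backup_root$file" "$file"
}
```

The two pieces could be combined as `zpool status -v mypool | list_damaged_files | while IFS= read -r f; do restore_file "$f" /backup; done`, where /backup is a placeholder for wherever your backup is mounted.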
Remember, you need to delete the file first, then put the file back from your backup. If you simply overwrite the file, the error will not clear. Check the status again. If the error is still there, you may want to scrub the pool, which makes ZFS read and verify all the data again:
sudo zpool scrub mypool
Or you can try to clear the error. This kicks off the resilver process again automatically:
sudo zpool clear -f mypool
I know what you are going to ask: how can ZFS lose data when I have RAIDZ and checksums enabled? I have no idea. That’s why we need a backup on a different machine. Anyway, this second pass will take another 15 hours.
After the error is cleared, repeat the steps to replace the remaining disks one by one. Make sure that the error is cleared after every replacement. For me, replacing four hard drives took exactly 5 days, or 120 hours in total. Yes, it is not a fun job.
Once everything is completed with no errors, you check the pool status and expect some magic to happen. Unfortunately, you will see the same amount of available space. Here is what you need to do:
#I still have 1.3TB space left.
storage/data  14T  13T  1.3T  91%  /storage/
sudo zpool set autoexpand=on mypool
Locate one of the disks you have replaced. In my case, they are da0, da1, da2 and da3:
#zpool status
NAME          STATE     READ WRITE CKSUM
storage       ONLINE       0     0     0
  raidz2-0    ONLINE       0     0     0
    ada0      ONLINE       0     0     0
    ada1      ONLINE       0     0     0
    ada3      ONLINE       0     0     0
    ada4      ONLINE       0     0     0
    ada5      ONLINE       0     0     0
    ada6      ONLINE       0     0     0
    ada7      ONLINE       0     0     0
    ada8      ONLINE       0     0     0
  raidz1-1    ONLINE       0     0     0
    da0       ONLINE       0     0     0  --This
    da1       ONLINE       0     0     0  --This
    da2       ONLINE       0     0     0  --This
    da3       ONLINE       0     0     0  --And this
sudo zpool online -e mypool da0
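I only needed the command above for da0. If the space does not show up, the same two commands can be wrapped in a small function and run for every replaced disk, which should be harmless for disks that are already expanded. This is just a sketch: the function name is made up, and the ZPOOL variable is a hypothetical hook that lets you substitute echo for a dry run.

```shell
# Hypothetical helper: enable autoexpand on a pool, then ask ZFS to expand
# each listed disk. Usage: expand_disks POOL DISK [DISK...]
expand_disks() {
    pool=$1
    shift
    zpool_cmd=${ZPOOL:-zpool}   # set ZPOOL=echo to print instead of execute
    "$zpool_cmd" set autoexpand=on "$pool" || return 1
    for disk in "$@"; do
        "$zpool_cmd" online -e "$pool" "$disk" || return 1
    done
}
```

Run as root, `expand_disks mypool da0 da1 da2 da3` would cover all four disks from my example. With ZPOOL=echo set beforehand, it only prints the commands it would run.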
And check the space again…
#Now I have 5.3TB space available.
storage  18T  13T  5.3T  72%  /storage
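FYI, here is the math behind the free space calculation. I had 4 x 1.5TB in a RAIDZ1 setup. After the upgrade, I have 4 x 3TB in a RAIDZ1 setup. The increase in space is (3TB – 1.5TB) x (4 – 1) x 0.9 ≈ 4TB, where 4 – 1 is the number of disks minus the one parity disk. That matches the free space going from 1.3TB to 5.3TB.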
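The same arithmetic can be wrapped in a tiny function if you want to try other disk sizes or vdev layouts before buying drives. It is only a sketch of the formula above: the function name is made up, and the 0.9 factor is simply the one from my calculation (I assume it roughly covers the gap between the decimal TB on the drive label and the binary units that df reports).

```shell
# Estimated usable-space gain when every disk in a RAIDZ vdev is upgraded.
# Usage: raidz_gain OLD_TB NEW_TB DISKS PARITY
raidz_gain() {
    awk -v old_tb="$1" -v new_tb="$2" -v disks="$3" -v parity="$4" \
        'BEGIN { printf "%.2f\n", (new_tb - old_tb) * (disks - parity) * 0.9 }'
}
```

For my case, `raidz_gain 1.5 3 4 1` prints 4.05, i.e. roughly the 4TB I gained.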
Enjoy the new space!
If you have spent too many hours on the resilvering process, consider the old-school way. My old-school method is nothing new, but it is rock solid, reliable, takes less time and, most importantly, loses no data. Yes, you guessed it: I back up the data to another server first, then rebuild the ZFS pool on my main server and copy the data back. Typically, copying 10TB of data via the rsync daemon over a gigabit network takes about 3 days, so it isn’t too bad. The only downside of this solution is the downtime, which is about 3 days in my case. If you decide to go with the ZFS replacement instead, the downtime is minimal, because the pool continues to work during the resilver process.
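To estimate the copy time for your own setup, a quick calculation helps. The function below is a hypothetical sketch that uses decimal units (1TB = 1,000,000MB). The throughput figures are assumptions, not measurements: 125MB/s is the theoretical gigabit line rate, while my 3 days for 10TB work out to roughly 40MB/s of effective throughput.

```shell
# Hours needed to copy SIZE_TB terabytes at MB_PER_S megabytes per second.
# Usage: transfer_hours SIZE_TB MB_PER_S
transfer_hours() {
    awk -v tb="$1" -v mbps="$2" \
        'BEGIN { printf "%.1f\n", tb * 1000000 / mbps / 3600 }'
}
```

Here `transfer_hours 10 125` prints 22.2 (the best case at line rate) and `transfer_hours 10 40` prints 69.4, which is close to the 3 days I saw.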
Further reading: How to Improve Rsync Performance
Hope my solutions help!