When you check the ZFS status, you may find the following error message: "One or more devices has experienced an error resulting in data corruption. Applications may be affected." There can be a million reasons for this message to show up. Most of the time the cause is a hardware failure, such as a bad hard drive, a broken cable, a defective motherboard, or even bad memory. In this article, I am assuming that you have already eliminated these possibilities, have scratched your head for hours, and still have no clue which part went wrong. In fact, that’s what I did today.
Long story short, here is how I ran into this error:
FreeBSD: 8.2 -> 9.0
ZFS: 4 -> 5
ZPool: 15 -> 28
My system was working fine (FreeBSD 8.2; ZFS ver. 4, Zpool ver. 15), and everything seemed perfect. After I upgraded the system to FreeBSD 9, I upgraded ZFS and Zpool to ver. 5 and ver. 28, respectively. Everything seemed to be working fine until I checked the status:
sudo zpool status -v
#sudo zpool status -v
  pool: storage
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        storage/data:<0x0>
There are a few things you need to pay attention to:
The pool seems to be working fine; otherwise you would see DEGRADED or FAULTED instead of ONLINE in the state line.
The system has no problem reading or writing the data. All the READ, WRITE and CKSUM counters are 0, so it doesn’t seem to be a hardware problem:
        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
This error message may give you some clue about what’s wrong. Notice that storage is the pool name and data is a dataset (file system) inside that pool:
errors: Permanent errors have been detected in the following files:

        storage/data:<0x0>
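To double-check the second point without reading the table by eye, a small shell function can filter the status output for any non-zero counter. This is only a sketch: the name nonzero_counters is my own, and it assumes the table layout shown above (NAME, STATE, READ, WRITE and CKSUM columns, followed by the errors line).

```sh
# Print the config rows of `zpool status` output (read from stdin)
# whose READ, WRITE or CKSUM counter is not zero.
# nonzero_counters is a hypothetical helper name, not a ZFS command.
nonzero_counters() {
    awk '
        # The header row marks the start of the device table.
        $1 == "NAME" && $2 == "STATE" { in_table = 1; next }
        # The "errors:" line comes after the table.
        /^errors:/ { in_table = 0 }
        # Device rows have at least five columns: NAME STATE READ WRITE CKSUM.
        in_table && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1, $3, $4, $5
        }
    '
}

# Usage on a live system:
#   zpool status storage | nonzero_counters
```

With a table like mine, where every counter is 0, it should print nothing. Any line it does print names a device worth checking for hardware trouble first.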
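The <0x0> is an object number rather than a file name: ZFS could not map the damaged object to a file path, which suggests that the damage is in the metadata of the storage/data dataset rather than in a regular file. I think the problem may come from the upgrade process. Here are the steps to solve this problem.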
Force Clearing the Error
You can try to clear the error by running the following command (replace mypool with the name of your pool, which is storage in my case):
sudo zpool clear -F mypool
If this clears the error, you are done. However, it is likely that it won’t work, and you will need to move on to the next step.
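To tell whether the clear actually worked, you can look at the errors line of the status output again. Here is a minimal sketch of that check; errors_cleared is a name I made up, and it relies on zpool printing "errors: No known data errors" when nothing is wrong.

```sh
# Succeeds (exit status 0) when the `zpool status` text on stdin
# reports no data errors. errors_cleared is a hypothetical helper name.
errors_cleared() {
    grep -q '^errors: No known data errors'
}

# Usage on a live system:
#   sudo zpool clear -F mypool
#   zpool status -v mypool | errors_cleared && echo "error cleared"
```

The same check is handy after each of the following steps, so you know when to stop.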
Scrubbing the Pool
You can try to scrub the entire pool by running:
sudo zpool scrub mypool
This makes the system inspect every single block and repair any errors it can. Although the process is long (it took 5 hours to inspect my 10TB pool), there is a very high chance that the problem will be corrected. Don’t forget to clear the error after scrubbing the pool.
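The scrub runs in the background, so the command returns right away and you have to check zpool status to see when it is done. A rough sketch of waiting for it in a script could look like this. The function names and the 60-second default interval are my own choices, and the check assumes the status output contains the text "scrub in progress" while the scrub is running.

```sh
# Succeeds while the `zpool status` text on stdin shows a running scrub.
scrub_in_progress() {
    grep -q 'scrub in progress'
}

# Poll the pool until the scrub is no longer reported as running.
# $1 = pool name, $2 = seconds between checks (default 60, an arbitrary choice).
wait_for_scrub() {
    pool=$1
    interval=${2:-60}
    while zpool status "$pool" | scrub_in_progress; do
        sleep "$interval"
    done
}

# Usage on a live system:
#   sudo zpool scrub mypool
#   wait_for_scrub mypool 300
#   sudo zpool clear mypool
```

Note that the loop also stops if zpool status itself fails, so it is worth looking at the final status by hand before clearing the error.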
Bringing Each Device Online Again
If the error still exists after scrubbing the entire pool (and clearing the error), you can try to force each device online:
sudo zpool online mypool /dev/ada0 /dev/ada2 /dev/ada3 ...
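Typing every device name by hand is error-prone, so the list can also be taken from the status output. The following is a heuristic sketch, not a general parser: leaf_devices is a hypothetical name, and it simply skips the pool row and any row whose name starts with raidz, mirror, spare or replacing.

```sh
# Print the leaf device names from `zpool status` output read on stdin.
# $1 = pool name, so that the pool row itself can be skipped.
# leaf_devices is a hypothetical helper; the name filter is a heuristic.
leaf_devices() {
    pool=$1
    awk -v pool="$pool" '
        # The header row marks the start of the device table.
        $1 == "NAME" && $2 == "STATE" { in_table = 1; next }
        # The "errors:" line comes after the table.
        /^errors:/ { in_table = 0 }
        # Keep rows that are neither the pool nor a vdev group.
        in_table && NF >= 5 && $1 != pool && $1 !~ /^(raidz|mirror|spare|replacing)/ {
            print $1
        }
    '
}

# Usage on a live system:
#   sudo zpool online mypool $(zpool status mypool | leaf_devices mypool)
```

Check the printed names before passing them to zpool online, especially if your pool has log or cache devices, which this sketch does not treat specially.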
Rebooting the Computer
This is the last thing you can try before rebuilding the pool. Rebooting forces the computer to import and mount the pool again. Hopefully it will clear the error and the error status.
Rebuilding the Pool
If none of these methods works, the only solution left is to rebuild the pool.
#Backup your data first
#sudo zpool destroy mypool
#sudo zpool create mypool ...
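For the backup step, one option is a recursive snapshot sent to a second pool with zfs send and zfs receive. The sketch below only prints the commands so you can review them before running anything. The function name, the second pool (called backup in the usage line) and the snapshot name rebuild are all assumptions of mine. If zfs send fails because of the damaged dataset, a plain file-level copy may be the safer route.

```sh
# Print (not run) a possible command sequence for rebuilding a pool.
# $1 = pool to rebuild, $2 = an existing second pool to hold the copy,
# $3 = snapshot name (default "rebuild"). All names are placeholders.
print_rebuild_plan() {
    src=$1
    dst=$2
    snap=${3:-rebuild}
    # The "..." in the last line stands for your own vdev layout.
    cat <<EOF
zfs snapshot -r ${src}@${snap}
zfs send -R ${src}@${snap} | zfs receive ${dst}/${src}
zpool destroy ${src}
zpool create ${src} ...
EOF
}

# Usage:
#   print_rebuild_plan mypool backup
```

Only destroy the pool after you have verified that the copy on the second pool is complete and readable.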