[FreeBSD+ZFS]One or more devices has experienced an error resulting in data corruption. Applications may be affected.

When you check the ZFS status, you may find the following error message: One or more devices has experienced an error resulting in data corruption. Applications may be affected.. There can be million reasons to cause this error message showing up. Of course, 99% of them are caused by hardware failure, such as bad hard drives, broken cables, defective motherboard, or even bad memory. In this article, I am assuming that you already eliminated these possibilities, and have scratched your head for hours, and still have no clue which part went wrong. In fact, that’s what I did today.

Long story short. Here is how I experienced this error:

FreeBSD: 8.2-> 9.0
ZFS: 4 -> 5
ZPool: 15 -> 28

My system was working fine (FreeBSD 8.2; ZFS ver. 4, Zpool ver. 15), everything seems perfect. After I upgraded my system to FreeBSD 9, I upgraded the ZFS and Zpool to ver.5 and ver. 28, respectively. Everything seemed working fine until I check the status:

sudo zpool status -v

#sudo zpool status -v
  pool: storage
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        storage/data:<0x0>

There are few things you need to pay attention:

The pool seems working fine, otherwise you will see FAULTED instead of ONLINE:

state: ONLINE

The system has no problem to read/write the data. Doesn’t seem to be hardware problem:

NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0

This error message may give you some clue what’s wrong. Notice that storage and data are the pool names.

errors: Permanent errors have been detected in the following files:

        storage/data:<0x0>

The <0x0> represents the meta data of the pool. I think the problem may come from the upgrade process. Here are the steps how to solve this problem.

Force Clearing the Error

You can try to clear the error by running:

sudo zpool clear -F mypool

If it can clear the error, you are done. However, it is likely that it won’t work, and you need to move to the next step.

Scrubbing the Pool

You can try to scrub the entire pool by running:

sudo zpool scrub mypool

This will make the system to inspect every single block and correct the error. Although this process is long (It took 5 hours to inspect my 10TB pool), there is a very high chance that the problem will be corrected. Don’t forget to clear error after scrubbing the pool.

Making each devices online again

If the error still exists after scrubbing the entire pool (and clearing the error), you can try to force making each device go online:

sudo zpool online mypool /dev/ada0 /dev/ada2 /dev/ad4 ...

Try to reboot the computer

This is the last thing you can try. This will force the computer to mount the pool again. Hopefully it will clear the error and error status.

Rebuild the pool

If none of the method works, the only solution left is to rebuild the pool.

#Backup your data first
#sudo zpool destroy mypool
#sudo zpool create mypool ...

Good luck.

Our sponsors: