ZFS

[FreeBSD+ZFS] One or more devices has experienced an error resulting in data corruption. Applications may be affected.

Posted on January 17, 2012 by Derrick

When you check the ZFS status, you may find the following error message: “One or more devices has experienced an error resulting in data corruption. Applications may be affected.” There can be many reasons for this message to show up. Most of the time it is caused by a hardware failure, such as bad hard drives, broken cables, a defective motherboard, or even bad memory. In this article, I am assuming that you have already eliminated these possibilities, have scratched your head for hours, and still have no clue which part went wrong. In fact, that’s what I did today.

Long story short, here is how I ran into this error:

FreeBSD: 8.2 -> 9.0
ZFS: 4 -> 5
ZPool: 15 -> 28

My system was working fine (FreeBSD 8.2, ZFS ver. 4, Zpool ver. 15), and everything seemed perfect. After I upgraded my system to FreeBSD 9, I upgraded ZFS and Zpool to ver. 5 and ver. 28, respectively. Everything seemed to be working fine until I checked the status:

sudo zpool status -v

  pool: storage
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        storage/data:<0x0>

There are a few things you need to pay attention to:

The pool seems to be working fine; otherwise you would see DEGRADED or FAULTED instead of ONLINE:

state: ONLINE

The system has no problem reading or writing the data, and the READ, WRITE, and CKSUM error counters are all zero, so it doesn’t seem to be a hardware problem:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0

This error message may give you some clue about what’s wrong. Note that storage is the pool name and data is a file system (dataset) inside that pool.

errors: Permanent errors have been detected in the following files:

        storage/data:<0x0>

The <0x0> is an object ID rather than a file name, which points to the metadata of the storage/data dataset and not to a regular file. I think the problem may have come from the upgrade process. Here are the steps to solve this problem.
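Before working through the steps, it may help to pull the damaged entries out of the status report so they can be checked again after each attempt. The following is a minimal sketch, not part of ZFS: the function names list_permanent_errors and is_object_entry are made up for illustration, and the parsing assumes the report layout shown above.

```sh
#!/bin/sh
# Print the entries that `zpool status -v` lists under
# "errors: Permanent errors have been detected in the following files:".
# Reads the status text from standard input.
list_permanent_errors() {
    awk '
        /^errors: Permanent errors/ { in_list = 1; next }   # list starts here
        in_list && NF > 0 { sub(/^[ \t]+/, ""); print }     # one entry per line
    '
}

# Succeed if an entry names an object ID such as <0x0> instead of a file path.
is_object_entry() {
    case "$1" in
        *":<0x"*">") return 0 ;;
        *)           return 1 ;;
    esac
}
```

A possible use is `zpool status -v storage | list_permanent_errors`, which would print storage/data:<0x0> for the report above and nothing once the error is gone.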

Force Clearing the Error

You can try to clear the error by running the following (replace mypool with the name of your pool, which is storage in my case):

sudo zpool clear -F mypool

If it can clear the error, you are done. However, it is likely that it won’t work, and you need to move to the next step.

Scrubbing the Pool

You can try to scrub the entire pool by running:

sudo zpool scrub mypool

This makes the system inspect every single block and repair the errors it can from the redundant data. Although this process is long (it took 5 hours to inspect my 10TB pool), there is a very high chance that the problem will be corrected. Don’t forget to clear the error after scrubbing the pool.
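Since a scrub runs in the background for hours, the scrub, wait, and clear sequence can be wrapped in a small script. This is only a sketch under assumptions: the function names and the 300-second polling interval are made up, and the check relies on the status report containing the phrase "scrub in progress" while a scrub is running.

```sh
#!/bin/sh
# Succeed while the status text on standard input reports a running scrub.
scrub_in_progress() {
    grep -q 'scrub in progress'
}

# Scrub pool "$1", wait until the scrub is done, then clear the error
# state and print the new status. Polls every "$2" seconds (default 300).
scrub_and_clear() {
    pool=$1
    interval=${2:-300}
    zpool scrub "$pool" || return 1
    while zpool status "$pool" | scrub_in_progress; do
        sleep "$interval"
    done
    zpool clear "$pool"
    zpool status -v "$pool"
}
```

Nothing runs until the function is called, for example with `scrub_and_clear mypool`.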

Bringing Each Device Online Again

If the error still exists after scrubbing the entire pool (and clearing the error), you can try to force each device online:

sudo zpool online mypool /dev/ada0 /dev/ada2 /dev/ada3 ...

Rebooting the Computer

This is the last thing you can try. Rebooting forces the computer to import and mount the pool again. Hopefully it will clear the error and the error status.

Rebuilding the Pool

If none of these methods works, the only solution left is to rebuild the pool.

#Back up your data first
#sudo zpool destroy mypool
#sudo zpool create mypool ...

Good luck.


ZFS+USB: Building a Super Large Server Using USB Memory, CF Card and SD Card.

Posted on December 17, 2011 by Derrick

I have a lot of unused USB thumb drives, CF flash cards, and SD cards sitting in my drawer. The sizes range from 8MB to 8GB. Unlike a few years ago, it is now much easier to get access to the Internet, so I no longer need to carry my data on a memory device. Instead, I simply connect to the Internet and the data is with me. That’s why my USB thumb drives, CF flash cards, and SD cards have been sitting in my drawer for a few years.

I got an idea one day. It would be a waste to leave them sitting in my drawer (or waiting to be sent to a landfill). Why not use them to build a file server? At least I can test whether the idea is doable. So here are the candidates:

USB Thumb drives
1. 8GB x 2
2. 1GB x 1
3. 256MB x 2
CF Flash Card
1. 2GB x 1
2. 1GB x 1
3. 512MB x 2
SD Card
1. 4GB x 2

As you can see from my list, the size of each candidate varies from 256MB to 8GB. So it will be interesting to put them together and build a super large file server.
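For a rough idea of how large the combined pool could get, the nominal sizes in the list can be added up. The helper below is only an illustration (the name total_mb and the SIZExCOUNT notation are made up), and the real usable space will be somewhat lower than the nominal sum.

```sh
#!/bin/sh
# Sum nominal device sizes in MB. Each argument is SIZE_MBxCOUNT,
# e.g. 8192x2 for two 8GB drives.
total_mb() {
    total=0
    for item in "$@"; do
        size=${item%x*}     # part before the "x"
        count=${item#*x}    # part after the "x"
        total=$((total + size * count))
    done
    echo "$total"
}

# USB: 8GB x 2, 1GB x 1, 256MB x 2
# CF:  2GB x 1, 1GB x 1, 512MB x 2
# SD:  4GB x 2
total_mb 8192x2 1024x1 256x2 2048x1 1024x1 512x2 4096x2   # prints 30208
```

That is 30208MB, or about 29.5GB of nominal capacity across the ten devices.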

Most computers have multiple USB ports available. If you don’t have enough USB ports, get a powered USB hub (i.e., a USB hub with its own power unit); it will be more efficient than drawing the power from the computer. For the CF cards, I use a SYBA SY-PCI48001 PCI to Compact Flash Adapter to connect my CF flash cards to my computer. For the SD cards, I simply connect each of them to a Sandisk USB SD card reader.

Okay, let’s talk about the software. I am going to use ZFS to implement it, because it is quick and simple. First, connect all devices to your computer, and make sure that your operating system can recognize all of them. In this tutorial, I am using FreeBSD as the example. However, the idea should be the same on other ZFS-ready systems, such as Solaris.

Make sure that all USB devices are recognized by your operating system. In FreeBSD, the devices are registered as /dev/da* or /dev/ad*:

dmesg | egrep 'ad|da'
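Once the device names are known, a quick existence check before creating the pool can catch a typo or an unplugged card reader. This is a minimal sketch; the function name missing_devices is made up and is not a FreeBSD or ZFS command.

```sh
#!/bin/sh
# Print every path given as an argument that does not exist.
missing_devices() {
    for dev in "$@"; do
        [ -e "$dev" ] || echo "$dev"
    done
}
```

For example, `missing_devices /dev/ad0 /dev/ad1 /dev/da0` prints nothing when all three device nodes are present.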

Now, you need to think about how to group your devices together. Do you simply want to build a pure USB ZFS pool, or a hybrid hard drive/USB pool? To keep things simple, I will start with a pure USB ZFS pool.

Suppose I am going to create a pure USB pool, which simply includes every device in one single place:

zpool create myzpool /dev/ad0 /dev/ad1 /dev/ad2 /dev/da0 /dev/da1 /dev/da2

where the ad* and da* are the locations of my devices.

This will create a big pool. When you write some data to this pool, e.g.,

sudo dd if=/dev/random of=/myzpool/test_file bs=1m count=10240

The system will simply split the file into multiple chunks and spread them across the devices, writing to them in parallel.

Now let’s verify the pool information:

zpool iostat -v

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
myzpool
  ad12       969M   112K      0      0  1.15K  66.4K
  ad13      1.90G   112K      0      0  1.74K  66.4K
  ad14       480M    11K      0      0  3.08K  66.9K
  da0          1G  2.78G      0      0  5.55K  66.1K
  da1          1G  2.78G      0      0  5.55K  66.1K
  da2        240M    80K      0      0  2.41K  66.8K
  da3        240M   112K      0      0  1.87K  66.8K
  da4       7.50G    80K      0      0  4.35K  66.5K
  da5       7.50G    96K      0      0  2.98K  66.3K
  da6        972M   112K      0      0    278  39.9K
----------  -----  -----  -----  -----  -----  -----

As you can see, the system splits the data and writes it across all of the devices. ZFS is smart enough to adjust how the data is split to optimize the performance.

Okay, what about the performance? Honestly, you can’t expect too much from a pure-USB zpool, because the write speed is limited to about 40MB/s, which is far slower than a hard disk. The advantages are that there are no moving parts, which significantly decreases the failure rate, and that the overall cost is low. Now, let’s talk about the hybrid pool, a combination of USB devices and hard drives.

A hybrid ZFS pool is a combination of hard drives and USB drives. In my experiment, I use the USB devices as log and cache devices, while the hard drives are used as the main storage. If you don’t know what a ZFS log or a ZFS cache is, you can think of a ZFS log device as a buffer for (synchronous) writes, and a ZFS cache device as a cache for reads.

Ideally, you should use two identical devices (same size) for the ZFS log (writing the data). For the ZFS cache, it doesn’t matter.
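One way to confirm that two candidate log devices really are the same size is to compare what FreeBSD’s diskinfo reports for them. The sketch below assumes that the default one-line diskinfo output lists the media size in bytes as its third field; the function names are made up for illustration.

```sh
#!/bin/sh
# Extract the media size in bytes (third field) from one line of
# `diskinfo` output read on standard input.
media_size() {
    awk '{ print $3 }'
}

# Succeed if the two device nodes report the same media size.
same_size() {
    a=$(diskinfo "$1" | media_size)
    b=$(diskinfo "$2" | media_size)
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

A possible use is `same_size /dev/da0 /dev/da1 && echo "sizes match"` before the devices are added as a mirrored log.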

First, let’s create our ZFS pool with the storage devices (i.e., hard drives) only.

zpool create myzpool raidz /dev/ad0 /dev/ad1 /dev/ad2

Next, we need to add the ZFS log. We are going to create a mirror, so the two devices should be the same size.

zpool add myzpool log mirror /dev/da0 /dev/da1

Finally, we add the ZFS cache.

zpool add myzpool cache /dev/da2 /dev/da3 /dev/da4

And let’s take a look at the whole picture:

zpool iostat -v

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
myzpool     5.83T  6.79T      9      2  1.03M   311K
  raidz1    5.83T  6.79T      9      1  1.03M   179K
    ad0         -      -      2      0   171K  29.9K
    ad1         -      -      2      0   171K  29.9K
    ad2         -      -      2      0   171K  29.9K
    ad3         -      -      2      0   171K  29.9K
    ad4         -      -      2      0   171K  29.9K
    ad5         -      -      2      0   171K  29.9K
    ad6         -      -      2      0   171K  29.9K
  da0        128K  3.78G      0      0      0  65.9K
  da1           -      -      0      0      0  65.9K
cache           -      -      -      -      -      -
  da2        961M     8M      0      0  1.14K  66.6K
  da3       1.89G     8M      0      0  1.73K  66.6K
  da4        472M     8M      0      0  3.06K  67.0K
  da5        232M     8M      0      0  2.40K  66.9K
  da6        232M     8M      0      0  1.86K  66.9K
  da7       7.50G     8M      0      0  4.33K  66.6K
  da8       7.50G     8M      0      0  2.96K  66.5K
  da9        964M     8M      0      0    276  39.6K
----------  -----  -----  -----  -----  -----  -----

With this combination, I get pretty good performance (both read and write). When I copy data from Windows to this ZFS pool using Samba, I get a pretty high transfer speed (over 100MB/s). Sometimes it gets close to 110MB/s. This result is impressive given that my hard drives are only standard SATA drives (non-SSD).

The reliability of USB devices, CF cards, and SD cards is sometimes questionable. That’s one of the reasons why I don’t use them as permanent storage media (using them as cache or log is okay). In this design, I use two SD cards (4GB each) as the ZFS log devices. Since they are set up as a mirror, the usable log space is about 4GB, and if one dies, the other one keeps working, which minimizes data loss. For the cache devices, if one device fails, I can remove it from the ZFS pool at any time. No data will be lost, so it will be okay.

I have been running this super large server for a few months already. There is about 200GB of data I/O every day, and so far I am very happy with the overall performance. The most important thing is that those unused memory devices are now very happy, as they don’t need to be sent to a landfill.


Building a Super Large and Reliable File Server with Mixed Size of Harddisks

Posted on January 17, 2011 by Derrick

In this article, I am going to show you how to build a super large, ultra reliable, expandable file server with hard drives of mixed sizes.

Here are my criteria:

Large Storage Capacity. At this moment, the largest hard drive is 2TB. I am looking for at least 10TB.
Great Performance. The users should not feel any delay when multiple users are reading/writing to the file server at the same time.
Reliable. No data loss or inconsistency. The file system should repair itself.
Expandable. The file system must be expandable. During the upgrade, the data should be preserved (i.e., no need to wipe out the original data).
Client Independent / Cross Platform. The user should be able to access the file server from any OS.
Software based / Hardware Independent. In case of hardware failure, it will be easy to transfer the entire system to a different computer.

I am going to show you how to implement all of these using FreeBSD and ZFS. The idea of the implementation is the same on other operating systems that support ZFS, such as *Solaris, *BSD etc. However, using ZFS on Linux (via Linux FUSE) is not recommended given the performance / stability issues (see Wikipedia for details). I personally do not recommend running ZFS on Linux in a production environment.

Before I talk about how I do it, I would like to go over the other technologies I tried before, and why they did not qualify.

RAID + LVM on Linux

This is one of the most common techniques used to build a file server. First, you need to build a RAID device using mdadm on selected hard drives of identical size. If you have hard drives of different sizes, you will need to put them in separate RAID devices, because mdadm does not support mixing different sizes of hard drives in a single RAID device without losing usable space. Later, you combine multiple devices (RAID devices, single hard drives, etc.) into a single volume using LVM.

The good thing about this technique is the expandability, i.e., you can add any hard drive to the volume at any time without losing any data. However, there are a few disadvantages of this combination:

Poor reliability
The reliability is handled at the RAID level. If your setup is not 100% RAID, such as the following example, the reliability is reduced.

In this example, if device 2 (2TB) fails, the data stored on device 2 will be lost, because data redundancy is only available on device 1.

Performance is not optimized
Data redundancy helps to improve the performance, especially on reads. Again, if the setup is not one single RAID device, the performance is reduced too.

So, let’s see how ZFS can solve these issues.

ZFS is the next generation file system developed by Sun. It comes with all the advantages of mdadm + LVM, plus a lot of features such as compression, encryption, power failure handling, checksums, etc. So why is this setup better than the LVM one? Let’s take a look at the following example:

In this example, all devices in the ZFS pool are protected against data failure.

Let’s Start Building A Super Large File Server Now!

First, let’s assume that I have the following hardware:

A 64-bit system with at least 2GB memory. Although ZFS will do fine on 32-bit system with 512MB memory, I recommend going with higher configurations because ZFS uses a lot of resources. In my case, I use an AMD Dual Core 4600+ (manufactured in 2006) with 3GB memory.
Hard drives of mixed sizes, where the size of the larger hard drive is a multiple of the smaller one. In my case, I have 2TB and 1TB hard drives. Keep in mind that 2 x 1TB = 2TB.
The harddrives are connected to the system using IDE/SATA. No firewire or USB.
Four 1TB harddrives (A, B, C, D)
Three 2TB harddrives (E, F, G)
One extra hard drive, any size, reserved for the system. (H)

Using this combination (10TB in total), you can build a big file server with 8TB or 6TB of usable space, depending on your preference for data security.

I am going to build a RAIDZ2 device using FreeBSD and ZFS. In general, a RAIDZ group uses its full capacity only when all of its members are the same size (otherwise hard drive space is lost). Since I want to put my 1TB and 2TB hard drives in the same pool, I first create a couple of RAID0 devices. Then I add them together to make a big ZFS device. Here is the big picture:

Building RAID0 Devices

As usual, log in as root first.

sudo su

And load the stripe module:

kldload geom_stripe

Now we need to create a RAID0 device from /dev/ad1(A: 1TB) and /dev/ad2(B:1TB). If you are unsure about the device name, try running dmesg for details:

dmesg | grep ad

gstripe label -v st0 /dev/ad1 /dev/ad2

And label the new device: /dev/stripe/st0

bsdlabel -wB /dev/stripe/st0

Format the new device:

newfs -U /dev/stripe/st0a

Mount the device for testing:

mount /dev/stripe/st0a /mnt

Verify the size:

df -h

Add the following into /boot/loader.conf:

geom_stripe_load="YES"

Now, reboot your machine. If /dev/stripe/st0 is available, then your RAID0 device is ready.

If you need to build more RAID0 devices, repeat the above steps. Keep in mind that you need to change the device name (e.g., from st0 to st1) and use the next pair of hard drives.

Putting all devices together into ZFS Pool

First, let’s enable ZFS:

echo 'zfs_enable="YES"' >> /etc/rc.conf

and start ZFS:

/etc/rc.d/zfs start

Get your devices ready. In my case, my device names are:
/dev/ad5: 2TB
/dev/ad6: 2TB
/dev/ad7: 2TB
/dev/stripe/st0: 2TB (RAID0: 2x1TB)
/dev/stripe/st1: 2TB (RAID0: 2x1TB)

Create the ZFS pool, which will be mounted on /storage/data.
Note that I use raidz2 here for extra protection against data failure. Basically, raidz (single parity) tolerates up to 1 failed hard drive, while raidz2 (double parity) tolerates up to 2 failed hard drives.

RAIDZ: Usable space: 8TB, tolerates the failure of one 2TB hard drive group (i.e., one 2TB drive or two 1TB drives in the same group)
RAIDZ2: Usable space: 6TB, tolerates the failure of two 2TB hard drive groups (i.e., two 2TB drives or two 1TB drives in different groups)
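The two figures follow from a simple rule: with equally sized members, the usable space is roughly (members - parity) x member size. The helper below is a sketch of that arithmetic only (the name raidz_usable_tb is made up), and it ignores file system overhead.

```sh
#!/bin/sh
# Approximate usable capacity of one RAIDZ group of equally sized members:
# (members - parity) * member size. Parity is 1 for raidz, 2 for raidz2.
raidz_usable_tb() {
    members=$1
    parity=$2
    member_tb=$3
    echo $(( (members - parity) * member_tb ))
}

# Five 2TB members: three 2TB drives plus two 2TB stripes.
raidz_usable_tb 5 1 2   # raidz:  prints 8
raidz_usable_tb 5 2 2   # raidz2: prints 6
```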

zpool create storage raidz2 /dev/ad5 /dev/ad6 /dev/ad7 /dev/stripe/st0 /dev/stripe/st1

zfs create storage/data

Verify the result:

zpool status

df -h

We need to do some performance tuning first, otherwise the system will be very unstable. Add the following to /boot/loader.conf:

#Your Physical Memory minus 1 GB (e.g., 3g on a machine with 4 GB)
vm.kmem_size="3g"
vm.kmem_size_max="3g"

vfs.zfs.arc_min="512m"

#Your Physical Memory minus 2GB (e.g., 2048m on a machine with 4 GB)
vfs.zfs.arc_max="2048m"


vfs.zfs.vdev.min_pending="1"
vfs.zfs.vdev.max_pending="1"
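The two memory-dependent values can be derived from the amount of RAM instead of being worked out by hand. The following sketch simply applies the rule of thumb from the comments above (RAM minus 1GB and RAM minus 2GB); the function name zfs_tunables is made up, and the output is meant to be reviewed before it is copied into /boot/loader.conf.

```sh
#!/bin/sh
# Print the memory-dependent loader.conf tunables for a machine with the
# given amount of physical memory in MB:
#   vm.kmem_size / vm.kmem_size_max = RAM - 1GB
#   vfs.zfs.arc_max                 = RAM - 2GB
zfs_tunables() {
    ram_mb=$1
    if [ "$ram_mb" -le 2048 ]; then
        echo "this rule of thumb needs more than 2048 MB of RAM" >&2
        return 1
    fi
    kmem_mb=$((ram_mb - 1024))
    arc_max_mb=$((ram_mb - 2048))
    printf 'vm.kmem_size="%sm"\n' "$kmem_mb"
    printf 'vm.kmem_size_max="%sm"\n' "$kmem_mb"
    printf 'vfs.zfs.arc_max="%sm"\n' "$arc_max_mb"
}

# Example for 4GB of RAM: prints 3072m, 3072m and 2048m.
zfs_tunables 4096
```

On FreeBSD the argument could be taken from `sysctl -n hw.physmem` (which reports bytes) divided by 1048576, keeping in mind that the reported value is usually a little below the installed amount.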

Now reboot the system:

reboot

To test the stability of the system, I recommend saving some files into the ZFS pool and running it for a few days.

Why Is Running ZFS on Linux Not a Solution?

ZFS is not supported by Linux natively due to a legal issue. In other words, it is not supported by the Linux kernel. Although ZFS has been ported to Linux via FUSE, which runs ZFS as an application, this introduces performance and efficiency penalties. (Source: ZFS Wiki). So I don’t recommend running ZFS on Linux.

Enhancement Idea

You can use Samba to share the files with Windows users.
To secure your data, I recommend setting up another system on a separate machine and mirroring the data using rsync.

Tag: ZFS