ZFS+USB: Building a Super Large Server Using USB Memory, CF Cards and SD Cards

I have a lot of unused USB thumb drives, CF flash cards and SD cards sitting in my drawer. The sizes range from 8MB to 8GB. Unlike a few years ago, it is now much easier to get on the Internet, so I no longer need to carry my data around on a memory device. Instead, I simply connect to the Internet and the data is with me. That’s why my USB thumb drives / CF flash cards / SD cards have been sitting in my drawer for a few years.

Then I got an idea one day. It would be a waste to let them sit in my drawer (or wait to be sent to a landfill). Why not use them to build a file server? At least I can test whether the idea is doable. So here are the candidates:

  • USB Thumb drives

    1. 8GB x 2
    2. 1GB x 1
    3. 256MB x 2
  • CF Flash Card

    1. 2GB x 1
    2. 1GB x 1
    3. 512MB x 2
  • SD Card

    1. 4GB x 2

As you can see from my list, the sizes of the candidates vary from 256MB to 8GB. So it will be interesting to put them together and build a super large file server.

Most computers have multiple USB ports available. If you don’t have enough USB ports, get a powered USB hub (i.e., a USB hub with its own power unit); it is more reliable than drawing all the power from the computer. For the CF cards, I use a SYBA SY-PCI48001 PCI to Compact Flash Adapter to connect them to my computer. For the SD cards, I simply connect each of them to a Sandisk USB SD card reader.

Okay, let’s talk about the software. I am going to use ZFS to implement it, because it is quick and simple. First, connect all the devices to your computer and make sure that your operating system recognizes all of them. In this tutorial, I am using FreeBSD. However, the idea should be the same on any other ZFS-ready system, such as Solaris.

Make sure that all the devices are recognized by your operating system. In FreeBSD, USB and SD devices are registered as /dev/da*, while CF cards behind the ATA adapter show up as /dev/ad*:

dmesg | egrep '^(ad|da)[0-9]'
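
If the dmesg output has already scrolled away, you can also just look at the device nodes directly (camcontrol covers the USB-attached da devices; the exact device numbers will differ on your machine):

ls /dev/ad* /dev/da*
camcontrol devlist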

Now, you need to think about how to group your devices together. Do you want to build a pure USB/flash ZFS pool, or a hybrid hard drive/USB pool? To keep things simple, I will start with the pure flash pool.

Suppose I am going to create a pure flash pool, which simply includes every device in one single stripe:

zpool create myzpool /dev/ad12 /dev/ad13 /dev/ad14 /dev/da0 /dev/da1 /dev/da2 /dev/da3 /dev/da4 /dev/da5 /dev/da6

where the ad* entries are the CF cards on the PCI adapter and the da* entries are the USB thumb drives and SD cards.
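
A quick way to confirm that the pool came up, and to see its total size (ZFS mounts the pool at /myzpool by default), is:

zpool list myzpool
df -h /myzpool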

Now we have one big pool. When you write some data to it, e.g., a 10GB test file:

sudo dd if=/dev/random of=/myzpool/test_file bs=1m count=10240

The system will simply split the file into chunks and stripe them across all of the devices at the same time.
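
If you run the dd in one terminal, you can watch the striping happen live from another one (the trailing 5 tells zpool to print updated numbers every five seconds):

zpool iostat -v myzpool 5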

Now let’s verify the pool information:

zpool iostat -v
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
myzpool
  ad12       969M   112K      0      0  1.15K  66.4K
  ad13      1.90G   112K      0      0  1.74K  66.4K
  ad14       480M    11K      0      0  3.08K  66.9K
  da0          1G  2.78G      0      0  5.55K  66.1K
  da1          1G  2.78G      0      0  5.55K  66.1K
  da2        240M    80K      0      0  2.41K  66.8K
  da3        240M   112K      0      0  1.87K  66.8K
  da4       7.50G    80K      0      0  4.35K  66.5K
  da5       7.50G    96K      0      0  2.98K  66.3K
  da6        972M   112K      0      0    278  39.9K
----------  -----  -----  -----  -----  -----  -----

As you can see, the system split the data and wrote it across all of the devices. ZFS is smart about how it distributes the writes: devices with more free space receive proportionally more of the data, which keeps the stripe balanced.

Okay, what about performance? Honestly, you can’t expect too much from a pure-USB zpool: the write speed is limited to about 40MB/s, which is slow compared to a hard disk. The advantages are that there are no moving parts, which significantly decreases the failure rate, and that the overall cost is low. Now, let’s talk about the hybrid pool, a combination of USB devices and hard drives.

A hybrid ZFS pool is a combination of hard drives and USB/flash devices. In my experiment, I use the flash devices as log and cache devices, while the hard drives are used as the main storage. If you don’t know what the ZFS log or ZFS cache is: a ZFS log device (the separate intent log) acts as a fast buffer for synchronous writes, while a ZFS cache device (the L2ARC) holds recently read data so that reads can be served without touching the disks.

Ideally, you should use two identical devices (same size) for the ZFS log, because they will be mirrored and the mirror is only as large as its smaller member. For the ZFS cache, the sizes don’t matter.

First, let’s create our ZFS pool with the storage devices (i.e., hard drives) only. If the pure-USB test pool from the previous section is still around, its name is already taken, so destroy it first (this wipes the test data):
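
zpool destroy myzpool

With that out of the way, create the new pool: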

zpool create myzpool raidz /dev/ad0 /dev/ad1 /dev/ad2 /dev/ad3 /dev/ad4 /dev/ad5 /dev/ad6

Next, we add the ZFS log. We are going to mirror the two devices, so they should be identical:

zpool add myzpool log mirror /dev/da0 /dev/da1

Finally, we add the ZFS cache devices:

zpool add myzpool cache /dev/da2 /dev/da3 /dev/da4 /dev/da5 /dev/da6 /dev/da7 /dev/da8 /dev/da9
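
For reference, the same layout can also be built in a single command instead of three. This is just a sketch using the same device names as above:

zpool create myzpool raidz /dev/ad0 /dev/ad1 /dev/ad2 /dev/ad3 /dev/ad4 /dev/ad5 /dev/ad6 log mirror /dev/da0 /dev/da1 cache /dev/da2 /dev/da3 /dev/da4 /dev/da5 /dev/da6 /dev/da7 /dev/da8 /dev/da9

Either way, running zpool status myzpool afterwards will show the raidz, logs and cache sections, so you can confirm that each device ended up in the right role.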

And let’s take a look at the whole picture:

zpool iostat -v
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
myzpool     5.83T  6.79T      9      2  1.03M   311K
  raidz1    5.83T  6.79T      9      1  1.03M   179K
    ad0         -      -      2      0   171K  29.9K
    ad1         -      -      2      0   171K  29.9K
    ad2         -      -      2      0   171K  29.9K
    ad3         -      -      2      0   171K  29.9K
    ad4         -      -      2      0   171K  29.9K
    ad5         -      -      2      0   171K  29.9K
    ad6         -      -      2      0   171K  29.9K
logs            -      -      -      -      -      -
  mirror     128K  3.78G      0      0      0  65.9K
    da0         -      -      0      0      0  65.9K
    da1         -      -      0      0      0  65.9K
cache           -      -      -      -      -      -
  da2        961M     8M      0      0  1.14K  66.6K
  da3       1.89G     8M      0      0  1.73K  66.6K
  da4        472M     8M      0      0  3.06K  67.0K
  da5        232M     8M      0      0  2.40K  66.9K
  da6        232M     8M      0      0  1.86K  66.9K
  da7       7.50G     8M      0      0  4.33K  66.6K
  da8       7.50G     8M      0      0  2.96K  66.5K
  da9        964M     8M      0      0    276  39.6K
----------  -----  -----  -----  -----  -----  -----

With this combination, I get pretty good performance for both reads and writes. When I copy data from Windows to this ZFS pool using Samba, I get a very high transfer rate: over 100MB/s, sometimes close to 110MB/s. This result is quite amazing given that my hard drives are only standard SATA drives (non-SSD).
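
For completeness, the Samba share for the pool doesn’t need anything fancy. A minimal share section looks something like this (the share name is my choice, and the config file lives at /usr/local/etc/smb.conf when Samba is installed from the FreeBSD ports; adjust for your setup):

[myzpool]
   path = /myzpool
   read only = no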

The reliability of USB devices / CF cards / SD cards is sometimes questionable. That’s one of the reasons why I don’t use them as permanent storage media (using them as cache / log devices is okay). In this design, I use two SD cards (4GB x 2) as the ZFS log devices. Since they are set up as a mirror, if one dies, the other one will kick in, which minimizes the risk of data loss. For the cache devices, if one fails, I can simply remove it from the ZFS pool at any time. The cache only holds copies of data that already live on the main storage, so nothing is lost.
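
If a flash device does die, the recovery is simple. These are the commands I would use (da10 here is a hypothetical replacement device; substitute your own device names):

zpool remove myzpool da2          # drop a dead cache device
zpool detach myzpool da0          # drop the dead half of the log mirror
zpool replace myzpool da0 da10    # or swap it for a new device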

I have been running this super large server for a few months now. There is about 200GB of data I/O every day, and so far I am very happy with the overall performance. The most important thing is that those unused memory devices are now very happy, as they no longer need to be sent to a landfill.
