I have a lot of unused USB thumb drives, CF cards and SD cards sitting in my drawer, ranging in size from 8MB to 8GB. Unlike a few years ago, Internet access is now easy to come by, so I no longer need to carry my data around on a memory device. I simply connect to the Internet and the data is there. That’s why my USB thumb drives, CF cards and SD cards have been sitting in my drawer for a few years.
One day I got an idea. It would be a waste to leave them sitting in my drawer (or waiting to be sent to landfill), so why not use them to build a file server? At the very least I could test whether the idea is doable. Here are the candidates:
USB Thumb drives
- 8GB x 2
- 1GB x 1
- 256MB x 2
CF Flash Cards
- 2GB x 1
- 1GB x 1
- 512MB x 2
SD Cards
- 4GB x 2
As you can see from my list, the candidates range in size from 256MB to 8GB. It will be interesting to put them together and build a super large file server.
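To get a feel for how large the combined pool could be, here is a small sketch that adds up the nominal sizes from the list above. The function name total_mb and the "size x count" notation are made up for this example, and 1GB is counted as 1024MB. The real usable space will be somewhat lower once ZFS takes its overhead.

```sh
#!/bin/sh
# Sum the nominal capacity of every device in the list above.
# Each argument is "<size in MB>x<count>"; 1GB is treated as 1024MB.
# total_mb is a helper name invented for this sketch.
total_mb() {
    total=0
    for entry in "$@"; do
        size=${entry%x*}     # text before the "x"
        count=${entry#*x}    # text after the "x"
        total=$((total + size * count))
    done
    echo "$total"
}

# 8GB x 2, 1GB x 1, 256MB x 2, 2GB x 1, 1GB x 1, 512MB x 2, 4GB x 2
devices="8192x2 1024x1 256x2 2048x1 1024x1 512x2 4096x2"

mb=$(total_mb $devices)
echo "Nominal total: ${mb} MB"
awk -v mb="$mb" 'BEGIN { printf "Roughly %.1f GB\n", mb / 1024 }'
```

So on paper the drawer adds up to a bit under 30GB.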
Most computers have multiple USB ports available. If you don’t have enough USB ports, get a powered USB hub (i.e., a hub with its own power supply), which works better than drawing all the power from the computer. For the CF cards, I use a SYBA SY-PCI48001 PCI to Compact Flash Adapter to connect them to my computer. For the SD cards, I simply connect each of them to a SanDisk USB SD card reader.
Okay, let’s talk about the software. I am going to use ZFS because it is quick and simple to set up. First, connect all the devices to your computer and make sure that your operating system recognizes all of them. In this tutorial I am using FreeBSD, but the idea should be the same on other ZFS-ready systems, such as Solaris.
In FreeBSD, the devices are registered as /dev/da* or /dev/ad*, so you can check the kernel messages to confirm that every one of them was detected:
dmesg | egrep 'ad|da'
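The egrep above matches any message containing "ad" or "da", so it can be noisy. As a rough alternative, the sketch below filters a list of names down to whole-disk devices only. The function name filter_disks is made up, the sample input is invented so the script runs anywhere, and I am assuming the kern.disks sysctl is available on your FreeBSD version. Newer FreeBSD releases name ATA disks ada*, so the pattern accepts that too.

```sh
#!/bin/sh
# Keep only whole-disk names such as ad0, ada1 or da3 from a list of words.
# Partition nodes like da0s1 and other devices like cd0 are dropped.
# filter_disks is a helper name invented for this sketch.
filter_disks() {
    tr -s ' ' '\n' | grep -E '^(ad|ada|da)[0-9]+$'
}

# On FreeBSD the kernel's own disk list could be fed in like this
# (assumption: the kern.disks sysctl exists on your release):
#   sysctl -n kern.disks | filter_disks

# Invented sample input, so the sketch runs on any system:
echo "da1 da0 ad0 cd0 da0s1 ada2" | filter_disks
```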
Now you need to think about how to group your devices together. Do you want to build a pure USB ZFS pool, or a hybrid hard drive/USB pool? To keep things simple, I will start with a pure USB ZFS pool.
Suppose I am going to create a pure USB pool, which simply puts every device into one single pool:
zpool create myzpool /dev/ad0 /dev/ad1 /dev/ad2 /dev/da0 /dev/da1 /dev/da2
where the ad* and da* are the locations of my devices.
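With ten or so devices it is easy to mistype the command, so here is a small sketch that builds it from a list of device names and only prints it. The function name build_create_cmd is made up for this example. Nothing is created until you copy the printed line and run it yourself.

```sh
#!/bin/sh
# Print (not run) a "zpool create" command that puts every given device
# into one pool. Names without a leading "/" get "/dev/" put in front.
# build_create_cmd is a helper name invented for this sketch.
build_create_cmd() {
    pool=$1
    shift
    cmd="zpool create $pool"
    for dev in "$@"; do
        case $dev in
            /*) cmd="$cmd $dev" ;;
            *)  cmd="$cmd /dev/$dev" ;;
        esac
    done
    echo "$cmd"
}

# Same pool name and devices as the command above.
build_create_cmd myzpool ad0 ad1 ad2 da0 da1 da2
```

Printing first is a deliberate choice here: zpool create takes over the devices it is given, so it is worth reading the line once before running it.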
This will create a big pool. When you write some data to this pool, e.g.,
sudo dd if=/dev/random of=/myzpool/test_file bs=1m count=10240
The system will simply split the file into multiple chunks and write the chunks across all the USB devices at the same time.
Now let’s verify the pool information:
zpool iostat -v
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
myzpool
  ad12       969M   112K      0      0  1.15K  66.4K
  ad13      1.90G   112K      0      0  1.74K  66.4K
  ad14       480M    11K      0      0  3.08K  66.9K
  da0          1G  2.78G      0      0  5.55K  66.1K
  da1          1G  2.78G      0      0  5.55K  66.1K
  da2        240M    80K      0      0  2.41K  66.8K
  da3        240M   112K      0      0  1.87K  66.8K
  da4       7.50G    80K      0      0  4.35K  66.5K
  da5       7.50G    96K      0      0  2.98K  66.3K
  da6        972M   112K      0      0    278  39.9K
----------  -----  -----  -----  -----  -----  -----
As you can see, the system splits the data and writes it across all the devices. ZFS adjusts how the data is split on its own to optimize performance.
Okay, what about the performance? Honestly, you can’t expect too much from a pure USB zpool, because the write speed tops out at about 40MB/s, which is way too slow compared to a hard disk. The advantages are that there are no moving parts, which significantly reduces the failure rate, and that the overall cost is low. Now, let’s talk about the hybrid pool, a combination of USB devices and hard drives.
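If you want to put a number on your own pool, the arithmetic is just bytes written divided by seconds elapsed. Here is a minimal sketch of that calculation. The function name throughput_mbs and the sample figures are made up for illustration and are not measurements from my pool. On FreeBSD, dd also prints its own transfer rate when it finishes, so this is mostly useful for other copy tools.

```sh
#!/bin/sh
# Convert "bytes written" and "seconds elapsed" into MB/s,
# counting 1 MB as 1048576 bytes.
# throughput_mbs is a helper name invented for this sketch.
throughput_mbs() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN {
        if (secs <= 0) { print "n/a"; exit }
        printf "%.1f\n", bytes / secs / 1048576
    }'
}

# Invented example: 1000 MB (1048576000 bytes) written in 25 seconds.
throughput_mbs 1048576000 25
```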
A hybrid ZFS pool is a combination of hard drives and USB drives. In my experiment, I use the USB devices as log and cache devices, while the hard drives are the main storage. If you don’t know what a ZFS log or a ZFS cache is, you can think of a ZFS log device as a buffer for writing data, and a ZFS cache device as a buffer for reading data. (More precisely, the log device holds the ZFS intent log for synchronous writes, and the cache device is a second-level read cache, the L2ARC.)
Ideally, you should use two identical devices (same size) for the ZFS log. For the ZFS cache, it doesn’t matter.
First, let’s create our ZFS pool with the storage devices (i.e., hard drives) only.
zpool create myzpool raidz /dev/ad0 /dev/ad1 /dev/ad2
Next, we need to add the ZFS log. We are going to create a mirror, so the two devices should be the same size (a mirror is only as large as its smallest member).
zpool add myzpool log mirror /dev/da0 /dev/da1
Finally, we add the ZFS cache.
zpool add myzpool cache /dev/da2 /dev/da3 /dev/da4
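The three steps above can be collected into one small script that only prints the commands, so you can review them before touching any disks. This is a sketch under my own assumptions: the variable names POOL, DISKS, LOGS and CACHES and the function name hybrid_plan are made up, and the device names are the ones from my commands above.

```sh
#!/bin/sh
# Print the three commands for a hybrid pool without running them.
# POOL, DISKS, LOGS, CACHES and hybrid_plan are names invented for this sketch.
POOL=myzpool
DISKS="/dev/ad0 /dev/ad1 /dev/ad2"       # hard drives: main storage
LOGS="/dev/da0 /dev/da1"                 # mirrored log devices
CACHES="/dev/da2 /dev/da3 /dev/da4"      # cache devices

hybrid_plan() {
    # A mirrored log needs at least two devices.
    set -- $LOGS
    if [ "$#" -lt 2 ]; then
        echo "need at least two log devices for a mirror" >&2
        return 1
    fi
    echo "zpool create $POOL raidz $DISKS"
    echo "zpool add $POOL log mirror $LOGS"
    echo "zpool add $POOL cache $CACHES"
}

hybrid_plan
```

Once the printed lines look right, they can be run one by one as root.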
And let’s take a look at the whole picture:
zpool iostat -v
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
myzpool     5.83T  6.79T      9      2  1.03M   311K
  raidz1    5.83T  6.79T      9      1  1.03M   179K
    ad0         -      -      2      0   171K  29.9K
    ad1         -      -      2      0   171K  29.9K
    ad2         -      -      2      0   171K  29.9K
    ad3         -      -      2      0   171K  29.9K
    ad4         -      -      2      0   171K  29.9K
    ad5         -      -      2      0   171K  29.9K
    ad6         -      -      2      0   171K  29.9K
  da0        128K  3.78G      0      0      0  65.9K
  da1           -      -      0      0      0  65.9K
cache           -      -      -      -      -      -
  da2        961M     8M      0      0  1.14K  66.6K
  da3       1.89G     8M      0      0  1.73K  66.6K
  da4        472M     8M      0      0  3.06K  67.0K
  da5        232M     8M      0      0  2.40K  66.9K
  da6        232M     8M      0      0  1.86K  66.9K
  da7       7.50G     8M      0      0  4.33K  66.6K
  da8       7.50G     8M      0      0  2.96K  66.5K
  da9        964M     8M      0      0    276  39.6K
----------  -----  -----  -----  -----  -----  -----
With this combination, I get pretty good performance for both reads and writes. When I copy data from Windows to this ZFS pool using Samba, I get a pretty high transfer speed (over 100MB/s), and sometimes it gets close to 110MB/s. This result is amazing given that my hard drives are just standard SATA drives (non-SSD).
The reliability of USB devices, CF cards and SD cards is sometimes questionable. That’s one of the reasons why I don’t use them as permanent storage media (using them as cache or log devices is okay). In this design, I use two SD cards (4GB x 2 = 8GB) as the ZFS log devices. Since they are set up as a mirror, if one dies the other keeps working, which minimizes data loss. For the cache devices, if one fails, I can remove it from the ZFS pool at any time. No data is lost, so that is okay.
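To spot a dead card quickly, something like the sketch below can scan the output of zpool status for devices that are no longer healthy. The function name failed_devices and the sample text are made up. The sample only imitates the shape of real zpool status output, so check it against what your own system prints before relying on it.

```sh
#!/bin/sh
# Read "zpool status" text on stdin and print the name of every device
# whose STATE column is FAULTED, UNAVAIL, REMOVED or OFFLINE.
# failed_devices is a helper name invented for this sketch.
failed_devices() {
    awk '$1 != "state:" && $2 ~ /^(FAULTED|UNAVAIL|REMOVED|OFFLINE)$/ { print $1 }'
}

# Real use on the server (not run here):
#   zpool status myzpool | failed_devices
#   zpool remove myzpool da3      # drop a dead cache device from the pool

# Invented sample in the shape of "zpool status" output:
sample='  NAME        STATE     READ WRITE CKSUM
  myzpool     ONLINE       0     0     0
    raidz1    ONLINE       0     0     0
      ad0     ONLINE       0     0     0
  logs
    mirror    ONLINE       0     0     0
      da0     ONLINE       0     0     0
      da1     ONLINE       0     0     0
  cache
    da2       ONLINE       0     0     0
    da3       FAULTED      0     0     0'

printf '%s\n' "$sample" | failed_devices
```

With the sample above, the script prints da3, the one cache device marked FAULTED.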
I have been running this super large server for a few months already. There is about 200GB of data I/O every day, and so far I am very happy with the overall performance. Most importantly, those unused memory devices are now very happy too, as they don’t need to be sent to landfill.