Building a Super Large and Reliable File Server with Mixed Size of Harddisks

In this article, I am going to show you how to build a super large, ultra reliable, expandable file server with mixed size of hard drives.

Here is my criteria:

  • Large Storage Capacity. At this moment, the largest hard drive is 2TB. I am looking for at least 10TB.
  • Great Performance. The users should not feel any delay when multiple users are reading/writing to the file server at the same time.
  • Reliable. No data loss or inconsistent. The file system should repair by itself.
  • Expandable. The file system must be expandable. During the upgrade, the data should be preserved (i.e., don’t need to wipe out the original data)
  • Client Independent / Cross Platform. The user should be able to access the file server from any OS.
  • Software based / Hardware Independent. In case of hardware failure, it will be easy to transfer the entire system to a different computer.

I am going to show you how to implement all of these using FreeBSD and ZFS. The idea of the implementation is the same on other operating systems that support ZFS, such as *Solaris, *BSD etc. However, using ZFS on Linux (via Linux FUSE) is not recommended given the performance / stability issues (See Wikipedia for details). I personally do not recommend running ZFS on Linux in production environment.

Before I talk about how I do it, I like to go over what other technologies I tried before, and why they are not qualified.

RAID + LVM on Linux

This is one of the most common techniques used by many people on building file server. First, you need to build a RAID device using mdadm on the selected harddrives with identical size. If you have different size of harddrive, you will need to put them on a separated RAID device because mdadm does not support mixing different size of harddrive in a single RAID device without losing usable space. Later, you combine multiple devices (RAID, single harddrives etc) into a single partition using LVM.

The good thing about this technique is the expandability, i.e., you can add any hard drive to the partition at any time without losing any data at any time. However, there are few disadvantages of this combination:

Poor reliability
The reliability is handled at the RAID level. If your setup is not 100% RAID, such as the following example, the reliability can be discounted.

In this example, if the device 2 (2TB) is failed, the data that is stored on device 2 will be lost, because the data redundancy is only available on device 1.

Performance is not optimized
Data redundancy helps to improve the performance especially on reading. Again, if the setup is not one single RAID device, the performance can be discounted too.

So, let’s see how ZFS can solve these issues.

ZFS

ZFS is the next generation file system developed by Sun. It comes with all advantages of mdadm + LVM, plus a lot of features such as compression, encryption, power failure handling, checksum etc. So why this setup is better than the LVM one? Let’s take a look to the following example:

In this example, all devices in the ZFS pool are protected against data failure.

Let’s Start Building A Super Large File Server Now!

First, let’s assume that I have the following hardware:

  1. A 64-bit system with at least 2GB memory. Although ZFS will do fine on 32-bit system with 512MB memory, I recommend going with higher configurations because ZFS uses a lot of resources. In my case, I use an AMD Dual Core 4600+ (manufactured in 2006) with 3GB memory.
  2. Mixed size of harddrives, the size of the larger harddrive has to be divisible by the smaller one. In my case, I have 2TB and 1TB harddrives. Keep in mind that 2x1TB = 2TB.
  3. The harddrives are connected to the system using IDE/SATA. No firewire or USB.
  4. Four 1TB harddrives (A, B, C, D)
  5. Three 2TB harddrives (E, F, G)
  6. One extra harddrive, any size, reserved for the system uses. (H)

Using this combination(10GB in total), you can build a big file server with 8GB or 6GB, depending on your preference of data security.

I am going to build a RAIDZ2 device using FreeBSD and ZFS. In general ZFS is only available on the harddrives with the same size (without losing harddrive space). Since I want to put my 1TB and 2TB harddrives in the same pool, I first create couple RAID0 drives first. Then I add them together to make a big ZFS device. Here is the big picture:

Building RAID0 Devices

As usual, login as root first.

sudo su

And load the stripe module:

kldload geom_stripe

Now we need to create a RAID0 device from /dev/ad1(A: 1TB) and /dev/ad2(B:1TB). If you are unsure about the device name, try running dmesg for details:

dmesg | grep ad
gstripe label -v st0 /dev/ad1 /dev/ad2

And label the new device: /dev/stripe/st0

bsdlabel -wB /dev/stripe/st0

Format the new device:

newfs -U /dev/stripe/st0a

Mount the device for testing:

mount /dev/stripe/st0a /mnt

Verify the size:

df -h

Add the following into /boot/loader.conf:

geom_stripe_load="YES"

Now, reboot your machine. If /dev/stripe/st0 is available, then your RAID0 device is ready.

If you need to build more RAID0 devices, repeat the above steps. Keep in mind that you need to change the device name from st0 to st1.

Putting all devices together into ZFS Pool

First, let’s enable ZFS first:

echo 'zfs_enable="YES"' >> /etc/rc.conf

and start ZFS:

/etc/rc.d/zfs start

Get your devices ready. In my cases, my devices name are:
/dev/ad5: 2TB
/dev/ad6: 2TB
/dev/ad7: 2TB
/dev/stripe/st0: 2TB (RAID0: 2x1TB)
/dev/stripe/st1: 2TB (RAID0: 2x1TB)

Create the ZFS pool, which will mount on /storage/data
Note that I use raidz2 here for extra protection against data failure. Basically, raidz (single parity) allows up to 1 failed harddrives while raidz2 (double parity) allows up to 2 failed harddrives.

  • RAIDZ: Usable space: 8GB, allow up to one 2TB harddrive group failed (i.e., one 2TB or two 1TB in the same group)
  • RAIDZ2: Usable space: 6GB, allow up to two 2TB harddrive groups failed (i.e., two 2TB or two 1TB in the different groups)
zpool create storage raidz2 /dev/ad5 /dev/ad6 /dev/ad7 /dev/stripe/st0 /dev/stripe/st1
zfs create storage/data

Verify the result:

zpool status
df -h

We need to make some performance tweaking first, otherwise the system will be very unstable. Add the following to the /boot/loader.conf

#Your Physical Memory minus 1 GB
vm.kmem_size="3g"
vm.kmem_size_max="3g"

vfs.zfs.arc_min="512m"

#Your Physical Memory minus 2GB
vfs.zfs.arc_max="2048m"


vfs.zfs.vdev.min_pending="1"
vfs.zfs.vdev.max_pending="1"

Now reboot the system:

reboot

To test the stability of the system, I recommend saving some files into the ZFS pool and run it for few days.

Why Running ZFS on Linux is not a Solution?

ZFS is not supported by Linux natively due to legal issue. In the other words, it is not supported by Linux kernel. Although ZFS has been ported to Linux via FUSE, which is running ZFS as an application, it will introduce the performance and efficiency penalties. (Source: ZFS Wiki). So I don’t recommend running ZFS on Linux.

Enhancement Idea

  1. You can use Samba to share the files with Windows users.
  2. To secure your data, I recommend set up another system on a separate machine and mirror the data using rsync.

Further Readings

  1. FreeBSD Handbook: RAID0 – Striping
  2. FreeBSD Handbook: The Z File System (ZFS)
  3. ZFS Tuning Guide

Our sponsors:

5 comments

  1. Hello,
    i am looking for a solution with big filesize, security and performance. actually i have a squeeze with 12×1,5tb ext on software raid 5. while ext4 eq. the tools doesnt support a size greater than 16tb, i am looking for another solution. my idea is a cluster with a pool of 2, 3, .. comuter wich all have 16tb of raid(6) and all together offer one large filesystem for samba share. Do you have any idea of an solution? thx

    1. What about Hadoop (A distributed file system across multiple computers, similar to Google File System)? I am currently experimenting Hadoop with ZFS + RAIDZ, i.e., each cluster (computer) run the ZFS + RAIDZ first, and integrate multiple clusters together to form a big file system.

      Afterward, all you need is to access the main node (which will take care of reading and writing files in different computers). All you need is to enable Samba on that unit.

  2. Hi I was wondering if you can recommend me a server which can do the following:

    Reliable file servers with the ability to hold at least 1 TB of disk storage without worrying about disk failure

    Thank you

    1. Depending on how many people need to access your file server (# of concurrent access). If the number is small (less than 10), you can turn any medium/high-end grade desktop PC into a server. If it is for production purpose, you may want to use server grade computer. I personally recommend Dell T110. It is a entry/medium level server, very reliable, and it cost about US$400-USD$500.

      Hope it helps.

      –Derrick

Leave a Reply to Valentin Cancel reply

Your email address will not be published. Required fields are marked *