Dropbox on FreeBSD

I host my personal websites on a FreeBSD server. One of them is a photo album, and I want it to read its content from a Dropbox folder. That Dropbox account is mainly used on Mac, iPhone and iPad, so I started exploring ways to set up Dropbox on FreeBSD. Since Dropbox doesn’t officially support FreeBSD, I need to use third-party tools, most of which are built on the Dropbox developer API.

So I tried several third-party tools and, as you might expect, none of them worked. The primary problem is synchronization: if my wife adds or deletes a photo in Dropbox, I expect the Dropbox folder on FreeBSD to be updated as well. Another problem is speed. It looks like the Dropbox API is not as fast as the native application. On the same network, it took a few hours to download the content (around 1GB of JPEG files) from Dropbox on FreeBSD, versus 10 minutes on a Mac/Windows/Linux machine using the native application.

So I came up with a few alternative solutions:

  1. Host my website on CentOS Linux. Since Dropbox supports Linux, I can read the Dropbox folder without any problem.
  2. Push the Dropbox content from a Mac/Linux machine to FreeBSD using rsync periodically (e.g., every 5 minutes, hourly, etc.). That way FreeBSD will have access to the Dropbox files.
  3. Set up an NFS service on a Linux box with access to Dropbox, and let FreeBSD mount the corresponding NFS share. This solution is okay if both machines are on the same network. It may raise some security concerns if the machines are connected over the public Internet.
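
Option 2 could be sketched as a cron entry on the Mac/Linux machine that already runs the native Dropbox client. This is only an illustration: the user name, host name and both paths are placeholders, and it assumes key-based SSH login to the FreeBSD server is already set up. The --delete flag is there so that photos removed from Dropbox also disappear on the FreeBSD side.

```shell
# crontab entry (edit with: crontab -e) on the machine that runs the Dropbox client.
# Every 5 minutes, mirror the photo folder to the FreeBSD web server over SSH.
# "webuser", "freebsd-host" and both paths are hypothetical placeholders.
*/5 * * * * rsync -a --delete "$HOME/Dropbox/Photos/" webuser@freebsd-host:/usr/local/www/photos/
```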

Another solution that I think may work is to install the native Dropbox application on FreeBSD. FreeBSD supports running Linux applications via Linux emulation. Back in the old days (FreeBSD 8), it was pretty easy to include Linux support on FreeBSD (one click in sysinstall). In recent releases it has become harder, because not many people want to run Linux binaries on FreeBSD. Based on my previous experience, I think it should work on the latest FreeBSD, but it may require some work.

Another crazy idea is to run Dropbox with Wine on FreeBSD. But this goes way too far from my original purpose, and I am not a big fan of Wine because it adds too many libraries to the system.

RHEL 7 / MariaDB / MySQL: ERROR 1018 (HY000): Can’t read dir of ‘.’ (errno: 24)

Recently, I decided to upgrade a database server from RHEL 6 (CentOS 6) to RHEL 7 (CentOS 7), which involves switching from MySQL 5.5 to MariaDB 5.5. Our server hosts about 100 databases. When I tested them individually, I didn’t see any problem. However, when I backed up all the databases one by one using mysqldump (i.e., running the mysqldump command for each database, one after another, 100 times), something funny happened. Here is the error message:


#The system was running a bunch of mysqldump commands, one by one (not in the background)

Got error: 1016: "Can't open file: './db_my_database/tbl_mytable.frm' (errno: 24)" when using LOCK TABLES
mysqldump: Error: 'Out of resources when opening file '/var/tmp/#sql_2d6c_2.MAI' (Errcode: 24)' when trying to dump tablespaces
mysqldump: Error: 'Out of resources when opening file '/var/tmp/#sql_2d6c_2.MAI' (Errcode: 24)' when trying to dump tablespaces
mysqldump: Error: 'Out of resources when opening file '/var/tmp/#sql_2d6c_2.MAI' (Errcode: 24)' when trying to dump tablespaces
mysqldump: Error: 'Out of resources when opening file '/var/tmp/#sql_2d6c_2.MAI' (Errcode: 24)' when trying to dump tablespaces
mysqldump: Error: 'Out of resources when opening file '/var/tmp/#sql_2d6c_2.MAI' (Errcode: 24)' when trying to dump tablespaces
mysqldump: Error: 'Out of resources when opening file '/var/tmp/#sql_2d6c_2.MAI' (Errcode: 24)' when trying to dump tablespaces
mysqldump: Error: 'Out of resources when opening file '/var/tmp/#sql_2d6c_2.MAI' (Errcode: 24)' when trying to dump tablespaces
mysqldump: Error: 'Out of resources when opening file '/var/tmp/#sql_2d6c_2.MAI' (Errcode: 24)' when trying to dump tablespaces
mysqldump: Error: 'Out of resources when opening file '/var/tmp/#sql_2d6c_2.MAI' (Errcode: 24)' when trying to dump tablespaces

In the meantime, I tried to access the database via the MySQL terminal:

MariaDB [(none)]> SHOW DATABASES;
ERROR 1018 (HY000): Can't read dir of '.' (errno: 24)

This error message means that MySQL cannot open the file. Errno 24 is the operating system’s “too many open files” error. If you google the message, you will find tons of solutions, and almost every one of them suggests increasing the open_files_limit variable in my.cnf.

Therefore, I checked my configuration (/etc/my.cnf), and I noticed that the value was already set to 30000. I also checked the output of the lsof command and found something very interesting. Note that I have 100 databases, each containing about 60 tables, and each table has about 3 files. Depending on the timeout settings, if all databases and tables are open, the total number of open files will be 100 x 60 x 3 = 18,000.

sudo lsof -u mysql | wc
1045   25811 239248

This result suggests that at the time of the crash, the mysql user (the system user that runs the MariaDB service) had about 1,045 files open at the same time.
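
The worst-case estimate can be written out as a small calculation and compared against a limit. This is just a sketch of the arithmetic with the round numbers from this post; 1024 is used as a second value because it is a common default per-process limit on Linux.

```shell
# Worst-case number of open table files if every table in every database is open.
databases=100
tables_per_db=60
files_per_table=3
worst_case=$((databases * tables_per_db * files_per_table))
echo "worst case: $worst_case open files"

# Compare the estimate against two candidate limits.
for limit in 1024 30000; do
    if [ "$worst_case" -gt "$limit" ]; then
        echo "limit $limit: too low"
    else
        echo "limit $limit: enough"
    fi
done
```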

So I was scratching my head. Why did the system fail at around the 1,045th file when I had already set open_files_limit to 30000? I also checked the memory (command: free) and the running processes (command: top), and I didn’t find anything unusual. As a last step, I checked the open_files_limit value using the MySQL terminal, and this is what I found:

MariaDB [(none)]> SHOW VARIABLES LIKE 'open_files_limit';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| open_files_limit | 1024  |
+------------------+-------+

It seems that MariaDB didn’t honor the open_files_limit I set in the config file. Instead it used 1024, which appears to be the default limit the system gives to the service. After some investigation, I found that RHEL 7 starts MariaDB through systemd, and systemd applies its own resource limits to each service. You therefore need to set the open-file limit at the system (systemd) level rather than only at the application level. In other words, whatever you put in /etc/my.cnf cannot exceed the limit that the system imposes on the service.
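
One way to see the limit that the operating system actually applies to a running process is to read it from /proc. The following is a minimal sketch for Linux; the function name is made up for this example, and the process name mysqld in the comment is an assumption about how the MariaDB server shows up on your system.

```shell
# Print the soft "Max open files" limit of a running process, read from /proc.
# Usage: soft_nofile PID
soft_nofile() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# Example: the limit of the current shell. For MariaDB one would pass the
# server's PID instead, e.g. soft_nofile "$(pidof mysqld)" (process name assumed).
soft_nofile "$$"
```

If the number printed for the database server is lower than the value in my.cnf, the limit is being imposed from outside the application.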

Here is how to set the equivalent open_files_limit at the system level:

sudo mkdir -p /etc/systemd/system/mariadb.service.d/
sudo nano /etc/systemd/system/mariadb.service.d/limits.conf

#Add the following two lines to limits.conf. I like to set the limit to 30000:
[Service]
LimitNOFILE=30000

#Then reload systemd and restart MariaDB:
sudo systemctl daemon-reload
sudo systemctl restart mariadb
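
If you prefer to script this step instead of using an editor, the drop-in file can be written with printf. The sketch below writes to a scratch directory so it can be tried without root; the function name and the scratch directory are only for illustration.

```shell
# Write a systemd drop-in that raises the open-file limit of a service.
# Usage: write_nofile_dropin DROPIN_DIR LIMIT
write_nofile_dropin() {
    mkdir -p "$1" &&
    printf '[Service]\nLimitNOFILE=%s\n' "$2" > "$1/limits.conf"
}

# Demo against a scratch directory so it can be tried without root.
demo_dir=$(mktemp -d)
write_nofile_dropin "$demo_dir/mariadb.service.d" 30000
cat "$demo_dir/mariadb.service.d/limits.conf"
```

On a real server the directory would be /etc/systemd/system/mariadb.service.d and the function would have to run as root, followed by the daemon-reload and restart. Afterwards, systemctl show mariadb -p LimitNOFILE should report the limit that systemd applies to the service.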

I reran the SHOW VARIABLES query, and this is what I got:

MariaDB [(none)]> SHOW VARIABLES LIKE 'open_files_limit';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| open_files_limit | 30000 |
+------------------+-------+
1 row in set (0.00 sec)

That’s it! Did I save you from a heart attack?

One of the biggest selling points of RHEL is stability. When we upgraded from RHEL 6 to RHEL 7 (clean install), we expected everything to work without many modifications. Unfortunately, what I saw was a broken system. I really don’t expect this to happen in an enterprise-class product.

CentOS/RHEL 6: No ZFS after upgrading the kernel

This article is mainly for CentOS 6. Please visit here for CentOS 7.

After I upgraded the CentOS / RHEL system to the latest kernel, ZFS failed to start. The system was unable to load the ZFS module, i.e., I could not access my data. Here are some error messages I found on the system:

#sudo zpool status
The ZFS modules are not loaded.
Try running '/sbin/modprobe zfs' as root to load them.
#sudo /sbin/modprobe zfs
FATAL: Error inserting zfs (/lib/modules/2.6.32-573.7.1.el6.x86_64/weak-updates/zfs.ko): Unknown symbol in module, or unknown parameter (see dmesg)
#dmesg
zfs: disagrees about version of symbol vn_openat
zfs: Unknown symbol vn_openat
zfs: disagrees about version of symbol taskq_dispatch_delay
zfs: Unknown symbol taskq_dispatch_delay
zfs: disagrees about version of symbol taskq_cancel_id
zfs: Unknown symbol taskq_cancel_id
zfs: disagrees about version of symbol vn_open
zfs: Unknown symbol vn_open
zfs: disagrees about version of symbol vn_remove
zfs: Unknown symbol vn_remove
zfs: disagrees about version of symbol taskq_dispatch_ent
zfs: Unknown symbol taskq_dispatch_ent
zfs: disagrees about version of symbol taskq_dispatch
zfs: Unknown symbol taskq_dispatch
zfs: disagrees about version of symbol system_taskq
zfs: Unknown symbol system_taskq
zfs: disagrees about version of symbol taskq_wait
zfs: Unknown symbol taskq_wait
zfs: Unknown symbol __cv_wait_interruptible
zfs: disagrees about version of symbol taskq_wait_id
zfs: Unknown symbol taskq_wait_id
zfs: disagrees about version of symbol taskq_destroy
zfs: Unknown symbol taskq_destroy
zfs: disagrees about version of symbol vn_rdwr
zfs: Unknown symbol vn_rdwr
zfs: disagrees about version of symbol taskq_init_ent
zfs: Unknown symbol taskq_init_ent
zfs: disagrees about version of symbol taskq_create
zfs: Unknown symbol taskq_create
zfs: Unknown symbol __cv_timedwait_interruptible
zfs: disagrees about version of symbol taskq_member
zfs: Unknown symbol taskq_member

So what do these messages mean? Before I explain the details, let me explain how ZFS works on Linux. For legal reasons, unlike *BSD, the Linux kernel does not include ZFS. To make Linux talk to ZFS, some people came up with a very smart approach: they inject ZFS into the kernel as modules, so that when Linux boots, it knows how to handle ZFS. It sounds pretty ideal, doesn’t it?

And now, we have a problem.

Many system administrators like to let the system upgrade automatically (such as running yum update -y in a cron job). Unlike *BSD, Linux bundles kernel and application updates together. In other words, when you run yum update, it updates both the kernel and the applications, and by default it will not update one and skip the other.

When the system upgrades the kernel, it refreshes everything, i.e., the new kernel will not know what ZFS is, because the process of injecting ZFS happens when we install ZFS on Linux. If there is no new version of ZFS available, this process will not happen. So what happens after you reboot the computer, which by default loads the latest kernel? You got it: ZFS won’t be loaded and your data will not be accessible.

There are a few ways to handle this. First, if you really want to keep your system up to date (which I don’t recommend), exclude the kernel from the system update.

sudo nano /etc/yum.conf
[main]
.....
exclude=kernel*

This doesn’t mean your system is 100% safe from now on. There is still a chance of breaking your ZFS. Here are some funny messages I got after turning on the exclusion and running yum update:

Loading new zfs-0.6.5.4 DKMS files...
Building for 2.6.32-504.23.4.el6.x86_64
Building initial module for 2.6.32-504.23.4.el6.x86_64
Done.

Adding any weak-modules
ERROR: modinfo: could not open /lib/modules/2.6.32-358.el6.x86_64/weak-updates/: Is a directory
ERROR: modinfo: could not open /lib/modules/2.6.32-504.23.4.el6.x86_64/zavl.ko: No such file or directory
FATAL: /lib/modules/2.6.32-504.23.4.el6.x86_64/zavl.ko: No such file or directory
Warning: Module zavl.ko from kernel  has no modversions, so it cannot be reused for kernel 2.6.32-358.el6.x86_64
ERROR: modinfo: could not open /lib/modules/2.6.32-358.el6.x86_64/weak-updates/: Is a directory
ERROR: modinfo: could not open /lib/modules/2.6.32-504.23.4.el6.x86_64/znvpair.ko: No such file or directory
FATAL: /lib/modules/2.6.32-504.23.4.el6.x86_64/znvpair.ko: No such file or directory
Warning: Module znvpair.ko from kernel  has no modversions, so it cannot be reused for kernel 2.6.32-358.el6.x86_64
ERROR: modinfo: could not open /lib/modules/2.6.32-358.el6.x86_64/weak-updates/: Is a directory
ERROR: modinfo: could not open /lib/modules/2.6.32-504.23.4.el6.x86_64/zunicode.ko: No such file or directory
FATAL: /lib/modules/2.6.32-504.23.4.el6.x86_64/zunicode.ko: No such file or directory
Warning: Module zunicode.ko from kernel  has no modversions, so it cannot be reused for kernel 2.6.32-358.el6.x86_64
ERROR: modinfo: could not open /lib/modules/2.6.32-358.el6.x86_64/weak-updates/: Is a directory
ERROR: modinfo: could not open /lib/modules/2.6.32-504.23.4.el6.x86_64/zcommon.ko: No such file or directory
FATAL: /lib/modules/2.6.32-504.23.4.el6.x86_64/zcommon.ko: No such file or directory
Warning: Module zcommon.ko from kernel  has no modversions, so it cannot be reused for kernel 2.6.32-358.el6.x86_64
ERROR: modinfo: could not open /lib/modules/2.6.32-358.el6.x86_64/weak-updates/: Is a directory
ERROR: modinfo: could not open /lib/modules/2.6.32-504.23.4.el6.x86_64/zpios.ko: No such file or directory
FATAL: /lib/modules/2.6.32-504.23.4.el6.x86_64/zpios.ko: No such file or directory
Warning: Module zpios.ko from kernel  has no modversions, so it cannot be reused for kernel 2.6.32-358.el6.x86_64

depmod...

DKMS: install completed.

The second thing you will need to do is to increase the /boot partition from the default 200MB to at least 2GB. By default, RHEL creates a 200MB /boot for storing the kernel files. Kernels are small and rarely go beyond 40MB each. However, RHEL only keeps the 5 most recent kernels (40MB x 5 = 200MB) and removes the rest. So what happens if it removes the one that works with ZFS? The only thing you can do is to reinstall the system and import your ZFS pool again.

sudo zpool import

Here is how to change the number of kernels to keep:

sudo nano /etc/yum.conf
#Tell the system to keep the 20 most recent kernels
installonly_limit=20

Another thing you may want to do is to boot the working kernel (instead of the latest one). Here is how to change it:

sudo nano /boot/grub/grub.conf

Notice that I commented out the most recent kernels:

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/sda3
#          initrd /initrd-[generic-]version.img
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
#title CentOS (2.6.32-573.7.1.el6.x86_64)
#       root (hd0,0)
#       kernel /vmlinuz-2.6.32-573.7.1.el6.x86_64 ro root=UUID=325cc438-33a6-46ae-8f1a-443ebd77c70a rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=128M  KEYBOARDTYPE=pc$
#       initrd /initramfs-2.6.32-573.7.1.el6.x86_64.img
#title CentOS (2.6.32-573.8.1.el6.x86_64)
#       root (hd0,0)
#       kernel /vmlinuz-2.6.32-573.8.1.el6.x86_64 ro root=UUID=325cc438-33a6-46ae-8f1a-443ebd77c70a rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=128M  KEYBOARDTYPE=pc$
#       initrd /initramfs-2.6.32-573.8.1.el6.x86_64.img
#title CentOS (2.6.32-573.12.1.el6.x86_64)
#       root (hd0,0)
#       kernel /vmlinuz-2.6.32-573.12.1.el6.x86_64 ro root=UUID=325cc438-33a6-46ae-8f1a-443ebd77c70a rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=128M  KEYBOARDTYPE=p$
#       initrd /initramfs-2.6.32-573.12.1.el6.x86_64.img
#title CentOS (2.6.32-573.18.1.el6.x86_64)
#       root (hd0,0)
#       kernel /vmlinuz-2.6.32-573.18.1.el6.x86_64 ro root=UUID=325cc438-33a6-46ae-8f1a-443ebd77c70a rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=128M  KEYBOARDTYPE=p$
#       initrd /initramfs-2.6.32-573.18.1.el6.x86_64.img
title CentOS (2.6.32-573.3.1.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-573.3.1.el6.x86_64 ro root=UUID=325cc438-33a6-46ae-8f1a-443ebd77c70a rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=128M  KEYBOARDTYPE=pc$
        initrd /initramfs-2.6.32-573.3.1.el6.x86_64.img
title CentOS (2.6.32-573.1.1.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-573.1.1.el6.x86_64 ro root=UUID=325cc438-33a6-46ae-8f1a-443ebd77c70a rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=128M  KEYBOARDTYPE=pc$
        initrd /initramfs-2.6.32-573.1.1.el6.x86_64.img
title CentOS (2.6.32-504.30.3.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-504.30.3.el6.x86_64 ro root=UUID=325cc438-33a6-46ae-8f1a-443ebd77c70a rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=128M  KEYBOARDTYPE=p$
        initrd /initramfs-2.6.32-504.30.3.el6.x86_64.img
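
In legacy GRUB, the default= line refers to the position of a title entry, counting from 0, so it can help to list the active entries with their index before editing. The following is a small sketch; the function name is made up for this example and the sample file only contains shortened entries taken from the listing above.

```shell
# List the boot entries of a legacy GRUB configuration with their index.
# The index is what the "default=" line refers to (0 is the first entry).
# Commented-out entries (#title ...) are not counted.
# Usage: list_grub_titles /boot/grub/grub.conf
list_grub_titles() {
    grep '^title' "$1" | awk '{ printf "%d: %s\n", NR - 1, $0 }'
}

# Demo with a shortened sample file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
default=0
#title CentOS (2.6.32-573.7.1.el6.x86_64)
title CentOS (2.6.32-573.3.1.el6.x86_64)
title CentOS (2.6.32-573.1.1.el6.x86_64)
EOF
list_grub_titles "$sample"
```

With such a list, an alternative to commenting out entries would be to leave them in place and point default= at the index of the kernel that is known to work.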

Do not bother removing the ZFS libraries and reinstalling them. It won’t work, and it will only make your system messier.

That’s it! Hope this tutorial saves you from a heart attack.

–Derrick

Rsync with space in the directory name

I was looking for a way to use rsync with a directory name that contains a space. So I googled it and found a bunch of suggestions, but NONE of them really worked. As always, I ended up coming up with the solution on my own. Here is how I did it:

rsync -avr 'username@remotehost:/directory\ abc/' /directory\ abc/
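
The reason the remote path needs both the quotes and the backslash is that it is interpreted twice: once by the local shell and once by the shell on the remote host. When the directory name comes from a variable, the backslash can be added with sed. This is only a sketch; the function name is made up, and it only handles spaces, not other special characters.

```shell
# Escape spaces in a path so that the remote shell treats it as one argument.
# Usage: escape_spaces '/directory abc/'
escape_spaces() {
    printf '%s' "$1" | sed 's/ /\\ /g'
}

# Build (and print) the rsync command for a directory name held in a variable.
remote_dir='/directory abc/'
printf "rsync -av '%s' '%s'\n" "username@remotehost:$(escape_spaces "$remote_dir")" "$remote_dir"
```

rsync 3.0.0 and later also have a -s (--protect-args) option, which sends the remote path without letting the remote shell split it and should avoid the need for the backslash.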

That’s it. Enjoy!

Running ZFS on Linux: Things you should know and be aware of

ZFS is the next-generation file system. Unfortunately, it won’t be shipped with Linux because of legal/licensing issues. Fortunately, it is possible to install it (ZFS on Linux) with a few commands. Since 2013, I have set up a number of Linux (CentOS/RHEL) servers with ZFS for use in a high-traffic production environment. They include high-end commercial-grade servers (Xeon-based + ECC memory), gaming-quality desktops (i7-based) and entry-level consumer-grade computers (i3). In this article, I will discuss what I have learned from my experience.

Warning on ZFS on Linux

ZFS on Linux is not a robust way to run ZFS on Linux because it has a very important (and impossible) requirement: the system must never be updated and rebooted. If you cannot meet this requirement (obviously), be prepared to spend tons of hours fixing the problem and getting your data back. See how I fix the problem created by ZFS on Linux here. If you prefer a rock-solid and reliable way, you have to go with *BSD or Solaris.

Summary

Life is short. If you don’t want to spend your time going through the entire article, here is my advice: use FreeBSD (or *BSD) if possible. Using ZFS on Linux is like putting a giraffe in the wilds of Alaska. It is not going to work. However, we may want to stick with one operating system for our servers for various reasons. Therefore, I’ve come up with some advice for you if you really want to run ZFS on Linux:

  • Use a commercial-grade server when possible. A bare-bones entry-level Dell Power Edge T110 II (starting from US$300) is sufficient to run ZFS as a low-traffic, light-load, nightly backup server. Consumer-grade computers are not recommended for ZFS on Linux. If you really need one, get a computer with gaming-quality components and always back up the data on a different server.
  • The Linux kernel plays an important role in ZFS. Try to use v.3 (e.g., RHEL 7) when possible. Using ZFS with v.2.6 (e.g., RHEL 6) may cause unexpected problems on non-commercial-grade hardware. As of October 2019, I cannot make version 4 (e.g., installed via kernel-ml or CentOS 8) work with ZFS on RHEL 7:
    Loading new spl-0.6.5.9 DKMS files...
    Building for 4.11.2-1.el7.elrepo.x86_64
    Building initial module for 4.11.2-1.el7.elrepo.x86_64
    configure: error: unknown
    Error! Bad return status for module build on kernel: 4.11.2-1.el7.elrepo.x86_64 (x86_64)
    Consult /var/lib/dkms/spl/0.6.5.9/build/make.log for more information.
    
  • Set up your ZFS with the hard drive identifier (e.g., /dev/disk/by-id/someid), not the generic device name (e.g., /dev/sda).
  • You may lose some storage space (less than 1%) compared to the same setup on FreeBSD, but the amount is trivial.
  • If you have already installed ZFS on Linux, try to exclude the kernel from system updates. Otherwise, after a kernel update the system will not load ZFS after reboot, and it will take some extra work to get ZFS running again.
  • Some Linux distributions such as CentOS 7 will not load ZFS at boot time. You can solve this problem with a cron job. If you have other services (e.g., MySQL, NFS, Apache) that depend on ZFS, you will need to restart them.
  • Bookmark this ZFS emergency recovery guide. Trust me, you never know when your ZFS on Linux will decide to stop working.

Do not update the kernel automatically

I wrote an article on how to rescue your ZFS file system after updating the kernel. Please click here for details.

ZFS is not native to Linux. The whole idea of ZFS on Linux is nothing more than a bunch of modules being injected into the kernel, so that the kernel will load ZFS at boot time. This is a fantastic idea because it does not introduce the performance problem of ZFS/FUSE (which runs in userland, i.e., very slowly). However, there is a potential problem here. This “injection” only happens when a ZFS module (zfs-kmod) is installed or updated. During this process, the system downloads the latest copy of zfs-kmod and injects it into the currently running kernel. See the problem here?

That being said, running root (/) on ZFS in Linux is a very very bad idea. You will not be able to access anything when the ZFS is not available at the kernel level.

So we have four different situations here after hitting the update command:

Will ZFS be available after the reboot?

+-------------------------+-----------------------+----------------------+
|                         | Kernel has new update | Kernel has no update |
+-------------------------+-----------------------+----------------------+
| zfs-kmod has new update | Yes                   | Yes                  |
| zfs-kmod has no update  | No                    | Yes                  |
+-------------------------+-----------------------+----------------------+

In general, if you really need to update the kernel, you will need to update the kernel first, reboot into the new kernel (ZFS will be missing), and re-run the process so that the ZFS module is injected into the new kernel. Some people may recommend uninstalling zfs-kmod and reinstalling it. Unless you have a very strong reason to use the latest kernel (e.g., you’ve got plenty of spare time), I don’t recommend doing it because the whole process is a pain.

Another thing you can do is to disable automatic updates. Only update the system when there is a new update for both the kernel and zfs-kmod. Then you can update the kernel first, reboot, and then update zfs-kmod after the reboot. However, keep in mind that you will run into problems eventually. Many packages depend on the newer kernel, so if you try to update the system, it will complain that you need to update the kernel before updating those packages. You can get around this by skipping the broken packages (yum update --skip-broken).

In my settings, I simply exclude the kernel from the update. That way I only need to work with one kernel, and I know that that particular kernel knows how to handle ZFS module.

sudo nano /etc/yum.conf 

exclude=kernel*
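
Before relying on the exclusion (for example in an automatic update job), it may be worth checking that the line is really in place. The following is a minimal sketch; the function name is made up for this example, and it only looks for an exclude line that mentions kernel.

```shell
# Return success if the given yum configuration excludes kernel packages.
# Usage: kernel_excluded /etc/yum.conf
kernel_excluded() {
    grep -Eq '^[[:space:]]*exclude[[:space:]]*=.*kernel' "$1"
}

# Report the state of the system-wide configuration (if the file exists).
if kernel_excluded /etc/yum.conf 2>/dev/null; then
    echo "kernel packages are excluded from yum update"
else
    echo "kernel packages are NOT excluded"
fi
```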

In case you are running into trouble, i.e., ZFS is missing in the latest kernel, you can try doing the following:

Before running the following commands, make sure that you know what you are doing.


#Make sure that you reboot to the kernel you want to fix.
#Find out what is the current kernel
uname -a
Linux 3.10.0-514.2.2.el7.x86_64 #1 SMP Tue Dec 6 23:06:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

#In my example, it is:
3.10.0-514.2.2.el7.x86_64


#Basically we want to remove the following files:
ls -al /lib/modules/your_new_kernel/extra
-rw-r--r-- 1 root root 344K Dec 12 15:58 splat.ko
-rw-r--r-- 1 root root 167K Dec 12 15:58 spl.ko
-rw-r--r-- 1 root root  14K Dec 12 16:02 zavl.ko
-rw-r--r-- 1 root root  75K Dec 12 16:02 zcommon.ko
-rw-r--r-- 1 root root 2.2M Dec 12 16:02 zfs.ko
-rw-r--r-- 1 root root 130K Dec 12 16:02 znvpair.ko
-rw-r--r-- 1 root root  34K Dec 12 16:02 zpios.ko
-rw-r--r-- 1 root root 324K Dec 12 16:02 zunicode.ko

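#If you have no extra modules installed other than ZFS and SPL, you can run the following.
#Note that the * in the path matches the directories of all installed kernels, not only the new one:
sudo rm -Rf /lib/modules/*/extra/*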

#Otherwise just remove the files one by one.


#And we want to do the same thing to the weak-updates.
ls -al /lib/modules/your_new_kernel/weak-updates

drwxr-xr-x. 2 root root 4.0K Sep 16 10:58 .
drwxr-xr-x. 7 root root 4.0K Sep 16 10:58 ..
lrwxrwxrwx  1 root root   54 Sep 16 10:56 splat.ko -> /lib/modules/2.6.32-573.18.1.el6.x86_64/extra/splat.ko
lrwxrwxrwx  1 root root   52 Sep 16 10:56 spl.ko -> /lib/modules/2.6.32-573.18.1.el6.x86_64/extra/spl.ko
lrwxrwxrwx  1 root root   53 Feb 22  2016 zavl.ko -> /lib/modules/2.6.32-573.18.1.el6.x86_64/extra/zavl.ko
lrwxrwxrwx  1 root root   56 Feb 22  2016 zcommon.ko -> /lib/modules/2.6.32-573.18.1.el6.x86_64/extra/zcommon.ko
lrwxrwxrwx  1 root root   52 Sep 16 10:58 zfs.ko -> /lib/modules/2.6.32-573.18.1.el6.x86_64/extra/zfs.ko
lrwxrwxrwx  1 root root   56 Sep 16 10:58 znvpair.ko -> /lib/modules/2.6.32-573.18.1.el6.x86_64/extra/znvpair.ko
lrwxrwxrwx  1 root root   54 Feb 22  2016 zpios.ko -> /lib/modules/2.6.32-573.18.1.el6.x86_64/extra/zpios.ko
lrwxrwxrwx  1 root root   57 Feb 22  2016 zunicode.ko -> /lib/modules/2.6.32-573.18.1.el6.x86_64/extra/zunicode.ko



#If you have no extra modules installed other than ZFS and SPL, you can run the following.
#Again, the * in the path matches the directories of all installed kernels:
sudo rm -Rf /lib/modules/*/weak-updates/*


#Otherwise just remove the files one by one.


#Now, let's get into the fun part. We will remove them and reinstall them.
#Don't forget to match your version.
sudo dkms remove zfs/0.6.5.8 --all
sudo dkms remove spl/0.6.5.8 --all
sudo dkms --force install spl/0.6.5.8
sudo dkms --force install zfs/0.6.5.8

And we will verify the result.

#sudo dkms status
spl, 0.6.5.8, 3.10.0-514.2.2.el7.x86_64, x86_64: installed
zfs, 0.6.5.8, 3.10.0-514.2.2.el7.x86_64, x86_64: installed
zfs, 0.6.5.8, 3.10.0-327.28.3.el7.x86_64, x86_64: installed-weak from 3.10.0-514.2.2.el7.x86_64
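
To check this from a script rather than by eye, the status output can be filtered for the kernel you care about. This sketch assumes the output format shown above (module, version, kernel, architecture: state); other dkms versions may print it differently. The function name is made up for this example.

```shell
# From "dkms status" text on stdin, print the modules that are fully
# installed (not "installed-weak") for the given kernel.
# Usage: dkms status | installed_for "$(uname -r)"
installed_for() {
    awk -F', ' -v kernel="$1" '$3 == kernel && $4 ~ /: installed$/ { print $1 }'
}

# Demo with the three status lines from above.
sample_status='spl, 0.6.5.8, 3.10.0-514.2.2.el7.x86_64, x86_64: installed
zfs, 0.6.5.8, 3.10.0-514.2.2.el7.x86_64, x86_64: installed
zfs, 0.6.5.8, 3.10.0-327.28.3.el7.x86_64, x86_64: installed-weak from 3.10.0-514.2.2.el7.x86_64'
printf '%s\n' "$sample_status" | installed_for 3.10.0-514.2.2.el7.x86_64
```

Both spl and zfs should be listed for the kernel you booted into before you try to import the pool.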

The Kernel Version Matters

The kernel version does matter, and I would avoid using version 2.6 or below if you don’t have professional-grade hardware, such as a Xeon CPU. Here are my observations:

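+---------------------------------------------+----------------------+--------------------+----------------+
| Hardware                                    | Linux Kernel (v.2.6) | Linux Kernel (v.3) | FreeBSD 9 & 10 |
+---------------------------------------------+----------------------+--------------------+----------------+
| Dell Power Edge T110 II                     | Stable               | Stable             | Stable         |
| (Intel Xeon E3-1240 V2, 8GB memory, US$250) |                      |                    |                |
+---------------------------------------------+----------------------+--------------------+----------------+
| Dell Power Edge T320                        | Stable               | Stable             | Stable         |
| (Intel Xeon E5-2430, 64GB memory, US$2,000) |                      |                    |                |
+---------------------------------------------+----------------------+--------------------+----------------+
| Gaming Quality Desktop                      | Unstable             | Stable             | Stable         |
| (Intel i7-4770, 32GB memory, US$900)        |                      |                    |                |
+---------------------------------------------+----------------------+--------------------+----------------+
| Consumer Grade Desktop                      | Unstable             | Stable             | Stable         |
| (Intel i3-540, 8GB memory, US$500)          |                      |                    |                |
+---------------------------------------------+----------------------+--------------------+----------------+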

However, it doesn’t mean that you should always use the latest kernel. Remember one thing: always keep a copy of the previous kernel before switching to the latest one. You never know whether ZFS will work with the latest one or not. For example, I had big trouble getting ZFS to work with 2.6.32-573.7.1.el6.x86_64, which was the latest kernel available on CentOS 6.7 (as of Oct 26, 2015). I ended up switching the system to 2.6.32-573.3.1.el6.x86_64 (the previous kernel). So always test the system before making the switch.

The Hard Drive Identifier

Set up your ZFS with the unique, unchanging hard drive identifier (e.g., /dev/disk/by-id/wwn-0x1234c567890d0aaa). Do not use the generic device name (e.g., /dev/sda). When you reboot the system, the generic device name (/dev/sda) may change, and this will be a problem for ZFS.

For example, when RHEL 7 names the hard drives, it first names the drives that are attached directly to the motherboard, which include USB flash drives, SD cards, etc. After that, it names the hard drives that are attached to the PCIe RAID card. If you boot the computer with a USB flash drive attached, and the USB flash drive was not present when you set up ZFS, this small change is enough to mess up your ZFS.

Here is an example:

History for 'storage':
zpool create -f storage raidz /dev/disk/by-id/wwn-0x5000c500206e46d4 \
                              /dev/disk/by-id/wwn-0x5000c500205eba0d \
                              /dev/disk/by-id/wwn-0x50014ee25a9074e2 \
                              /dev/disk/by-id/wwn-0x50024e9001c19fb2

So far I have only noticed this problem with low-end / consumer-grade motherboards. However, this is not a problem on FreeBSD because it is smart enough to re-map the old values.
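
To find the identifiers for your own drives, you can list the symbolic links under /dev/disk/by-id and see which device name each one currently points to. The following is a minimal sketch; the function name is made up for this example.

```shell
# Show which kernel device name each persistent identifier currently points to.
# Usage: show_disk_ids /dev/disk/by-id
show_disk_ids() {
    for link in "$1"/*; do
        # Skip anything that is not a symbolic link (including an unmatched glob).
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "$(basename "$link")" "$(basename "$(readlink -f "$link")")"
    done
}

show_disk_ids /dev/disk/by-id
```

The name on the left of each line is what goes into zpool create. The name on the right is the one that may change between reboots.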

The Stability

For some odd reason, ZFS becomes unstable or even unavailable when the I/O is heavy:

  pool: storage
 state: DEGRADED
  scan: none requested
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
config:

        NAME                        STATE     READ WRITE CKSUM
        storage                     DEGRADED     0     0     0
          raidz1-0                  DEGRADED     0     0     0
            wwn-0x5000c500206e46d4  ONLINE       0     0     0
            wwn-0x5000c500205eba0d  ONLINE       0     0     0
            wwn-0x50014ee25a9074e2  ONLINE       0     0     0
            wwn-0x50024e9001c19fb2  UNAVAIL      0     0     0

This kind of problem happens mainly on low-end consumer-grade computers with an older kernel. Once I upgraded the kernel to a newer version, the problem was gone. No hardware change was needed. Again, I’ve never experienced this kind of problem on FreeBSD since FreeBSD 9. The only explanation I can think of is that the older Linux kernel does not support ZFS and low-end computers very well.

Load ZFS at Boot

Some Linux variants such as CentOS 7 will not load ZFS at boot (in my case, the kernel is 3.10.0-327.28.3.el7.x86_64). I choose to load ZFS via a cron job. What if ZFS contains files that are required by some service, e.g., your database or web server files are on ZFS? You will need to restart those services after loading ZFS. Here is an example:

sudo nano /etc/crontab

#Example 1: Load all available ZFS pools
@reboot         root    sleep 20; zpool import -a;

#Example 2: Load all ZFS pools first, then restart the Apache, MySQL and NFS services
@reboot         root    sleep 20; zpool import -a; sleep 15; systemctl restart httpd.service && systemctl restart mariadb.service && systemctl restart nfs-server;
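
The fixed sleep values above are a guess at how long the disks need to show up. A variation is to retry a check until it succeeds. The helper below is only a sketch and not part of ZFS; the function name, the number of attempts and the delay are arbitrary choices.

```shell
# Run a command repeatedly until it succeeds or the attempts run out.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        "$@" && return 0
        attempts=$((attempts - 1))
        # Only wait if another attempt will follow.
        [ "$attempts" -gt 0 ] && sleep "$delay"
    done
    return 1
}

# Harmless demo: wait (up to 3 tries, 1 second apart) for /tmp to exist.
retry 3 1 test -d /tmp && echo "ready"
```

In a boot script, the command passed to retry could be a small function that runs zpool import -a and then checks that zpool list -H -o name prints at least one pool, with the service restarts following only when it succeeds.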

Good luck!

–Derrick

FreeBSD or Linux in 6 Simple Questions

FreeBSD or Linux (Ubuntu/RHEL)? This is a very old question. It’s like asking iPhone or Android. There is no short answer. It all depends on your situation. To make things easier, I am going to break it down into six simple questions to help you make a decision.

My Background
I have been a FreeBSD user since 2003. I mainly use FreeBSD for service-oriented work such as web farms, database clusters and file systems. In short, I mainly use my FreeBSD systems via the command line. In 2009, I jumped into the Linux world (Ubuntu/RHEL) because of my job. As an advanced user of both operating systems (FreeBSD and Ubuntu/RHEL Linux), here is my guide to these two systems.

There is only one FreeBSD, but there are many different variants of Linux. The Linux I mention below refers to two popular distributions: RHEL and Ubuntu.

FreeBSD vs Linux: Q.1 How do you describe yourself?

I am a very demanding person. I like to control everything I manage. –> FreeBSD
I don’t care about how a system is run. I am okay as long as it just works. –> Linux / Ubuntu / RHEL

Comment:
FreeBSD gives you the freedom to control every single thing. One of its coolest features is the ports tree. You can build every application from source using the ports tree. In the Linux world, you usually install applications from pre-built/pre-compiled packages (yum, apt-get, etc.), which may not be exactly what you need.

Example:
In RHEL and Ubuntu, the HTTP load balancer module does not come with Apache by default. You will need to compile Apache from source. What about FreeBSD? All you need to do is check a box (pretty much like a shopping cart) and you are done.

FreeBSD vs Linux: Q.2 Do you prefer Ferrari or Hyundai?

Ferrari / BMW / SLR Camera –> FreeBSD
Toyota / Hyundai / Point and Shoot Camera / Phone Camera –> Linux / Ubuntu / RHEL

Comment:
The technologies used by FreeBSD, such as the kernel, file system and architecture, are way better and more advanced than those of Linux. It’s like comparing Ferrari and Hyundai (And no, I am not kidding).

Example 1:
I need to run some extreme applications (e.g., DNA sequence alignment) which use all available threads and memory. The default memory management settings in Linux are very poor. Every time I run my application, the system becomes unusable for other users. However, FreeBSD does not have this severe problem. In fact, FreeBSD is smart enough not to let the system freeze. Of course you can tweak the memory management settings in Linux, or run the application with the nice command. However, these settings are not applied out of the box, and most of the time you learn these tricks only after your system has had a problem.

Example 2:
I installed FreeBSD 11 and RHEL 7 on two identical computers. Both used default settings with similar services enabled, such as the SSH server, and with booting into X Window disabled (RHEL). I used these two machines for exactly the same purpose: SSH tunneling, with exactly the same workload (evenly distributed). After using them for a month, I checked the memory usage. On FreeBSD, the available memory was about 800MB (out of 1GB), while only 200MB (out of 1GB) was left on RHEL. Yes, Linux (at least RHEL 7) consumes lots of memory.
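
For anyone who wants to repeat this comparison, here is a minimal sketch of how the available-memory figure could be read on the Linux side. It assumes a kernel that exposes a MemAvailable line in /proc/meminfo; the function name avail_mb is made up for illustration, and on FreeBSD the numbers would have to come from sysctl or top instead.

```shell
#!/bin/sh
# Print available memory in MB from meminfo-formatted text on stdin.
# Assumes a line of the form "MemAvailable:   <n> kB".
avail_mb() {
    awk '$1 == "MemAvailable:" { printf "%d\n", $2 / 1024 }'
}

# Usage on a Linux box:
#   avail_mb < /proc/meminfo
```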

Example 3:
FreeBSD comes with ZFS (the next generation file system) by default. Although ZFS has been ported to the Linux world, it is definitely unstable there. We’ve tried to use it in a production environment. One thing that we’ve learned is that ZFS may stop working after upgrading to a newer Linux kernel. I’ve received countless email alerts about missing files (if ZFS is not working, of course the files are gone) in the middle of the night. I ended up disabling auto update and the reboot after updates. Sounds familiar? That’s a feature in Windows, and for some odd reason, this feature is available in Linux, a server system.

Example 4:
LVM+RAID is the most advanced storage method in the Linux world. Unfortunately it does not do what it promises, i.e., you may lose your data if a hard drive fails, even if you follow its directions to detach your failed hard drive correctly. Not to mention that the data can get corrupted if the power goes down (which has been taken care of in ZFS, available in FreeBSD).

That’s why I prefer ZFS over LVM+RAID here: Building a Super Large and Reliable File Server with Mixed Size of Harddisks. It solves my problem (yes, even if the power fails while writing to the disk, my data is still safe!)

FreeBSD vs Linux: Q.3 Do you have lots of free time?

Yes: FreeBSD
No: Linux / Ubuntu / RHEL

Comment:
Making a production-ready system using FreeBSD can take you days to weeks if you are not an experienced FreeBSD user, while everything works out of the box in Linux. Sometimes a new upgrade from the ports tree can drive you nuts, with package conflicts and so on. Working with Linux, however, is a leisurely experience.

Example:
Installing Apache + MySQL + PHP from FreeBSD ports (compiling the source) can take at least half a day on a computer with a dual core CPU (AMD Athlon 64 X2 Dual Core Processor 4600+), while it takes less than 30 minutes on Linux. That’s because you need to compile the code from source in FreeBSD, while you simply download the packages and extract them in Linux. The time difference is huge.

FreeBSD vs Linux: Q.4 Do you prefer simplicity or complexity?

Adding sugar into water –> FreeBSD
Taking sugar away from soda –> Linux / Ubuntu / RHEL

Comment:
FreeBSD is a very, very simple system. Think of it as a bare-bones system that comes with no junk. By default, it comes with no graphical user interface and no unnecessary applications. It is like pure distilled water.

In Ubuntu / RHEL, everything is configured and ready to use. It comes with a very attractive, beautiful graphical user interface. Everything just works out of the box; no tuning or tweaking is required. However, it also comes with lots of junk such as Ubuntu One (for the Ubuntu cloud service), SELinux (a security module, originally developed by the NSA and shipped by Red Hat, that many people don’t use), etc. It makes your system very bulky and increases resource consumption.

FreeBSD vs Linux: Q.5 Are you going to use the computer as desktop?

Yes: Linux / Ubuntu / RHEL
No: FreeBSD

Comment:
Setting up a desktop-ready system on FreeBSD can take a long time. The main reason is driver availability. A lot of hardware, such as graphics cards, audio cards and webcams, is not supported natively in FreeBSD. If you want to get them working as well as on Windows / Ubuntu, you will need to get the driver first (if available), build it (which may give errors while compiling the code), and recompile the kernel to make it support the new driver, which can take a few days if you are not experienced with FreeBSD and driver debugging.

On the other hand, the Linux driver community is very strong and well developed. Usually it develops drivers for most popular hardware.

Example:

#1: My Logitech Orbit MP webcam (Pan / Tilt / Zoom) is not working on FreeBSD but works like a charm on Linux.

#2: Some vendors such as Highpoint may stop developing drivers for their products for the newest versions of FreeBSD.

FreeBSD vs Linux: Q.6 Do you need to blame someone when something goes wrong?

Yes: Linux / Ubuntu / RHEL
No: FreeBSD

Comment:

When something goes wrong, you can blame Linux but you can’t blame FreeBSD.

FreeBSD is a community-driven operating system, while some Linux distributions such as Ubuntu and RHEL are backed by commercial vendors. In the world of FreeBSD, it is not uncommon for some unskilled developers to introduce bugs and trouble to the rest of the world. Yes, we know it is free, so we can’t complain about it. However, as an IT administrator, you will need to use your own judgement to decide whether the new stuff is safe to use or not. In short, that will increase your workload. It’s more like Windows Update. How many people actually read the change log before hitting the update button?

In Linux (e.g., RHEL), that’s a whole different story. Every patch and every update has been screened by the vendor before being released to the public. So you can trust them to some degree. And the key thing is, you can blame them when something goes wrong.

Conclusion

In short, use FreeBSD for personal purposes and Linux for work. Be the top 5%, not the bottom 95%.

CentOS 6: No Networking Connections After Upgrade

Today I rebooted my CentOS 6 server, and I realized that the network connection was lost after the upgrade. To be exact, it seems that the problem was caused by the new kernel: 2.6.32-573.1.1.el6.x86_64. It modified the network settings of servers with manual settings (servers with DHCP are not affected). Here is how I fixed the problem (you will need physical access to the server):

I noticed that the adapter profile had been modified to something that doesn’t make sense. If you compare the network settings, you will notice the following difference:

#Before the upgrade
#cat /etc/sysconfig/network-scripts/ifcfg-em1  
PREFIX=24
#After the upgrade
#cat /etc/sysconfig/network-scripts/ifcfg-em1  
PREFIX=32

So I simply changed PREFIX back to 24 in the adapter settings and restarted the network service, i.e.,

sudo service network restart

And the network connection is back!
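
The modification itself can be sketched as follows. This is only a sketch: it assumes the adapter is em1 and that the correct prefix is 24, as in the listing above, and the helper names get_prefix and set_prefix are made up for illustration. The sed call keeps a .bak copy of the original file.

```shell
#!/bin/sh
# Print the PREFIX value found in an ifcfg-style file.
get_prefix() {
    sed -n 's/^PREFIX=//p' "$1"
}

# Rewrite the PREFIX line in place, keeping a .bak copy of the original.
set_prefix() {
    sed -i.bak "s/^PREFIX=.*/PREFIX=$2/" "$1"
}

# Usage (as root), followed by a restart of the network service:
#   get_prefix /etc/sysconfig/network-scripts/ifcfg-em1
#   set_prefix /etc/sysconfig/network-scripts/ifcfg-em1 24
```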

That’s it! Hope this tutorial saves you from a heart attack.

–Derrick

[FreeBSD]Upgrade PHP 5.5 to 5.6

It is not easy to upgrade PHP 5.5 to 5.6 in FreeBSD. Without proper preparation, the upgrade process may drive you nuts. Before you decide to get your hands dirty, here is what I recommend you do:

  1. Back up your files
  2. Test your website in a PHP 5.6 environment on a different server, because PHP 5.6 has introduced some backward incompatibilities. Some code written for prior versions may cause runtime errors. See here for more information.
  3. Schedule downtime. Depending on your CPU speed / typing speed / troubleshooting skills, it may take you an hour.

Background

I am assuming that you use PHP for web purposes (rather than command line / CLI only), and I am assuming that you are using PHP with Apache. Here are the ports you will need to touch:

  • Apache: /usr/ports/www/apache22 or /usr/ports/www/apache24
  • Apache-PHP: /usr/ports/www/mod_php56
  • PHP: /usr/ports/lang/php56
  • PHP Extensions: /usr/ports/lang/php56-extensions

1. Remove the old PHP and extensions

cd /usr/ports/lang/php55
sudo make deinstall clean


cd /usr/ports/lang/php55-extensions
sudo make deinstall clean

2. Install PHP 5.6

cd /usr/ports/lang/php56

#Don't forget to enable ZTS if you have a threaded Apache.
sudo make install clean

3. Install PHP 5.6 Extensions

cd /usr/ports/lang/php56-extensions
sudo make install clean

4. Test PHP and its extensions

php -v
php -m

If these commands report errors, clean them up by removing the duplicated entries in:
/usr/local/etc/php/extensions.ini
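
A quick way to spot the duplicated entries is to sort the file and print only the lines that occur more than once. This is a minimal sketch; the function name dup_lines is made up for illustration.

```shell
#!/bin/sh
# Print each line that appears more than once in the given file.
dup_lines() {
    sort "$1" | uniq -d
}

# Usage:
#   dup_lines /usr/local/etc/php/extensions.ini
```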

5. Rebuild the Apache-PHP Bridge

cd /usr/ports/www/mod_php55
sudo make deinstall clean

cd /usr/ports/www/mod_php56
#Don't forget to enable ZTS if you have a threaded Apache.
sudo make install clean

6. Restart Apache

#The rc script is named after the Apache port: apache22 or apache24
sudo /usr/local/etc/rc.d/apache24 restart

7. Test PHP using phpinfo

Create a file called test.php to display phpinfo. Verify that everything is okay.

<?php
phpinfo();
?>

8. Reinstall Apache (optional)

If you experience any problems, try reinstalling the following ports:

Apache: /usr/ports/www/apache22 or /usr/ports/www/apache24
Apache-PHP: /usr/ports/www/mod_php56

That’s it! Enjoy the new PHP!

–Derrick

MySQL Random Error: ERROR 2013 (HY000): Lost connection to MySQL server at ‘reading authorization packet’, system error: 0

Recently, I experienced a weird error when connecting to a MySQL server remotely:

ERROR 2013 (HY000): Lost connection to MySQL server at 'reading authorization packet', system error: 0

Basically, this error is similar to the busy tone when you are making calls. The key thing is, it happens randomly. Sometimes the connection is okay, sometimes it takes anywhere from 0.01 to 30 seconds to establish a connection. Sometimes it times out.

Long story short: continue reading this article if you meet the following conditions:

  • You try to connect to a MySQL server remotely, i.e., not localhost (127.0.0.1)
  • It happens randomly. It can take anywhere from 0.01 seconds to 30 seconds to establish a connection. Sometimes it fails.
  • You connect to the server using its IP address, i.e., it has nothing to do with the domain name, or skip-name-resolve in my.cnf
  • You have included the client IP address in /etc/hosts.allow.

The key thing is: Random.

You have probably scratched your head for a few hours (or days) and gone through tons of useless suggestions on Google/Stackoverflow/Serverfault etc., and the problem still exists. Oh well, at least this is what happened to me in the past 24 hours.

Before we discuss the problem, let’s try to reproduce it:

#On the client computer, we try to connect to 
#the MySQL database remotely and run a simple command:
time mysql -u root -pPASSWORD -h IP_ADDRESS -e "show databases;"

#Case 1: Everything is okay
real    0m0.001s
user    0m0.001s
sys     0m0.001s


#Case 2: it takes 20 seconds to establish a connection. 
#That's not right.
real    0m20.001s
user    0m0.003s
sys     0m0.003s

#Case 3: Cannot even make the connection.
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading authorization packet', system error: 0
real    0m49.617s
user    0m0.003s
sys     0m0.003s
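
Since the problem is random, a single attempt does not tell you much. Here is a minimal sketch of a loop that repeats the attempt and prints how long each one took, so the slow ones stand out. The function name time_attempts is made up for illustration, and the timing has whole-second resolution only.

```shell
#!/bin/sh
# Run a command N times and print how many whole seconds each attempt took.
# Usage: time_attempts N command [args...]
time_attempts() {
    n=$1
    shift
    i=1
    while [ "$i" -le "$n" ]; do
        start=$(date +%s)
        # Capture the exit code of the command itself, not of date.
        "$@" > /dev/null 2>&1 && rc=0 || rc=$?
        end=$(date +%s)
        echo "attempt $i: $((end - start))s (exit code $rc)"
        i=$((i + 1))
    done
}

# Usage, with the same placeholders as above:
#   time_attempts 20 mysql -u root -pPASSWORD -h IP_ADDRESS -e "show databases;"
```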

If you also observe similar symptoms, I can tell you that the problem may not be related to the MySQL server or the MySQL settings. I recommend checking your network traffic. Here are my suggestions:

#Check which process is running.
top

#Or you can check which processes are run by the web server user
#In my case, apache is the web server user
ps -u apache

#Or you can check the current traffic using nload
nload -u M

If you are lucky, you may notice that there is a huge amount of network traffic going on. That traffic is the main cause of the problem. Try to kill the process responsible or perform a reboot.

Let’s take my case as an example. I noticed a weird process run by the apache user:

ps -u apache

  PID TTY          TIME CMD
 8112 ?        00:00:09 httpd
 8113 ?        00:00:08 httpd
 8334 ?        00:00:08 httpd
 8796 ?        00:00:06 httpd
 8802 ?        00:00:07 httpd
 8891 ?        00:00:07 something (This is a malware)

After I killed that process, everything was back to normal again.
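
To spot such a process without reading the whole list, you can filter out the expected web server binary. This is a minimal sketch, assuming the web server user is apache and its binary is httpd as in my case; the function name unexpected_procs is made up for illustration.

```shell
#!/bin/sh
# Read "PID COMMAND" pairs on stdin and print those whose command
# is not the expected web server binary (httpd by default).
unexpected_procs() {
    awk -v ok="${1:-httpd}" '$2 != ok'
}

# Usage:
#   ps -u apache -o pid= -o comm= | unexpected_procs
```

Anything this prints deserves a closer look before you kill it.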

–Derrick

ZFS Performance: Mirror VS RAIDZ VS RAIDZ2 vs RAIDZ3 vs Striped

I always wanted to find out the performance difference among different ZFS types, such as mirror, RAIDZ, RAIDZ2, RAIDZ3, striped, two RAIDZ vdevs vs one RAIDZ2 vdev etc. So I decided to run an experiment to test these ZFS types. Before we talk about the test results, let’s go over some background information, such as the details of each design and the hardware information.

Background

Here is the machine I used for the experiment. It is a consumer grade desktop computer manufactured back in 2014:

CPU: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz / quad cores / 8 threads
OS: CentOS Linux release 7.3.1611 (Core)
Kernel: Linux 3.10.0-514.6.1.el7.x86_64 #1 SMP Wed Jan 18 13:06:36 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Memory: 20 GB (2GB x 4)
Hard drives: 5 TB x 8 
(Every hard drive is 4k sectors, non-SSD, consumer grade, connected via a PCI-e x 16 raid card with SAS interface)
System Settings: Everything is system default. Nothing has been done to the kernel configuration.

Also, I tried to keep each test simple. Therefore I didn’t do anything special:

zpool create -f myzpool (different settings go here...)
zfs create myzpool/data

To optimize the I/O performance, the block size of the zpool is based on the physical sector of the hard drive. In my case, all of the hard drives have 4k (4096 bytes) sectors, which is translated to 2^12, therefore, the ashift value of the zpool is 12.

zdb | grep ashift
ashift: 12
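
The ashift value is simply the base-2 logarithm of the sector size in bytes. Here is a minimal sketch of that calculation, assuming the sector size is a power of two; the function name ashift_for is made up for illustration.

```shell
#!/bin/sh
# Print the ashift value (log2 of the sector size in bytes).
# Assumes the sector size is a power of two.
ashift_for() {
    size=$1
    shift_val=0
    while [ "$size" -gt 1 ]; do
        size=$((size / 2))
        shift_val=$((shift_val + 1))
    done
    echo "$shift_val"
}

ashift_for 512    # prints 9
ashift_for 4096   # prints 12
```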

To measure the write performance, I first generate a zero-filled file with a size of 41GB and write it to the zpool directly. To measure the read performance, I read the file and output it to /dev/null. Notice that the file is very large (41GB) so that it does not fit in the ARC cache memory (50% of the system memory, i.e., 10GB). Notice also that the dd block size matches the physical sector size of the hard drive.

One of the readers asked me why I use a large file instead of many small files. There are a few reasons:

  • It is very easy to stress test / saturate the bandwidth (connection in between the hard drives, network etc) when working with a large file.
  • The results of testing large files are more consistent.

#To test the write performance:
dd if=/dev/zero of=/myzpool/data/file.out bs=4096 count=10000000

#To test the read performance:
dd if=/myzpool/data/file.out of=/dev/null bs=4096

FYI, if the block size is not specified, the result can be very different:

#Using default block size:
dd if=/myzpool/data/file.out of=/dev/null
40960000000 bytes (41 GB) copied, 163.046 s, 251 MB/s

#Using native block size:
dd if=/myzpool/data/file.out of=/dev/null bs=4096
40960000000 bytes (41 GB) copied, 58.111 s, 705 MB/s
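
The MB/s figure that dd prints is just the byte count divided by the elapsed time, with 1 MB counted as 10^6 bytes. Here is a minimal sketch that reproduces the two numbers above; the function name mb_per_sec is made up for illustration.

```shell
#!/bin/sh
# Convert dd's byte count and elapsed seconds into MB/s (1 MB = 10^6 bytes).
mb_per_sec() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.0f\n", bytes / secs / 1000000 }'
}

mb_per_sec 40960000000 163.046   # prints 251
mb_per_sec 40960000000 58.111    # prints 705
```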

After each test, I destroyed the zpool and created a different one. This ensures that the environment factors (such as hardware and OS) stay the same. Here are the test results. If you want to learn more about each design, such as the exact command I used for each test, the corresponding material is available in a later section.

Notice that I used eight 5 TB hard drives (total: 40 TB) in this test. A hard drive advertised as 5 TB typically holds about 4.5 TiB of data as reported by the OS, because drive vendors count in powers of 10 while the OS counts in powers of 2. That’s around 86%-90% of the advertised number, depending on which OS you are using. For example, if we use the striped design, which gives the maximum possible storage capacity in ZFS, the usable space will be 8 x 5 TB x 90% = 36 TiB (shown as 36T by df -h and written as 36TB in this article). Therefore, the following percentages are based on 36TB rather than 40 TB.
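
The gap between the advertised size and the size the OS reports is a unit conversion: vendors count a terabyte as 10^12 bytes, while tools such as df -h count in units of 2^40 bytes. Here is a minimal sketch of the conversion; the function name tb_to_tib is made up for illustration, and file system overhead is not included.

```shell
#!/bin/sh
# Convert decimal terabytes (10^12 bytes) to binary tebibytes (2^40 bytes).
tb_to_tib() {
    awk -v tb="$1" 'BEGIN { printf "%.2f\n", tb * 1e12 / 2^40 }'
}

tb_to_tib 5    # prints 4.55 (one drive)
tb_to_tib 40   # prints 36.38 (eight drives)
```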

You may notice that I use 10 disks in each diagram, while I use only 8 disks in the article here. That’s because the diagrams are from my first edit. At that time I used a relatively old machine, which may not reflect the modern ZFS design. The hardware and the test methods I used in the second edit are better, although both edits draw the same conclusion.

Test Result

(Sorted by speed)

No.  ZFS Type    Write Speed  Time to Write  Read Speed  Time to Read  Storage Capacity  Parity  Disk Arrangement
                 (MB/s)       a 41GB File    (MB/s)      a 41GB File   (Max: 36TB)       Disks
1    Striped     705          58.111s        687         59.6386s      36TB (100%)       0       Striped (8)
2    RAIDZ x 2   670          61.1404s       680         60.2457s      26TB (72%)        2       RAIDZ (4) x 2
3    RAIDZ2      608          67.3897s       673         60.8205s      25TB (69%)        2       RAIDZ2 (8)
4    RAIDZ1      604          67.8107s       631         64.8782s      30TB (83%)        1       RAIDZ (8)
5    RAIDZ3      528          77.549s        577         70.9604s      21TB (58%)        3       RAIDZ3 (8)
6    Mirror      473          86.6451s       598         68.4477s      18TB (50%)        4       Mirror (2) x 4
7    RAIDZ2 x 2  414          98.9698s       441         92.963s       18TB (50%)        4       RAIDZ2 (4) x 2
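
The percentages in the storage capacity column are the usable size of each layout divided by the striped maximum of 36TB. Here is a minimal sketch of that arithmetic; the function name capacity_pct is made up for illustration.

```shell
#!/bin/sh
# Print a layout's usable capacity as a percentage of the striped maximum.
# The second argument (the maximum) defaults to 36.
capacity_pct() {
    awk -v used="$1" -v max="${2:-36}" 'BEGIN { printf "%.0f%%\n", used / max * 100 }'
}

capacity_pct 26   # prints 72%
capacity_pct 30   # prints 83%
```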

Striped

In this design, we use all disks to store data (i.e., zero data protection), which maxes out our total usable space at 36 TB.

#Command
zpool create -f myzpool hd1 hd2 \
                        hd3 hd4 \
                        hd5 hd6 \
                        hd7 hd8

#df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
myzpool         36T      0K     36T       0%  /myzpool 

#zpool status -v
        NAME        STATE     READ WRITE CKSUM
        myzpool     ONLINE       0     0     0
          hd1       ONLINE       0     0     0
          hd2       ONLINE       0     0     0
          hd3       ONLINE       0     0     0
          hd4       ONLINE       0     0     0
          hd5       ONLINE       0     0     0
          hd6       ONLINE       0     0     0
          hd7       ONLINE       0     0     0
          hd8       ONLINE       0     0     0

And here is the test result:

#Write Test
dd if=/dev/zero of=/myzpool/data/file.out bs=4096 count=10000000
40960000000 bytes (41 GB) copied, 58.111 s, 705 MB/s

#Read Test
dd if=/myzpool/data/file.out of=/dev/null bs=4096
40960000000 bytes (41 GB) copied, 59.6386 s, 687 MB/s

RAIDZ x 2

In this design, we split the data into two groups. In each group, we store the data in a RAIDZ1 structure. This is similar to RAIDZ2 in terms of data protection, except that this design tolerates at most one failed disk in each group (local scale), while RAIDZ2 tolerates ANY two failed disks overall (global scale). Since we use two disks for parity, the usable space drops from 36TB to 26TB.
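
The difference between the local and the global scale can be sketched as two small checks. This is only an illustration of the rule, not something ZFS provides; the function names are made up, and the arguments are the number of failed disks.

```shell
#!/bin/sh
# Return 0 (pool survives) if no RAIDZ1 group has lost more than one disk.
# Arguments: number of failed disks in each group.
raidz1_groups_survive() {
    for failed in "$@"; do
        [ "$failed" -le 1 ] || return 1
    done
    return 0
}

# A single RAIDZ2 vdev survives as long as at most two disks fail in total.
raidz2_survives() {
    [ "$1" -le 2 ]
}

raidz1_groups_survive 1 1 && echo "one failure per group: OK"
raidz1_groups_survive 2 0 || echo "two failures in one group: pool lost"
raidz2_survives 2 && echo "any two failures on RAIDZ2: OK"
```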

#Command
zpool create -f myzpool raidz hd1 hd2 hd3 hd4 \
                        raidz hd5 hd6 hd7 hd8

#df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
myzpool         26T      0K     26T       0%  /myzpool 

#zpool status -v
        NAME        STATE     READ WRITE CKSUM
        myzpool     ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            hd1     ONLINE       0     0     0
            hd2     ONLINE       0     0     0
            hd3     ONLINE       0     0     0
            hd4     ONLINE       0     0     0
          raidz1-1  ONLINE       0     0     0
            hd5     ONLINE       0     0     0
            hd6     ONLINE       0     0     0
            hd7     ONLINE       0     0     0
            hd8     ONLINE       0     0     0


And here is the test result:

#Write Test
dd if=/dev/zero of=/myzpool/data/file.out bs=4096 count=10000000
40960000000 bytes (41 GB) copied, 61.1401 s, 670 MB/s

#Read Test
dd if=/myzpool/data/file.out of=/dev/null bs=4096
40960000000 bytes (41 GB) copied, 60.2457 s, 680 MB/s


RAIDZ2

In this design, we use two disks for data protection. This allows up to two disks to fail without losing any data. The usable space will drop from 36TB to 25TB.

#Command
zpool create -f myzpool raidz2 hd1 hd2 hd3 hd4 \
                               hd5 hd6 hd7 hd8

#df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
myzpool         25T     31K     25T       0%  /myzpool 

#zpool status -v
        NAME        STATE     READ WRITE CKSUM
        myzpool     ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            hd1     ONLINE       0     0     0
            hd2     ONLINE       0     0     0
            hd3     ONLINE       0     0     0
            hd4     ONLINE       0     0     0
            hd5     ONLINE       0     0     0
            hd6     ONLINE       0     0     0
            hd7     ONLINE       0     0     0
            hd8     ONLINE       0     0     0

And here is the test result:

#Write Test
dd if=/dev/zero of=/myzpool/data/file.out bs=4096 count=10000000
40960000000 bytes (41 GB) copied, 67.3897 s, 608 MB/s

#Read Test
dd if=/myzpool/data/file.out of=/dev/null bs=4096
40960000000 bytes (41 GB) copied, 60.8205 s, 673 MB/s

RAIDZ1

In this design, we use one disk for data protection. This allows one disk to fail without losing any data. The usable space will drop from 36TB to 30TB.

#Command
zpool create -f myzpool raidz hd1 hd2 hd3 hd4 \
                              hd5 hd6 hd7 hd8

#df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
myzpool         30T      0K     30T       0%  /myzpool 

#zpool status -v
        NAME        STATE     READ WRITE CKSUM
        myzpool     ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            hd1     ONLINE       0     0     0
            hd2     ONLINE       0     0     0
            hd3     ONLINE       0     0     0
            hd4     ONLINE       0     0     0
            hd5     ONLINE       0     0     0
            hd6     ONLINE       0     0     0
            hd7     ONLINE       0     0     0
            hd8     ONLINE       0     0     0

And here is the test result:

#Write Test
dd if=/dev/zero of=/myzpool/data/file.out bs=4096 count=10000000
40960000000 bytes (41 GB) copied, 67.8107 s, 604 MB/s

#Read Test
dd if=/myzpool/data/file.out of=/dev/null bs=4096
40960000000 bytes (41 GB) copied, 64.8782 s, 631 MB/s


RAIDZ3

In this design, we use three disks for data protection. This allows up to three disks to fail without losing any data. The usable space will drop from 36TB to 21TB.

#Command
zpool create -f myzpool raidz3 hd1 hd2 hd3 hd4 \
                               hd5 hd6 hd7 hd8

#df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
myzpool         21T     31K     21T     0%    /myzpool 

#zpool status -v
        NAME        STATE     READ WRITE CKSUM
        myzpool     ONLINE       0     0     0
          raidz3-0  ONLINE       0     0     0
            hd1     ONLINE       0     0     0
            hd2     ONLINE       0     0     0
            hd3     ONLINE       0     0     0
            hd4     ONLINE       0     0     0
            hd5     ONLINE       0     0     0
            hd6     ONLINE       0     0     0
            hd7     ONLINE       0     0     0
            hd8     ONLINE       0     0     0


And here is the test result:

#Write Test
dd if=/dev/zero of=/myzpool/data/file.out bs=4096 count=10000000
40960000000 bytes (41 GB) copied, 77.549 s, 528 MB/s

#Read Test
dd if=/myzpool/data/file.out of=/dev/null bs=4096
40960000000 bytes (41 GB) copied, 70.9604 s, 577 MB/s


Mirror

In this design, we use half of our disks for data protection, which makes our total usable space drop from 36 TB to 18 TB.

#Command
zpool create -f myzpool mirror hd1 hd2 \
                        mirror hd3 hd4 \
                        mirror hd5 hd6 \
                        mirror hd7 hd8

#df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
myzpool         18T     31K     18T       0%  /myzpool 

#zpool status -v
        NAME        STATE     READ WRITE CKSUM
        myzpool     ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            hd1     ONLINE       0     0     0
            hd2     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            hd3     ONLINE       0     0     0
            hd4     ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            hd5     ONLINE       0     0     0
            hd6     ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            hd7     ONLINE       0     0     0
            hd8     ONLINE       0     0     0
          

And here is the test result:

#Write Test
dd if=/dev/zero of=/myzpool/data/file.out bs=4096 count=10000000
40960000000 bytes (41 GB) copied, 86.6451 s, 473 MB/s

#Read Test
dd if=/myzpool/data/file.out of=/dev/null bs=4096
40960000000 bytes (41 GB) copied, 68.4477 s, 598 MB/s


RAIDZ2 x 2

In this design, we split the data into two groups. In each group, we store the data in a RAIDZ2 structure. Since we use four disks (two in each group) for parity, the usable space drops from 36TB to 18TB.

#Command
zpool create -f myzpool raidz2 hd1 hd2 hd3 hd4 \
                        raidz2 hd5 hd6 hd7 hd8

#df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
myzpool         18T      0K     18T       0%  /myzpool 

#zpool status -v
        NAME        STATE     READ WRITE CKSUM
        myzpool     ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            hd1     ONLINE       0     0     0
            hd2     ONLINE       0     0     0
            hd3     ONLINE       0     0     0
            hd4     ONLINE       0     0     0
          raidz2-1  ONLINE       0     0     0
            hd5     ONLINE       0     0     0
            hd6     ONLINE       0     0     0
            hd7     ONLINE       0     0     0
            hd8     ONLINE       0     0     0


And here is the test result:

#Write Test
dd if=/dev/zero of=/myzpool/data/file.out bs=4096 count=10000000
40960000000 bytes (41 GB) copied, 98.9698 s, 414 MB/s

#Read Test
dd if=/myzpool/data/file.out of=/dev/null bs=4096
40960000000 bytes (41 GB) copied, 92.963 s, 441 MB/s


Summary

I am not surprised that the striped layout offers the fastest writing speed and maximum storage space. The only drawback is zero data protection. Unless you mirror the data at the server level (e.g., Hadoop) or the data is not important, I don’t recommend this design.

Personally I recommend going with striped RAIDZ, i.e., multiple RAIDZ vdevs, each with no more than 5 disks. In theory, ZFS recommends no more than 8 to 9 disks in each vdev. Based on my experience, ZFS will slow down when it has about 30% free space left if we have too many disks in one single vdev.

So which design should you use? Here is my recommendation:

#Do you care about your data?
No: Go with striped.
Yes: See below:

#How many disks do you have?
1:     ZFS is not for you.
2:     Mirror
3-5:   RAIDZ1
6-10:  RAIDZ1 x 2
11-15: RAIDZ1 x 3
16-20: RAIDZ1 x 4
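
The list above can be sketched as a small lookup function. The function name suggest_layout is made up for illustration, and it assumes that 10 disks fall into the "RAIDZ1 x 2" bucket.

```shell
#!/bin/sh
# Map a disk count to the layout suggested in the list above.
# Assumption: 10 disks belong to the "RAIDZ1 x 2" bucket.
suggest_layout() {
    n=$1
    if   [ "$n" -le 1 ];  then echo "ZFS is not for you"
    elif [ "$n" -eq 2 ];  then echo "Mirror"
    elif [ "$n" -le 5 ];  then echo "RAIDZ1"
    elif [ "$n" -le 10 ]; then echo "RAIDZ1 x 2"
    elif [ "$n" -le 15 ]; then echo "RAIDZ1 x 3"
    elif [ "$n" -le 20 ]; then echo "RAIDZ1 x 4"
    else echo "not covered by the list"
    fi
}

suggest_layout 8    # prints RAIDZ1 x 2
```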

And yes, you can pretty much forget about RAIDZ2, RAIDZ3 and mirror if you need speed and data protection together.

So, you may ask: what should I do if more than one hard drive fails? The answer is: you need to keep an eye on the health of your ZFS pool every day. I have been managing over 60 servers since 2009, and I’ve used only RAIDZ1 with my consumer-level hard drives (most of them were actually taken from external hard drives). So far I haven’t had any data loss.

sudo zpool status -v

or

sudo zpool status -v | grep 'state: ONLINE'

Simply write a program to get the result from this command, and send yourself an email if anything goes wrong. You can include the program in your cron jobs and have it run daily or hourly. This is my version:

#!/bin/bash

#zpool status -x prints "all pools are healthy" when nothing is wrong
result=$(sudo zpool status -x)

if [[ "$result" != 'all pools are healthy' ]]; then
        echo "Something is wrong."
        #Do something here, such as sending an email via HTTP...
        /usr/bin/wget -q "http://example.com/send_email.php?subject=Alert&body=File%20System%20Has%20Problem" -O /dev/null
        exit 1
fi

Enjoy ZFS.

–Derrick
