CentOS/RHEL 7 – No ZFS After Updating the Kernel

Let’s all agree with this fact: ZFS is foreign to Linux. It is not native. You can’t expect that ZFS on Linux will run smoothly as FreeBSD or Solaris. Having using ZFS on Linux since 2013 (and ZFS on FreeBSD since 2009), I’ve noticed that ZFS does not like Linux (well, at least RHEL 7). Here are some few examples:

  • ZFS is not loaded at the boot time. You will need to manually start it or load it via cron. Good luck if you have other services (like Apache, MySQL, NFS, or even users’ home directories) that depend on the ZFS.
  • Every single time you update the kernel, ZFS will not work after the reboot without some manual work. What if the system runs the update automatically, and one day there is a power failure which makes your server to reboot to a new kernel? Your system will not be able to mount your ZFS volume. If you integrate ZFS with other service applications such as web, database or network drive, oh well, good luck and I hope you will catch this problem fast enough before receiving thousands of emails and calls from your end-users.
  • If you exclude the kernel from the updates (/etc/yum.conf), you will eventually run into trouble, because there are tons of other packages that require the latest kernel. In the other words, running the command: yum update -y will fail. You will need to run yum update –skip-broken, which means you will miss many latest packages. Here is an example:
    --> Finished Dependency Resolution
    Error: Package: hypervvssd-0-0.29.20160216git.el7.x86_64 (base)
               Requires: kernel >= 3.10.0-384.el7
               Installed: kernel-3.10.0-327.el7.x86_64 (@anaconda)
                   kernel = 3.10.0-327.el7
               Installed: kernel-3.10.0-327.22.2.el7.x86_64 (@updates)
                   kernel = 3.10.0-327.22.2.el7
    Error: Package: hypervfcopyd-0-0.29.20160216git.el7.x86_64 (base)
               Requires: kernel >= 3.10.0-384.el7
               Installed: kernel-3.10.0-327.el7.x86_64 (@anaconda)
                   kernel = 3.10.0-327.el7
               Installed: kernel-3.10.0-327.22.2.el7.x86_64 (@updates)
                   kernel = 3.10.0-327.22.2.el7
    Error: Package: hypervkvpd-0-0.29.20160216git.el7.x86_64 (base)
               Requires: kernel >= 3.10.0-384.el7
               Installed: kernel-3.10.0-327.el7.x86_64 (@anaconda)
                   kernel = 3.10.0-327.el7
               Installed: kernel-3.10.0-327.22.2.el7.x86_64 (@updates)
                   kernel = 3.10.0-327.22.2.el7
     You could try using --skip-broken to work around the problem
     You could try running: rpm -Va --nofiles --nodigest
    
  • If you are running the stable Linux distributions like RHEL 7, you can load a more recent kernel like 4.x by installing the package: kernel-ml. However, don’t expect that ZFS will work with version 4:
    Loading new spl-0.6.5.9 DKMS files...
    Building for 4.11.2-1.el7.elrepo.x86_64
    Building initial module for 4.11.2-1.el7.elrepo.x86_64
    configure: error: unknown
    Error! Bad return status for module build on kernel: 4.11.2-1.el7.elrepo.x86_64 (x86_64)
    Consult /var/lib/dkms/spl/0.6.5.9/build/make.log for more information.
    
    

Running ZFS on Linux is like putting a giraffe in the wild in Alaska. It is just not the right thing to do. Unfortunately, there are so many things that only available on Linux so we have to live with it. Just like FUSE (Filesystem in Userspace), many people feel hesitated to run their file systems on the userspace instead of kernel level, but hey, see how many people are happy with GlusterFS, a distributed file system that live on FUSE! Personally I just think it is not a right thing to do, especially in an enterprise environment. Running a production file system at the userspace level, seriously?

Anyway, if you are running into trouble after upgrading your Linux kernel (and you almost had a heart attack when you think your data may be lost), you have two choices:

  1. Simply boot to the previous working kernel if you need to get your data back in quick. However, keep in mind that this will create two problems:
    • Since you already update the system with the new kernel and the new packages, your new packages probably will not work with the old kernel, and that may give you extra headache.
    • Unless you manually overwrite the kernel boot order (boot loader config), otherwise you may get into the same trouble in the next boot.
  2. If you want a more “permanent” fix, you will need to rebuild the dkms ZFS and SPL modules. See below for the instructions. Keep in mind that you will have the same problem again when the kernel receives a new update.

You’ve tried to load the ZFS and realize that it is no longer available:

#sudo zpool import
The ZFS modules are not loaded.
Try running '/sbin/modprobe zfs' as root to load them.

#sudo /sbin/modprobe zfs
modprobe: FATAL: Module zfs not found.

You may want to check the dkms status. Write down the version number. In my case, it is 0.6.5.9

#sudo dkms status
spl, 0.6.5.9, 3.10.0-327.28.3.el7.x86_64, x86_64: installed (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!)
spl, 0.6.5.9, 3.10.0-514.2.2.el7.x86_64, x86_64: installed (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!)
zfs, 0.6.5.9, 3.10.0-327.28.3.el7.x86_64, x86_64: installed (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!)

Before running the following commands, make sure that you know what you are doing.


#Make sure that you reboot to the kernel you want to fix.
#Find out what is the current kernel
uname -a
Linux 3.10.0-514.2.2.el7.x86_64 #1 SMP Tue Dec 6 23:06:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

#In my example, it is:
3.10.0-514.2.2.el7.x86_64

#Now, let's get into the fun part. We will remove them and reinstall them.
#Don't forget to match your version, in my base, my version is: 0.6.5.9
sudo dkms remove zfs/0.6.5.9 --all
sudo dkms remove spl/0.6.5.9 --all
sudo dkms --force install spl/0.6.5.9
sudo dkms --force install zfs/0.6.5.9

#or you can run these commands in one line, so that you don't need to wait:
sudo dkms remove zfs/0.6.5.9 --all; sudo dkms remove spl/0.6.5.9 --all; sudo dkms --force install spl/0.6.5.9; sudo dkms --force install zfs/0.6.5.9;

And we will verify the result.

#sudo dkms status
spl, 0.6.5.9, 3.10.0-514.2.2.el7.x86_64, x86_64: installed
zfs, 0.6.5.9, 3.10.0-514.2.2.el7.x86_64, x86_64: installed

Finally we can start the ZFS again.

sudo /sbin/modprobe zfs

Your ZFS pool should back. You can verify it by rebooting your machine. Notice that Linux may not automatically mount the ZFS volumes. You may want to mount it manually or via cron job.

Here is how to mount the ZFS volumes manually.

sudo zpool import -a

You may want to remove all of the old kernels too.

sudo package-cleanup --oldkernels --count=1 -y

Our sponsors: