Last Updated: May 10, 2020
This article is based on my experience with CentOS 7. If you are running other Linux distributions, please adjust the commands and package names accordingly (e.g., yum –> apt-get).
As of Oct 3, 2019, I cannot get ZFS on Linux running on CentOS 8.
ZFS on Linux is not a robust way to get ZFS up and running in Linux environments. Unlike on FreeBSD, ZFS is not a native part of the Linux kernel. The developers of ZFS on Linux came up with a rather crappy solution: the ZFS module is built and injected into the kernel via DKMS so that the kernel understands ZFS. It works very well under a single assumption: the system is never updated or rebooted after ZFS on Linux is installed. So what happens after you update the system (e.g., the kernel or the ZFS on Linux packages) and reboot? There is a good chance that your ZFS module will not be loaded:
|Event||What will happen after reboot?||What do you need to do?|
|You update kernel first, then ZFS on Linux afterward||Before Dec 12, 2018: Your system will load the ZFS modules. Dec 12, 2018 to Dec 2019: Probably not. After Jan 2020: 50/50.||Remove the old kernels from DKMS database. Rebuild the ZFS (and SPL if running 0.7.x) modules with the new kernel in the DKMS database.|
|You update ZFS on Linux first, then kernel afterward||If your system boots into the new kernel (which is the default), your system WILL NOT load the ZFS modules.||Remove and reinstall the ZFS and DKMS packages. Remove the old kernels from DKMS database. Rebuild the SPL and ZFS modules with the new kernel in the DKMS database.|
|You update ZFS on Linux only. Kernel has not been updated.||Your system will load the ZFS modules.||Remove the old kernels from DKMS database. Rebuild the SPL and ZFS modules with the new kernel in the DKMS database.|
|You update kernel only. ZFS on Linux has not been updated.||Your system will load the ZFS modules.||Remove the old kernels from DKMS database. Rebuild the SPL and ZFS modules with the new kernel in the DKMS database.|
There are two steps to rescue your data. We start by removing the DKMS modules. If that does not work, we reinstall the ZFS packages. I am assuming that your system is booted into the new kernel. Please keep in mind that, in my experience, ZFS on Linux does not work with Linux kernel v4 (as of Oct 3, 2019, either via kernel-ml or CentOS 8). It only works with v3.
If you need to access your data right away, the easiest way is to boot into the old working kernel. Once you are ready to clean up the problem, boot into the new kernel and follow my instructions below.
Step 1: Clean up and Reinstall DKMS Modules
Most of the time, ZFS on Linux messes up the DKMS modules after an update. I suggest cleaning up and reinstalling the DKMS modules. As of December 12, 2018, a ZFS on Linux update will remove all of the DKMS modules for no reason.
First, check your DKMS status (sudo dkms status). You will need to clean up DKMS if the status is empty (nothing is installed), orphaned (the modules are added, but none of them is attached to any kernel) or multiple (the modules are attached to more than one kernel). If it is clean (single kernel only), you may skip this step. If you are using ZFS on Linux ver. 0.7.x, your DKMS will contain two modules (zfs and spl). If you are using ver. 0.8.x, your DKMS will contain one module only (zfs).
In general, you want exactly one version of each DKMS module installed, attached to exactly one kernel. If you see multiple versions of the DKMS modules, or multiple kernels, that’s bad.
#An example of dirty DKMS status (This is bad):
spl, 0.7.12, 3.10.0-862.14.4.el7: installed (original_module exists) (WARNING! Diff between built and installed module!)
spl, 0.7.12, 3.10.0-957.1.3.el7: installed (original_module exists)
zfs, 0.7.12, 3.10.0-862.14.4.el7: installed (original_module exists) (WARNING! Diff between built and installed module!)
zfs, 0.7.12, 3.10.0-957.1.3.el7: installed (original_module exists)

#An example of empty DKMS status (This is bad):
(empty)

#An example of DKMS status without kernel (This is bad):
zfs, 0.7.12: added
spl, 0.7.12: added

#An example of clean DKMS status (This is good):
spl, 0.7.12, 3.10.0-957.1.3.el7.x86_64, x86_64: installed
zfs, 0.7.12, 3.10.0-957.1.3.el7.x86_64, x86_64: installed
or
spl, 0.7.12, 3.10.0-957.1.3.el7.x86_64, x86_64: installed (original_module exists)
zfs, 0.7.12, 3.10.0-957.1.3.el7.x86_64, x86_64: installed (original_module exists)
or
zfs, 0.8.3, 3.10.0-1127.el7.x86_64, x86_64: installed (original_module exists)
In my example above, my ZFS on Linux is 0.7.12, my old kernel is 3.10.0-862.14.4.el7, my new kernel is 3.10.0-957.1.3.el7. Your version may be different.
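The four cases above can be told apart mechanically. Here is a minimal sketch, assuming the dkms status output looks exactly like my examples (module, version, kernel, separated by commas). The function name classify_dkms_status is my own and not part of DKMS, and newer DKMS releases may print a different format.

```shell
# Hypothetical helper: reads "dkms status" output on stdin and prints
# one of: empty, orphan, multiple, clean.
classify_dkms_status() {
    awk '
        /^[ \t]*$/ { next }                  # ignore blank lines
        {
            seen = 1
            sub(/:.*/, "")                   # drop the state after the first colon
            n = split($0, f, /, */)          # f[1]=module, f[2]=version, f[3]=kernel
            if (n < 3) { orphan = 1; next }  # "zfs, 0.7.12: added" has no kernel field
            if (!(f[3] in kernels)) { kernels[f[3]] = 1; nk++ }
            if ((f[1] in ver) && ver[f[1]] != f[2]) multi = 1
            ver[f[1]] = f[2]
        }
        END {
            if (!seen)                print "empty"
            else if (orphan)          print "orphan"
            else if (nk > 1 || multi) print "multiple"
            else                      print "clean"
        }
    '
}
```

Usage: sudo dkms status | classify_dkms_status. Anything other than "clean" means you should continue with the cleanup in this step.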
If you see an error like the following:
Error! Could not locate dkms.conf file. File: /var/lib/dkms/zfs/0.8.2/source/dkms.conf does not exist.
That means multiple versions of the ZFS DKMS module are present on your system. In my case, 0.8.3 is running, and the old one (0.8.2) is still around. Check the folder (/var/lib/dkms/zfs/) to see if any old versions need to be removed.
#Currently running: dkms ZFS 0.8.3, kernel 3.10.0-1062.18.1.el7.x86_64
cd /var/lib/dkms/zfs/
#ls -al
total 12K
0.8.2 <---- Delete this
0.8.3
kernel-3.10.0-1062.1.2.el7.x86_64-x86_64 -> 0.8.2/3.10.0-1062.1.2.el7.x86_64/x86_64 <---- Delete this
kernel-3.10.0-1062.4.1.el7.x86_64-x86_64 -> 0.8.2/3.10.0-1062.4.1.el7.x86_64/x86_64 <---- Delete this
kernel-3.10.0-1062.4.3.el7.x86_64-x86_64 -> 0.8.2/3.10.0-1062.4.3.el7.x86_64/x86_64 <---- Delete this
kernel-3.10.0-1062.7.1.el7.x86_64-x86_64 -> 0.8.2/3.10.0-1062.7.1.el7.x86_64/x86_64 <---- Delete this
kernel-3.10.0-1062.9.1.el7.x86_64-x86_64 -> 0.8.3/3.10.0-1062.9.1.el7.x86_64/x86_64 <---- Delete this
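If the folder has many entries, it helps to list the stale ones before deleting anything. This is only a sketch under my own assumptions: the function name list_stale_dkms_entries is made up, and it treats everything except the running version's folder and the running kernel's link as stale, which matches the listing above. It only prints names and deletes nothing.

```shell
# Hypothetical helper: list entries in a DKMS module folder that belong
# neither to the version you keep nor to the kernel you keep.
# Usage: list_stale_dkms_entries DIR KEEP_VERSION KEEP_KERNEL
list_stale_dkms_entries() {
    dir=$1
    keep_version=$2
    keep_kernel=$3
    for entry in "$dir"/*; do
        # Skip the unexpanded glob of an empty folder; keep dangling links.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        name=${entry##*/}
        case "$name" in
            "$keep_version") ;;             # version folder in use
            kernel-"$keep_kernel"-*) ;;     # link for the kernel in use
            *) printf '%s\n' "$name" ;;     # everything else is a candidate
        esac
    done
}
```

Usage: list_stale_dkms_entries /var/lib/dkms/zfs 0.8.3 "$(uname -r)". Review the list by hand before you remove anything.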
You may want to remove both ZFS and SPL DKMS modules first, then reinstall them:
#If your version is 0.7.x:
sudo dkms remove zfs/0.7.12 --all;
sudo dkms remove spl/0.7.12 --all;

#If your version is 0.8.x:
sudo dkms remove zfs/0.8.3 --all;
Sometimes, you will need to remove the modules built for the old kernel manually:
sudo dkms remove zfs/0.7.12 -k 3.10.0-862.14.4.el7.x86_64;
sudo dkms remove spl/0.7.12 -k 3.10.0-862.14.4.el7.x86_64;
Time to reinstall them:
#Don't forget to use the version that matches your system. In my situation, it was 0.7.12 / 0.8.3

#0.7.x:
sudo dkms --force install spl/0.7.12;
sudo dkms --force install zfs/0.7.12;

#0.8.x:
sudo dkms --force install zfs/0.8.3;
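Since the module list and the order depend on the version, I find it safer to print the commands first and run them after a quick look. A minimal sketch, assuming only the 0.7.x and 0.8.x series covered in this article; the function name dkms_rebuild_plan is my own. It prints the same dkms commands shown above and does not execute anything.

```shell
# Hypothetical helper: print (not run) the remove/install commands
# for a given ZFS on Linux version.
dkms_rebuild_plan() {
    version=$1
    case "$version" in
        0.7.*) to_remove="zfs spl"; to_install="spl zfs" ;;  # spl goes in first
        0.8.*) to_remove="zfs";     to_install="zfs" ;;      # spl no longer separate
        *) echo "unsupported version: $version" >&2; return 1 ;;
    esac
    for m in $to_remove; do
        echo "dkms remove $m/$version --all"
    done
    for m in $to_install; do
        echo "dkms --force install $m/$version"
    done
}
```

Usage: dkms_rebuild_plan 0.7.12 prints four lines; run them with sudo once you are happy with them.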
Run the DKMS status again. You should see that ZFS (and SPL, if you are running 0.7.x) is attached to the new kernel only:
#If your version is 0.7.x:
spl, 0.7.12, 3.10.0-957.1.3.el7.x86_64, x86_64: installed
zfs, 0.7.12, 3.10.0-957.1.3.el7.x86_64, x86_64: installed

#If your version is 0.8.x:
zfs, 0.8.3, 3.10.0-1127.el7.x86_64, x86_64: installed
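To check this without reading the output by eye, something like the following may help. It is a sketch that assumes the status format shown above; the function name zfs_installed_for_kernel is hypothetical.

```shell
# Hypothetical helper: reads "dkms status" output on stdin and succeeds
# only if the zfs module is reported as installed for the given kernel.
zfs_installed_for_kernel() {
    awk -v k="$1" '
        {
            line = $0
            sub(/:.*/, "", line)             # keep "module, version, kernel, arch"
            n = split(line, f, /, */)
        }
        n >= 3 && f[1] == "zfs" && f[3] == k && $0 ~ /: installed/ { found = 1 }
        END { exit !found }                  # exit 0 when a matching line was seen
    '
}
```

Usage: sudo dkms status | zfs_installed_for_kernel "$(uname -r)" && echo "ZFS is attached to the running kernel".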
Try to load the ZFS module and import your ZFS data:
sudo /sbin/modprobe zfs
sudo zpool import -a
If everything looks good, reboot your system and check whether the ZFS module is loaded automatically. Once everything is okay, remove the old kernel from the system.
sudo package-cleanup --oldkernels --count=1 -y
That's it, you are good to go.
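For the post-reboot check mentioned above, the kernel lists its loaded modules in /proc/modules, one per line with the module name first. A minimal sketch; the function name zfs_module_loaded is my own, and the optional file argument exists only so the check can be tried against a saved copy.

```shell
# Hypothetical helper: succeed if the zfs module appears in the list of
# loaded kernel modules (defaults to /proc/modules).
zfs_module_loaded() {
    modules_file=${1:-/proc/modules}
    # Each line starts with the module name followed by a space.
    grep -q '^zfs ' "$modules_file"
}
```

Usage: zfs_module_loaded && echo "ZFS is loaded" || echo "ZFS is NOT loaded".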
Step 2: Reinstall ZFS packages
If you have tried the first step and it didn't work, you may want to reinstall the ZFS packages. Here is a typical scenario:
You try to import the ZFS data and the system complains:
#zpool import -a
The ZFS modules are not loaded.
Try running '/sbin/modprobe zfs' as root to load them.
So you try to load the ZFS module and the system complains again:
#/sbin/modprobe zfs
modprobe: FATAL: Module zfs not found.
or
modprobe: ERROR: could not insert 'zfs': Invalid argument
What you need to do is to erase all the ZFS and related packages:
yum erase zfs zfs-dkms libzfs2 spl spl-dkms libzpool2 -y
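Before you reboot, it may be worth confirming that nothing from that list is left behind. A small sketch, assuming you feed it one package name per line (for example from rpm -qa --qf '%{NAME}\n'); the function name leftover_zfs_packages is my own.

```shell
# Hypothetical helper: reads package names on stdin (one per line) and
# prints those that belong to the ZFS package set erased above.
leftover_zfs_packages() {
    grep -E '^(zfs|zfs-dkms|libzfs2|spl|spl-dkms|libzpool2)$'
}
```

Usage: rpm -qa --qf '%{NAME}\n' | leftover_zfs_packages. Empty output means the erase was complete.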
Please reboot the system. This step is very important.
After that, try to install ZFS again.
yum install zfs -y
If the system complains about mismatched dependent packages, try to remove the affected packages first and run the installation again.
After the installation, try to start the ZFS module:
/sbin/modprobe zfs
zpool import -a
If ZFS is up and running, please clean up your DKMS as described in Step 1. If it complains again, please follow the steps below:
- Clear the cache of the yum repository and try to update the system again. (sudo yum clean all)
- Reboot to the latest kernel
- Erase the ZFS and related packages, try it again.
Keep in mind that ZFS on Linux is based on DKMS, a very buggy and unreliable platform. When you uninstall and reinstall the packages, don't expect the result to be the same as a fresh install. Before you send your server to the landfill, try this:
Check the dkms status:
#version 0.7.x
zfs, 0.7.2: added
spl, 0.7.2: added

#version 0.8.x
zfs, 0.8.3: added
If you see this message, that means the ZFS packages have been installed, but DKMS has not built the modules for any kernel. You will need to tell DKMS to do it:
#version 0.7.x (install spl first, as in Step 1)
dkms --force install spl/0.7.2
dkms --force install zfs/0.7.2

#version 0.8.x
dkms --force install zfs/0.8.3
#Try to start ZFS again.
/sbin/modprobe zfs
zpool import -a
If you have already tried more than three times without any luck, don't waste your time. You may want to move the ZFS disks to a different server, which should be able to recognize them. The original server can then reach the data on the new server via NFS, mounted at the original path. That will minimize the impact of the change.
Keep in mind that the ZFS version is very important. A server with a newer ZFS version can read ZFS disks created with older ZFS versions, so the new server should run the same version or a newer one. You can always check the ZFS versions by running the following:
#Get the version of the host:
sudo zfs upgrade -v
sudo zpool upgrade -v

#Get the version of the ZFS disks:
sudo zfs get version
sudo zpool get version
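When you compare the ZFS on Linux release of two servers (e.g., 0.7.12 on the old one and 0.8.3 on the new one), compare the numbers field by field, because 0.7.12 is newer than 0.7.2 even though it sorts lower as text. A minimal sketch; the function name version_ge is my own, and it assumes plain dotted release numbers like the ones in this article.

```shell
# Hypothetical helper: succeed if dotted version $1 is the same as or
# newer than dotted version $2, comparing each field numerically.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "[.]")
        nb = split(b, y, "[.]")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0    # missing fields count as 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0                               # all fields equal
    }'
}
```

Usage: version_ge 0.8.3 0.7.12 && echo "the new server is new enough for the old disks".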
This is pretty much what I need to do on my 60 servers every month. If you are in a situation similar to mine, I guarantee that you will become an expert at fixing this kind of mess after a few months. Good luck!