Last Updated: Dec 12, 2018
This article is based on my experience with CentOS 7. If you are running another Linux distribution, please adjust the commands and package names accordingly (e.g., yum -> apt-get).
ZFS on Linux is a poorly designed software solution for getting ZFS up and running in Linux environments. Unlike on FreeBSD, ZFS is not part of the Linux kernel. The developers of ZFS on Linux came up with a crappy hack: the ZFS modules are built outside the kernel and injected into it via DKMS, which has to rebuild them for every kernel you install. It works very well under a single assumption: the system will never be updated or rebooted after installing ZFS on Linux. So what happens after you update the system (e.g., the kernel or the ZFS on Linux packages) and reboot? There is a good chance that your ZFS module will not be loaded. Otherwise you would not be here, right?
|Event||What will happen after reboot?||What do you need to do?|
|You update kernel first, then ZFS on Linux afterward||If you update before Dec 12, 2018: Your system will load the ZFS modules. If you update after Dec 12, 2018: Probably not.||Remove the old kernels from DKMS database. Rebuild the SPL and ZFS modules with the new kernel in the DKMS database.|
|You update ZFS on Linux first, then kernel afterward||If your system boot into the new kernel (which is default), your system WILL NOT load the ZFS modules.||Remove and install the ZFS and DKMS packages. Remove the old kernels from DKMS database. Rebuild the SPL and ZFS modules with the new kernel in the DKMS database.|
|You update ZFS on Linux only. Kernel has not been updated.||Your system will load the ZFS modules.||Remove the old kernels from DKMS database. Rebuild the SPL and ZFS modules with the new kernel in the DKMS database.|
|You update kernel only. ZFS on Linux has not been updated||Your system will load the ZFS modules.||Remove the old kernels from DKMS database. Rebuild the SPL and ZFS modules with the new kernel in the DKMS database.|
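There are two steps to get your data back. We will start by cleaning up your DKMS modules. If that does not work, we will reinstall the ZFS packages. I am assuming that your system is booted into the new kernel. Please keep in mind that each ZFS on Linux release only supports a certain range of kernel versions. Everything in this article was done on the stock CentOS 7 kernel (v3.10); if you run a newer kernel such as v4, check the release notes of your ZFS on Linux version before updating.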
If you need to access your data, the easiest way is to boot to the old working kernel. Once you are ready to clean up the mess created by the ZFS on Linux team, boot to the new kernel and follow my instructions below.
Step 1: Clean up and Reinstall DKMS Modules
Most of the time, ZFS on Linux messes up the DKMS modules after the update. I suggest cleaning up and reinstalling the DKMS modules. As of December 12, 2018, ZFS on Linux will remove all of the DKMS modules for no reason.
First, check your DKMS status (sudo dkms status). You will need to clean up DKMS if the output is empty or dirty (modules registered for multiple kernels). If it is clean (single kernel only), you can skip this step.
#An example of dirty DKMS status (This is bad):
spl, 0.7.12, 3.10.0-862.14.4.el7: installed (original_module exists) (WARNING! Diff between built and installed module!)
spl, 0.7.12, 3.10.0-957.1.3.el7: installed (original_module exists)
zfs, 0.7.12, 3.10.0-862.14.4.el7: installed (original_module exists) (WARNING! Diff between built and installed module!)
zfs, 0.7.12, 3.10.0-957.1.3.el7: installed (original_module exists)
#An example of empty DKMS status (This is bad):
(empty)
#An example of clean DKMS status (This is good):
spl, 0.7.12, 3.10.0-957.1.3.el7.x86_64, x86_64: installed
zfs, 0.7.12, 3.10.0-957.1.3.el7.x86_64, x86_64: installed
or
spl, 0.7.12, 3.10.0-957.1.3.el7.x86_64, x86_64: installed
zfs, 0.7.12, 3.10.0-957.1.3.el7.x86_64, x86_64: installed (original_module exists)
In my example above, my ZFS on Linux is 0.7.12, my old kernel is 3.10.0-862.14.4.el7, my new kernel is 3.10.0-957.1.3.el7. Your version may be different.
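If you have to do this check on many servers, the empty/dirty/clean decision can be sketched as a small helper. This is only a rough sketch: classify_dkms_status is a name I made up (it is not part of DKMS), and it assumes the comma-separated output format shown in my examples above, where the third field is the kernel version.

```shell
# Classify `dkms status` output (read from stdin) as empty, dirty, or clean.
# Hypothetical helper; assumes lines look like "zfs, 0.7.12, <kernel>[, <arch>]: installed".
classify_dkms_status() {
    awk -F', *' '
        $1 == "spl" || $1 == "zfs" {
            kernel = $3
            sub(/:.*/, "", kernel)   # strip ": installed ..." when no arch field follows
            if (!(kernel in seen)) { seen[kernel] = 1; count++ }
        }
        END {
            if (count == 0)     print "empty"   # no SPL/ZFS modules registered at all
            else if (count > 1) print "dirty"   # modules registered for several kernels
            else                print "clean"   # exactly one kernel
        }
    '
}
```

Usage would be something like: sudo dkms status | classify_dkms_status. Anything other than "clean" means you should continue with the cleanup in this step.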
You may want to remove both ZFS and SPL DKMS modules first, then reinstall them:
sudo dkms remove zfs/0.7.12 --all; sudo dkms remove spl/0.7.12 --all;
Sometimes, you will need to remove the old kernel manually:
sudo dkms remove zfs/0.7.12 -k 3.10.0-862.14.4.el7.x86_64; sudo dkms remove spl/0.7.12 -k 3.10.0-862.14.4.el7.x86_64;
Time to reinstall them:
sudo dkms --force install spl/0.7.12; sudo dkms --force install zfs/0.7.12;
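Since the version number changes with every release, the remove and reinstall commands above could be wrapped in a function that takes the version as a parameter. This is a minimal sketch, not an official tool: rebuild_zfs_dkms and the RUN variable are my own inventions. By default it only prints the commands; it keeps the order from above (SPL is installed before ZFS).

```shell
# Print, or run, the Step 1 remove/reinstall sequence for one ZFS on Linux version.
# Hypothetical wrapper. RUN is an assumed variable:
#   unset -> dry run, commands are only printed (via echo)
#   RUN=  -> (set but empty) commands are really executed
rebuild_zfs_dkms() (
    version=$1
    [ -n "$version" ] || { echo "usage: rebuild_zfs_dkms <version>" >&2; exit 1; }
    run=${RUN-echo}

    # Remove the modules for all kernels, same order as in the article
    for mod in zfs spl; do
        $run sudo dkms remove "$mod/$version" --all
    done

    # Reinstall for the running kernel, SPL first, then ZFS
    for mod in spl zfs; do
        $run sudo dkms --force install "$mod/$version"
    done
)
```

For example, rebuild_zfs_dkms 0.7.12 prints the four commands so you can review them, and RUN= rebuild_zfs_dkms 0.7.12 actually runs them.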
Run the DKMS status again. You should see that both ZFS and SPL are attached to the new kernel:
spl, 0.7.12, 3.10.0-957.1.3.el7.x86_64, x86_64: installed
zfs, 0.7.12, 3.10.0-957.1.3.el7.x86_64, x86_64: installed (original_module exists)
Try to load the ZFS module and import your ZFS data:
sudo /sbin/modprobe zfs
sudo zpool import -a
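To confirm that the module is really loaded (now, and again after the reboot), you can look for it in /proc/modules. Here is a small sketch; zfs_module_loaded is a hypothetical helper, and the optional file argument exists only so the function can be tried against a sample file.

```shell
# Succeed if the zfs kernel module is listed as loaded.
# Hypothetical helper; $1 is an optional modules list (defaults to /proc/modules).
zfs_module_loaded() {
    # Each line of /proc/modules starts with the module name followed by a space
    grep -q '^zfs ' "${1:-/proc/modules}"
}
```

For example: zfs_module_loaded && echo "ZFS is loaded" || echo "ZFS is NOT loaded".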
If everything looks good, reboot your system and check whether the ZFS module is loaded automatically. Once everything is okay, remove the old kernel from the system:
sudo package-cleanup --oldkernels --count=1 -y
That’s it, you are good to go.
Step 2: Reinstall ZFS packages
If you have tried the first step and it didn’t work, you may want to reinstall the ZFS packages. Here is a typical error message:
You try to import the ZFS data and the system complains:
#zpool import -a
The ZFS modules are not loaded.
Try running '/sbin/modprobe zfs' as root to load them.
So you try to load the ZFS module and the system complains again:
#/sbin/modprobe zfs
modprobe: FATAL: Module zfs not found.
or
modprobe: ERROR: could not insert 'zfs': Invalid argument
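In my experience, "Module zfs not found" usually means that no zfs module file exists for the kernel you are running. One way to check this is sketched below. zfs_module_built_for is a hypothetical helper, and I am assuming that the module file is named zfs.ko (possibly with a compression suffix) somewhere under /lib/modules/<kernel>.

```shell
# Succeed if a zfs module file exists for the given kernel.
# Hypothetical helper. $1: kernel version (default: running kernel),
# $2: modules root directory (default: /lib/modules).
zfs_module_built_for() (
    kernel=${1:-$(uname -r)}
    moddir=${2:-/lib/modules}/$kernel
    [ -d "$moddir" ] || exit 1
    # Match zfs.ko as well as compressed variants such as zfs.ko.xz
    found=$(find "$moddir" -name 'zfs.ko*' | head -n 1)
    [ -n "$found" ]
)
```

For example: zfs_module_built_for || echo "No ZFS module for $(uname -r)".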
What you need to do is to erase all the ZFS and related packages:
yum erase zfs zfs-dkms libzfs2 spl spl-dkms libzpool2 -y
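To see which of these packages are installed, or to confirm afterwards that nothing is left behind, you can filter the output of rpm -qa. This is just a sketch: filter_zfs_packages is a made-up name, and the pattern only covers the package names from the erase command above.

```shell
# Keep only the ZFS-related packages from an `rpm -qa` listing read on stdin.
# Hypothetical helper; matches zfs, zfs-dkms, spl, spl-dkms, libzfs2 and libzpool2.
filter_zfs_packages() {
    # "|| true" so that an empty result is not treated as a failure
    grep -E '^(zfs|spl|libzfs2|libzpool2)-' || true
}
```

For example: rpm -qa | filter_zfs_packages. After the erase, the output should be empty.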
Please reboot the system. This step is very important.
After that, try to install ZFS again.
yum install zfs -y
If the system complains about mismatched package dependencies, try to remove the affected packages first and run the installation again.
After the installation, try to start the ZFS module:
/sbin/modprobe zfs
zpool import -a
If ZFS is up and running, please clean up your DKMS modules as described in Step 1. If it complains again, please follow the steps below:
- Clear the cache of the yum repository and try to update the system again. (sudo yum clean all)
- Reboot to the latest kernel
- Erase the ZFS and related packages, try it again.
If you have already tried this more than three times without any luck, don’t waste your time. You may want to move the ZFS disks to a different server. The new server should be able to recognize the ZFS disks. On the original server, you can then mount the ZFS data from the new server via NFS using the original path. That will minimize the impact of the change.
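The "original path" idea simply means that the exported path on the new server and the mount point on the original server are the same, so nothing else on the original server has to change. Here is a rough sketch that only prints the mount command; nfs_mount_cmd, the host name newserver and the path /tank/data are all hypothetical, and it assumes the new server already exports that path over NFS.

```shell
# Print the mount command that attaches an NFS export at the same path locally.
# Hypothetical helper. $1: NFS server host name, $2: path (same on both sides).
nfs_mount_cmd() {
    echo "sudo mount -t nfs $1:$2 $2"
}
```

For example, nfs_mount_cmd newserver /tank/data prints the command to run on the original server.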
Keep in mind that the ZFS version is very important. A server with a newer ZFS version can read ZFS disks created with an older ZFS version, but the reverse is not guaranteed. You can always check the ZFS versions by running the following:
#Get the version of the host:
sudo zfs upgrade -v
sudo zpool upgrade -v
#Get the version of the ZFS disks:
sudo zfs get version
sudo zpool get version
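One thing that may look confusing: for pools that use feature flags, the version property is shown as "-" instead of a number. Here is a small sketch for interpreting the value. describe_pool_version is a hypothetical helper and the pool name tank in the usage line is only an example.

```shell
# Describe the value of a pool's "version" property.
# Hypothetical helper. $1: the value as printed by `zpool get version`.
describe_pool_version() {
    case "$1" in
        -)           echo "feature flags" ;;        # pool uses feature flags, no legacy number
        ''|*[!0-9]*) echo "unknown" ;;              # empty or not a plain number
        *)           echo "legacy version $1" ;;    # old-style numbered pool version
    esac
}
```

For example: describe_pool_version "$(sudo zpool get -H -o value version tank)".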
This is pretty much what I need to do on my 60 servers every month. If you are in a situation similar to mine, I guarantee that you will become an expert at fixing this kind of mess after a few months. Good luck!