Let’s all agree on this fact: ZFS is foreign to Linux. It is not native. You can’t expect ZFS on Linux to run as smoothly as it does on FreeBSD or Solaris. Having used ZFS on Linux since 2013 (and ZFS on FreeBSD since 2009), I’ve noticed that ZFS does not like Linux (well, at least RHEL 7). Here are a few examples:
- ZFS is not loaded at boot time. You will need to start it manually or load it via cron. Good luck if you have other services (like Apache, MySQL, NFS, or even users’ home directories) that depend on ZFS.
- Every single time you update the kernel, ZFS will not work after the reboot without some manual work. What if the system runs the update automatically, and one day a power failure makes your server reboot into the new kernel? Your system will not be able to mount your ZFS volume. If you integrate ZFS with other services such as web, database or network drives, oh well, good luck, and I hope you catch the problem fast enough, before receiving thousands of emails and calls from your end users.
- If you exclude the kernel from the updates (/etc/yum.conf), you will eventually run into trouble, because there are tons of other packages that require the latest kernel. In other words, running the command yum update -y will fail. You will need to run yum update --skip-broken, which means you will miss many of the latest packages. Here is an example:
    --> Finished Dependency Resolution
    Error: Package: hypervvssd-0-0.29.20160216git.el7.x86_64 (base)
               Requires: kernel >= 3.10.0-384.el7
               Installed: kernel-3.10.0-327.el7.x86_64 (@anaconda)
                   kernel = 3.10.0-327.el7
               Installed: kernel-3.10.0-327.22.2.el7.x86_64 (@updates)
                   kernel = 3.10.0-327.22.2.el7
    Error: Package: hypervfcopyd-0-0.29.20160216git.el7.x86_64 (base)
               Requires: kernel >= 3.10.0-384.el7
               Installed: kernel-3.10.0-327.el7.x86_64 (@anaconda)
                   kernel = 3.10.0-327.el7
               Installed: kernel-3.10.0-327.22.2.el7.x86_64 (@updates)
                   kernel = 3.10.0-327.22.2.el7
    Error: Package: hypervkvpd-0-0.29.20160216git.el7.x86_64 (base)
               Requires: kernel >= 3.10.0-384.el7
               Installed: kernel-3.10.0-327.el7.x86_64 (@anaconda)
                   kernel = 3.10.0-327.el7
               Installed: kernel-3.10.0-327.22.2.el7.x86_64 (@updates)
                   kernel = 3.10.0-327.22.2.el7
     You could try using --skip-broken to work around the problem
     You could try running: rpm -Va --nofiles --nodigest
- If you are running a stable Linux distribution like RHEL 7, you can load a more recent kernel such as 4.x by installing the kernel-ml package. However, don’t expect ZFS to work with a version 4 kernel:
    Loading new spl-0.6.5.9 DKMS files...
    Building for 4.11.2-1.el7.elrepo.x86_64
    Building initial module for 4.11.2-1.el7.elrepo.x86_64
    configure: error: unknown
    Error! Bad return status for module build on kernel: 4.11.2-1.el7.elrepo.x86_64 (x86_64)
    Consult /var/lib/dkms/spl/0.6.5.9/build/make.log for more information.
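One way to avoid being surprised by the second problem above is to check, before rebooting, whether DKMS has already built both modules for the kernel you are about to boot. The following is only a minimal sketch, not an official tool. The function name dkms_ready_for_kernel is made up for illustration, and it assumes that dkms status prints lines in the "module, version, kernel, arch: state" format shown later in this post (other dkms versions may print a different format). It is deliberately strict: a line that carries a WARNING does not count as ready.

```shell
#!/bin/sh
# Sketch: succeed only if the `dkms status` output read from stdin lists
# both spl and zfs as cleanly "installed" for the given kernel version.
# dkms_ready_for_kernel is a hypothetical helper name, not part of dkms.
dkms_ready_for_kernel() {
    kernel="$1"
    awk -v k="$kernel" -F'[,:] *' '
        # Fields: 1 = module, 2 = module version, 3 = kernel, 4 = arch, 5 = state
        # Only an exact "installed" state counts; lines with a WARNING are skipped.
        $3 == k && $5 ~ /^installed[ \t]*$/ { seen[$1] = 1 }
        END { exit !(seen["spl"] && seen["zfs"]) }
    '
}
```

A possible use is: dkms status | dkms_ready_for_kernel 3.10.0-514.2.2.el7.x86_64 && echo "safe to reboot", where the version is that of the newly installed kernel, not necessarily the one currently running.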
Running ZFS on Linux is like putting a giraffe in the wild in Alaska. It is just not the right thing to do. Unfortunately, there are so many things that are only available on Linux, so we have to live with it. It is just like FUSE (Filesystem in Userspace): many people are hesitant to run their file systems in userspace instead of at the kernel level, but hey, see how many people are happy with GlusterFS, a distributed file system that lives on FUSE! Personally, I just don’t think it is the right thing to do, especially in an enterprise environment. Running a production file system at the userspace level, seriously?
Anyway, if you are running into trouble after upgrading your Linux kernel (and you almost had a heart attack thinking your data may be lost), you have two choices:
- Simply boot into the previous working kernel if you need to get your data back quickly. However, keep in mind that this will create two problems:
    - Since you have already updated the system with the new kernel and the new packages, the new packages probably will not work with the old kernel, and that may give you an extra headache.
    - Unless you manually override the kernel boot order (boot loader config), you may get into the same trouble on the next boot.
- If you want a more “permanent” fix, you will need to rebuild the DKMS ZFS and SPL modules. See below for the instructions. Keep in mind that you will have the same problem again when the kernel receives a new update.
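For the first choice, the default boot entry can be pinned to the old kernel so that the next reboot does not land on the broken one again. Here is a small sketch, assuming the grubby tool that ships with RHEL 7 and kernel images stored as /boot/vmlinuz-<version>. The helper name pin_kernel_cmd is hypothetical, and it only prints the command instead of running it, so you can review it first.

```shell
#!/bin/sh
# Sketch: print the grubby command that would make the given kernel
# version the default boot entry. Nothing is changed on the system.
# pin_kernel_cmd is a hypothetical helper, not a grubby feature.
pin_kernel_cmd() {
    version="$1"
    if [ -z "$version" ]; then
        echo "usage: pin_kernel_cmd <kernel-version>" >&2
        return 1
    fi
    # Assumes the kernel image lives at /boot/vmlinuz-<version>.
    printf 'grubby --set-default /boot/vmlinuz-%s\n' "$version"
}
```

For example, pin_kernel_cmd 3.10.0-327.28.3.el7.x86_64 prints the command for the older kernel from my dkms output. Run the printed command with sudo once you are happy with it; sudo grubby --default-kernel shows which kernel is currently the default.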
You’ve tried to load ZFS and realized that it is no longer available:
    # sudo zpool import
    The ZFS modules are not loaded.
    Try running '/sbin/modprobe zfs' as root to load them.
    # sudo /sbin/modprobe zfs
    modprobe: FATAL: Module zfs not found.
You may want to check the dkms status. Write down the version number. In my case, it is 0.6.5.9.
    # sudo dkms status
    spl, 0.6.5.9, 3.10.0-327.28.3.el7.x86_64, x86_64: installed (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!)
    spl, 0.6.5.9, 3.10.0-514.2.2.el7.x86_64, x86_64: installed (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!)
    zfs, 0.6.5.9, 3.10.0-327.28.3.el7.x86_64, x86_64: installed (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!)
Before running the following commands, make sure that you know what you are doing.
    # Make sure that you reboot to the kernel you want to fix.
    # Find out what the current kernel is
    uname -a
    Linux 3.10.0-514.2.2.el7.x86_64 #1 SMP Tue Dec 6 23:06:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
    # In my example, it is: 3.10.0-514.2.2.el7.x86_64

    # Now, let's get into the fun part. We will remove them and reinstall them.
    # Don't forget to match your version. In my case, my version is: 0.6.5.9
    sudo dkms remove zfs/0.6.5.9 --all
    sudo dkms remove spl/0.6.5.9 --all
    sudo dkms --force install spl/0.6.5.9
    sudo dkms --force install zfs/0.6.5.9

    # Or you can run these commands in one line, so that you don't need to wait:
    sudo dkms remove zfs/0.6.5.9 --all; sudo dkms remove spl/0.6.5.9 --all; sudo dkms --force install spl/0.6.5.9; sudo dkms --force install zfs/0.6.5.9;
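If you have to repeat this after every kernel update, the same steps can be scripted so that the version number is read from dkms status instead of typed by hand. This is a sketch under assumptions, not a tested production script. The function names zfs_dkms_version and rebuild_zfs_dkms are my own, the parsing assumes the "module, version, kernel, arch: state" output format shown above, and by default the commands are only printed.

```shell
#!/bin/sh
# Sketch: read `dkms status` output on stdin and print the version of
# the zfs module from the first matching line, e.g. 0.6.5.9.
# (Hypothetical helper name; assumes the comma-separated format above.)
zfs_dkms_version() {
    awk -F', *' '$1 == "zfs" { print $2; exit }'
}

# Sketch: rebuild spl and zfs for the given module version, in the same
# order as the manual steps: remove zfs, remove spl, install spl, install zfs.
# The second argument is a command prefix. It defaults to "echo", so the
# commands are only printed; pass "sudo" to actually run them.
rebuild_zfs_dkms() {
    version="$1"
    run="${2:-echo}"
    $run dkms remove "zfs/$version" --all
    $run dkms remove "spl/$version" --all
    # zfs depends on spl, so only build zfs if spl was installed successfully.
    $run dkms --force install "spl/$version" &&
    $run dkms --force install "zfs/$version"
}
```

A dry run looks like rebuild_zfs_dkms "$(sudo dkms status | zfs_dkms_version)", which prints the four dkms commands. Adding sudo as the second argument executes them. The only intentional difference from the one-liner above is that the zfs install is skipped when the spl install fails.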
And we will verify the result.
    # sudo dkms status
    spl, 0.6.5.9, 3.10.0-514.2.2.el7.x86_64, x86_64: installed
    zfs, 0.6.5.9, 3.10.0-514.2.2.el7.x86_64, x86_64: installed
Finally, we can start ZFS again.
sudo /sbin/modprobe zfs
Your ZFS pool should be back. You can verify it by rebooting your machine. Note that Linux may not automatically mount the ZFS volumes. You may want to mount them manually or via a cron job.
Here is how to mount the ZFS volumes manually.
sudo zpool import -a
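If you go the cron route, the two commands (load the module, then import the pools) can be wrapped in a small function. This is a sketch only. The name zfs_at_boot and the DRY_RUN variable are made up for illustration, and it assumes the script runs as root, so there is no sudo. With DRY_RUN left at its default of 1, it just prints what it would do.

```shell
#!/bin/sh
# Sketch: load the zfs module, then import all pools, in that order.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them.
# zfs_at_boot and DRY_RUN are hypothetical names, not part of ZFS.
zfs_at_boot() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        run=echo
    else
        run=
    fi
    # Only try to import the pools if the module could be loaded.
    $run /sbin/modprobe zfs &&
    $run zpool import -a
}
```

To use it at boot, you could save this in a script that ends with DRY_RUN=0 zfs_at_boot and reference it from root’s crontab with an @reboot entry (the cron daemon on RHEL 7 supports that keyword). The script path is up to you. Services that depend on the pool would still need to be started after the import finishes.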
You may want to remove all of the old kernels too.
sudo package-cleanup --oldkernels --count=1 -y
I don’t see similar issues on Debian-based distros (Proxmox, Ubuntu, …), so I think it’s CentOS/RHEL that is not making friends with ZFS … just yet! Just one remark: you remove all extra modules. If any other modules are in /extra, wouldn’t that destroy functionality on the next reboot?
I haven’t tried Debian-based distros yet. So far I have only found the issues with RHEL 7. For RHEL 6, so far things are okay. I guess it has something to do with the kernel (RHEL 7: ver 3.10; RHEL 6: ver 2.6). All new Ubuntu releases use ver 4.
In my case, I only have ZFS installed. I don’t have anything else. That’s why I put a warning before the commands: “Make sure that you know what you are doing.”
Not sure what’s going on, but this just happened to me yesterday. I’m running Debian (testing) with the root fs on ZFS. After an update (apt-get dist-upgrade), it updated the kernel among other things, and then, after a reboot, the box wouldn’t boot because zfs couldn’t be loaded anymore. Needless to say, I was cursing. (Had to boot the machine from an emergency system I have on a usb stick, rollback to the last snapshot and now it’s working again.)
I’ve been using ZFS on linux for about 3 years now on several machines. Love it overall. Then, about 3 weeks ago, I started the endeavour of running root on zfs. After yesterday’s fiasco I have serious doubts whether zfs on linux is where I thought it is.
Then again, why would you run ZFS on root? /home seems logical, but for /root it’s not really an advantage, is it?
No, I do not have ZFS on root for my Linux box. Because of the whole kernel mess, it is never safe to run ZFS on root the way you can on FreeBSD.
In your situation, I would simply boot into the previous working kernel and do the rebuild from there.
Good luck to your files? Not true. I’ve had this happen numerous times. I have blown out power supplies during writes and watched ZFS linear span arrays with no parity disks at all keep themselves intact through it. Resilvering and CRC checks happen automatically, and adding two parity disks (raidz2) makes it the safest around. Just reimport that pool; the data will always sit there. You can rip out drives or reinstall another operating system; as long as you create the ZFS array the same way you did (i.e., using disk-by-id and whatnot), you can just reimport the pool, and the dataset will still be there.
Then on the kernel issue, it’s always as easy as reverting the kernel, or removing ZFS altogether for a moment and then finding another version to get it working as a temporary stop-gap. I use zfs-kmod so that I don’t really have to worry about this.
As far as Arch Linux goes, it’s up to the ZFS maintainer, you’re right, and they (mostly demizer, afaik) do a pretty good job of keeping it updated. The main thing you need to do there is pretty simple too: NEVER take the kernel update if you see that package error for ZFS. Wait until it goes away, which means somebody on the ZFS end has finally caught up with their package (or use the kmod version, which doesn’t have this problem).
So you needn’t worry about ZFS losing your files. However, I fully agree with you about attempting it for a boot drive. I have not attempted it yet for that reason: I need it to be enterprise reliable, so I have yet to try that on a ZFS drive on Linux.
Thank you for your comment. The reliability of ZFS is rock solid, and there is no doubt about it (I have been a ZFS user since 2006). I was talking about the interaction between the OS and ZFS. For example, if a power failure causes the server to reboot into a newer kernel, the ZFS volume won’t be loaded without some extra manual work. During that period (i.e., between the server rebooting and the sysadmin noticing the problem), none of the services will work, because their files are sitting somewhere in the ZFS volumes. Do you get the point?
Your suggestions on the kernel issue are good. However, what if you are managing hundreds of servers and each of them uses ZFS? That’s not an easy task. Also, there is always a need to update the kernel (e.g., for security reasons). In general, rebooting the server takes less than a minute. However, it takes another 10-15 minutes to rebuild the ZFS DKMS modules. That’s really a pain in a production environment.
ZFS on Linux is not the same as ZFS. It is more than just a port of the code to the Linux platform. The maintainers have made some changes to the default settings. If you spend some time trying ZFS on FreeBSD and Solaris, you will notice that some default settings (e.g., ARC size) are different. As an end user, I don’t care how the kernel or the internals work. I just want to set up a system and get it running. ZFS works great on FreeBSD and Solaris, and there it requires almost no maintenance at all. Sorry, but ZFS is really not for general Linux users. (Please blame the licensing issues with Linux.)
Thanks Derrick, very informative. 🙂