CentOS 7 – dracut-initqueue timeout

I received a Christmas gift from the RHEL 7 / CentOS 7 / Linux kernel team today. After my system was updated to the new kernel (3.10.0-957.1.3.el7.x86_64), it gave me a few surprises. As a sysadmin, I don’t want any surprises. What I really want is a working system. That’s one of the reasons why I always suggest that people use FreeBSD if possible. FreeBSD is a truly rock-solid system.

Long story short: you have a working system. Your system receives a new kernel (e.g., 3.10.0-957.1.3.el7.x86_64). You decide to boot into that new kernel. Your system takes forever to boot. You go down to the server room, turn on the monitor and see the following messages:

[time stamp]dracut-initqueue [289]: Warning: dracut-initqueue timeout - starting timeout scripts
[time stamp]dracut-initqueue [289]: Warning: dracut-initqueue timeout - starting timeout scripts
...
[time stamp]dracut-initqueue [289]: Warning: Could not boot.
[time stamp]dracut-initqueue [289]: Warning: /dev/disk/by-uuid/XXX does not exist
   Starting Dracut Emergency Shell...
Warning: /dev/disk/by-uuid/XXX does not exist

Generating "/run/initramfs/rdsosreport.txt"

Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.

dracut:/#

In my situation, the system has no problem booting into the older kernel. It just does not like the new kernel. In my case, I checked my /etc/fstab settings and disabled all of the non-standard devices.

#The following are standard.
UUID=12347890-1234-9512-9518-963852710258       /                       xfs     defaults        0 0
UUID=12347890-1234-4513-7532-963852710258       /boot                   xfs     defaults        0 0
UUID=12347890-1234-9587-8526-963852710258       swap                    swap    defaults        0 0


#The new kernel does not like it. I have to comment it out.
#/storage/data/Dropbox.img                      /Dropbox/               ext4    defaults        0 0

That’s it. I simply mount the image after the system has booted, and the problem is solved. I do this with a small script that mounts the image and is run from /etc/crontab (@reboot).
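A minimal sketch of such a boot-time script, assuming the image path and mount point from the commented-out fstab entry above (/storage/data/Dropbox.img on /Dropbox) and an ext4 image. The function name mount_image, the script path /usr/local/sbin/mount-dropbox-img.sh and the DRY_RUN switch are hypothetical, added for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mount a file-backed ext4 image after boot instead of
# listing it in /etc/fstab. DRY_RUN=1 prints the mount command instead.
mount_image() {
  local img="$1" mnt="$2"
  # Nothing to do if something is already mounted on the target directory.
  if grep -qs " ${mnt%/} " /proc/mounts; then
    echo "already mounted: $mnt"
    return 0
  fi
  if [ ! -f "$img" ]; then
    echo "image not found: $img" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "mount -o loop -t ext4 $img $mnt"
  else
    mkdir -p "$mnt" && mount -o loop -t ext4 "$img" "$mnt"
  fi
}

# Runs only when called with "run", e.g. from /etc/crontab (hypothetical path):
#   @reboot  root  /usr/local/sbin/mount-dropbox-img.sh run
if [ "${1:-}" = run ]; then
  mount_image /storage/data/Dropbox.img /Dropbox
fi
```

Keeping the image out of /etc/fstab means a failed mount can no longer stop the initramfs from finding the root file system; at worst the script prints an error after boot.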

The second surprise was even worse. On one of my Linux machines, the OS was installed on a USB flash drive (because all SATA ports were used for RAID storage). For some odd reason, the new kernel cannot boot when the system lives on a USB flash drive. So I tried to install a fresh CentOS 7 on the same hardware (the installation disk contains the latest kernel), and it gave me the same result. I ended up installing the OS on a SATA hard drive, which is not what I want, because my computer case does not have any extra space for another hard drive.

Sometimes, I think the Linux kernel / system engineers have way too much spare time.

Our sponsors:

“ZFS on Linux”: The ZFS modules are not loaded. Try running ‘/sbin/modprobe zfs’ as root to load them.

Last Updated: May 10, 2020

This article is based on my experience with CentOS 7. If you are running other Linux distributions, please adjust the commands and package names accordingly (e.g., yum -> apt-get).

As of Oct 3, 2019, I cannot get ZFS on Linux running on CentOS 8.

ZFS on Linux is not a robust solution for getting ZFS up and running in Linux environments. Unlike on FreeBSD, ZFS does not work with the Linux kernel natively. The developers of ZFS on Linux came up with a rather crappy solution: by injecting ZFS into the kernel as a DKMS module, the Linux kernel learns how to understand ZFS. It works very well, but only under a single assumption: the system will never get updated or rebooted after installing ZFS on Linux. So what happens after you update the system (e.g., kernel, ZFS on Linux packages) and the system gets rebooted? There is a good chance that your ZFS module will not be loaded:

For each type of update, here is what will happen after a reboot and what you need to do:

  • You update the kernel first, then ZFS on Linux afterward.
    What will happen after reboot? Before Dec 12, 2018: your system will load the ZFS modules. Dec 12, 2018 to Dec 2019: probably not. After Jan 2020: 50/50.
    What do you need to do? Remove the old kernels from the DKMS database. Rebuild the ZFS (and SPL, if running 0.7.x) modules with the new kernel in the DKMS database.
  • You update ZFS on Linux first, then the kernel afterward.
    What will happen after reboot? If your system boots into the new kernel (which is the default), your system WILL NOT load the ZFS modules.
    What do you need to do? Remove and reinstall the ZFS and DKMS packages. Remove the old kernels from the DKMS database. Rebuild the SPL and ZFS modules with the new kernel in the DKMS database.
  • You update ZFS on Linux only. The kernel has not been updated.
    What will happen after reboot? Your system will load the ZFS modules.
    What do you need to do? Remove the old kernels from the DKMS database. Rebuild the SPL and ZFS modules with the new kernel in the DKMS database.
  • You update the kernel only. ZFS on Linux has not been updated.
    What will happen after reboot? Your system will load the ZFS modules.
    What do you need to do? Remove the old kernels from the DKMS database. Rebuild the SPL and ZFS modules with the new kernel in the DKMS database.

There are two steps to get your data back. We will start by removing and rebuilding your DKMS modules. If that does not work, we will reinstall the ZFS packages. I am also assuming that your system has booted into the new kernel. Please keep in mind that, as of Oct 3, 2019, ZFS on Linux does not work with Linux kernel v4 (either via kernel-ml or CentOS 8). It only works with v3.

If you need to access your data, the easiest way is to boot into the old working kernel. Once you are ready to clean up the problem, boot into the new kernel and follow my instructions below.

Step 1: Clean up and Reinstall DKMS Modules

Most of the time, ZFS on Linux messes up the DKMS modules after an update, so I suggest cleaning up and reinstalling the DKMS modules. As of December 12, 2018, updating ZFS on Linux removes all of the DKMS modules for no apparent reason.

First, check your DKMS status. You will need to clean up DKMS if the status is empty (nothing is installed), orphaned (the modules are added, but none of them is attached to any kernel) or multiple (more than one kernel or version is registered). If it is clean (a single kernel only), you may skip this step. If you are using ZFS on Linux 0.7.x, DKMS will list two modules (zfs and spl). If you are using 0.8.x, it will list only one module (zfs).

#dkms status

In general, you want only one version of each DKMS module installed, attached to one kernel only. If you see multiple versions of a DKMS module, or multiple kernels, that’s bad.

#An example of dirty DKMS status (This is bad):
spl, 0.7.12, 3.10.0-862.14.4.el7: installed (original_module exists) (WARNING! Diff between built and installed module!)
spl, 0.7.12, 3.10.0-957.1.3.el7: installed (original_module exists)
zfs, 0.7.12, 3.10.0-862.14.4.el7: installed (original_module exists) (WARNING! Diff between built and installed module!)
zfs, 0.7.12, 3.10.0-957.1.3.el7: installed (original_module exists)

#An example of empty DKMS status (This is bad):
(empty)

#An example of DKMS status without a kernel (This is bad):
zfs, 0.7.12: added
spl, 0.7.12: added

#An example of clean DKMS status (This is good):
spl, 0.7.12, 3.10.0-957.1.3.el7.x86_64, x86_64: installed
zfs, 0.7.12, 3.10.0-957.1.3.el7.x86_64, x86_64: installed 

or 

spl, 0.7.12, 3.10.0-957.1.3.el7.x86_64, x86_64: installed (original_module exists)
zfs, 0.7.12, 3.10.0-957.1.3.el7.x86_64, x86_64: installed (original_module exists)

or 

zfs, 0.8.3, 3.10.0-1127.el7.x86_64, x86_64: installed (original_module exists)

In my example above, my ZFS on Linux is 0.7.12, my old kernel is 3.10.0-862.14.4.el7, my new kernel is 3.10.0-957.1.3.el7. Your version may be different.
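If you look after many machines, reading dkms status by eye gets old quickly. Here is a small sketch that sorts the output into the four states described above (empty, orphan, multiple, clean). It assumes the "name, version, kernel, arch: state" line format shown in these examples; newer DKMS releases print "name/version, ..." instead, which this sketch does not handle. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify `dkms status` output (read on stdin) as
# empty, orphan, multiple or clean, following the rules described above.
classify_dkms_status() {
  awk -F', ' '
    /^[ \t]*$/ { next }                  # ignore blank lines
    { lines++ }
    /: added/ { orphan = 1; next }       # registered, but built for no kernel
    {
      kernel = $3
      sub(/:.*/, "", kernel)             # drop ": installed ..." if no arch column
      kernels[kernel] = 1
      if (!(($1, $2) in seen)) { seen[$1, $2] = 1; nver[$1]++ }
    }
    END {
      if (lines == 0) { print "empty"; exit }
      if (orphan) { print "orphan"; exit }
      for (m in nver) if (nver[m] > 1) { print "multiple"; exit }
      nk = 0
      for (k in kernels) nk++
      print (nk > 1 ? "multiple" : "clean")
    }'
}

# Usage: dkms status | classify_dkms_status
```

Anything other than "clean" means the cleanup below is worth doing.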

If your situation is something like the following:

Error! Could not locate dkms.conf file.
File: /var/lib/dkms/zfs/0.8.2/source/dkms.conf does not exist.

That means you have multiple versions of the ZFS DKMS module in your system. In my case, 0.8.3 is running, and the old version (0.8.2) is still around. Check the folder (/var/lib/dkms/zfs/) to see whether any old versions need to be removed.

#Currently running: dkms ZFS 0.8.3, kernel 3.10.0-1062.18.1.el7.x86_64

cd /var/lib/dkms/zfs/

#ls -al
total 12K
0.8.2 <---- Delete this
0.8.3
kernel-3.10.0-1062.1.2.el7.x86_64-x86_64 -> 0.8.2/3.10.0-1062.1.2.el7.x86_64/x86_64 <---- Delete this
kernel-3.10.0-1062.4.1.el7.x86_64-x86_64 -> 0.8.2/3.10.0-1062.4.1.el7.x86_64/x86_64 <---- Delete this
kernel-3.10.0-1062.4.3.el7.x86_64-x86_64 -> 0.8.2/3.10.0-1062.4.3.el7.x86_64/x86_64 <---- Delete this
kernel-3.10.0-1062.7.1.el7.x86_64-x86_64 -> 0.8.2/3.10.0-1062.7.1.el7.x86_64/x86_64 <---- Delete this
kernel-3.10.0-1062.9.1.el7.x86_64-x86_64 -> 0.8.3/3.10.0-1062.9.1.el7.x86_64/x86_64 <---- Delete this
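With many old kernels, picking out the stale entries by hand is tedious. A sketch that only prints candidates (it deletes nothing), assuming the layout shown above: version directories plus kernel-<kernel>-<arch> symlinks. Everything except the running version and the running kernel’s link is listed. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print stale entries under a DKMS module directory such
# as /var/lib/dkms/zfs. Arguments: module dir, version to keep, kernel to keep.
# Nothing is deleted; review the output before removing anything.
list_stale_dkms_entries() {
  local dir="$1" keep_ver="$2" keep_kernel="$3" entry name
  for entry in "$dir"/*; do
    # Skip the literal glob pattern when the directory is empty.
    [ -e "$entry" ] || [ -L "$entry" ] || continue
    name=${entry##*/}
    case "$name" in
      "kernel-${keep_kernel}-"*)
        ;;                                   # link for the kept kernel: keep
      kernel-*)
        echo "$entry" ;;                     # link for any other kernel: stale
      *)
        if [ -d "$entry" ] && [ "$name" != "$keep_ver" ]; then
          echo "$entry"                      # old version directory: stale
        fi ;;
    esac
  done
}

# Usage: list_stale_dkms_entries /var/lib/dkms/zfs 0.8.3 "$(uname -r)"
```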

You may want to remove the ZFS DKMS module (and the SPL module on 0.7.x) first, then reinstall them:

#If your version is 0.7.x:
sudo dkms remove zfs/0.7.12 --all; 
sudo dkms remove spl/0.7.12 --all; 


#If your version is 0.8.x:
sudo dkms remove zfs/0.8.3 --all; 

Sometimes, you will need to remove the modules built for the old kernel manually:

sudo dkms remove zfs/0.7.12 -k 3.10.0-862.14.4.el7.x86_64; 
sudo dkms remove spl/0.7.12 -k 3.10.0-862.14.4.el7.x86_64;

Time to reinstall them:

#Don't forget to use the version that matches your system. In my situation, it was 0.7.12 / 0.8.3

#0.7.x:
sudo dkms --force install spl/0.7.12; 
sudo dkms --force install zfs/0.7.12;

#0.8.x:
sudo dkms --force install zfs/0.8.3;
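The remove-then-reinstall sequence above can be wrapped into one function. A sketch, assuming (as described earlier in this article) that 0.6.x and 0.7.x ship SPL as a separate DKMS module and 0.8.x does not. The function name and the DRY_RUN switch, which prints the commands instead of running them, are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove and rebuild the ZFS DKMS modules for one
# ZFS on Linux version. DRY_RUN=1 prints the commands instead of running them.
rebuild_zfs_dkms() {
  local ver="$1" run="" needs_spl=0
  [ "${DRY_RUN:-0}" = 1 ] && run="echo"
  case "$ver" in
    0.6.*|0.7.*) needs_spl=1 ;;              # SPL is a separate module before 0.8
  esac
  # Remove zfs before spl, because zfs is built on top of spl.
  $run sudo dkms remove "zfs/$ver" --all
  if [ "$needs_spl" = 1 ]; then
    $run sudo dkms remove "spl/$ver" --all
    # Install spl first so that the zfs build can find it.
    $run sudo dkms --force install "spl/$ver"
  fi
  $run sudo dkms --force install "zfs/$ver"
}

# Usage: DRY_RUN=1 rebuild_zfs_dkms 0.7.12   # check the commands first
#        rebuild_zfs_dkms 0.7.12
```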

Run dkms status again. You should see ZFS (and SPL on 0.7.x) attached to the new kernel:

#If your version is 0.7.x:
spl, 0.7.12, 3.10.0-957.1.3.el7.x86_64, x86_64: installed
zfs, 0.7.12, 3.10.0-957.1.3.el7.x86_64, x86_64: installed

#If your version is 0.8.x:
zfs, 0.8.3, 3.10.0-1127.el7.x86_64, x86_64: installed

Try to load the ZFS module and import your ZFS data:

sudo /sbin/modprobe zfs
sudo zpool import -a

If everything looks good, reboot your system and check whether ZFS is loaded automatically. Once everything is okay, remove the old kernels from the system.

sudo package-cleanup --oldkernels --count=1 -y

That's it, you are good to go.


Step 2: Reinstall ZFS packages

If you have tried the first step and it didn't work, you may want to reinstall the ZFS packages. Here is a typical error message:

You try to import the ZFS data and the system complains:

#zpool import -a
The ZFS modules are not loaded.
Try running '/sbin/modprobe zfs' as root to load them.

So you try to load the ZFS module and the system complains again:

#/sbin/modprobe zfs
modprobe: FATAL: Module zfs not found.
or
modprobe: ERROR: could not insert 'zfs': Invalid argument

What you need to do is to erase all the ZFS and related packages:

yum erase zfs zfs-dkms libzfs2 spl spl-dkms libzpool2 -y

Please reboot the system. This step is very important.

reboot

After that, try to install ZFS again.

yum install zfs -y

If the system complains about mismatched dependent packages, remove the affected packages first and run the installation again.

After the installation, try to load the ZFS module:

/sbin/modprobe zfs
zpool import -a

If ZFS is up and running, please clean up DKMS as described in Step 1. If it complains again, follow the steps below:

  1. Reboot
  2. Clear the cache of the yum repository and try to update the system again. (sudo yum clean all)
  3. Reboot to the latest kernel
  4. Erase the ZFS and related packages, try it again.

Keep in mind that ZFS on Linux is based on DKMS, a very buggy and unreliable platform. Uninstalling and reinstalling the packages does not always give you the same result as a fresh install. Before you send your server to the landfill, try this:

Check the dkms status:

#dkms status
#version 0.7.x
zfs, 0.7.2: added
spl, 0.7.2: added

#version 0.8.x
zfs, 0.8.3: added

If you see this, the ZFS packages have been installed, but DKMS has not built the modules for any kernel. You will need to tell DKMS to build and install them:

#version 0.7.x (install spl first, because zfs depends on it)
dkms --force install spl/0.7.2
dkms --force install zfs/0.7.2

#version 0.8.x
dkms --force install zfs/0.8.3
#Try to start ZFS again.
/sbin/modprobe zfs
zpool import -a
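When dkms status shows modules that are only "added", the matching install commands can be generated from that output. A sketch, assuming the "name, version: added" format shown above; it prints the commands (SPL before ZFS, since the 0.6.x/0.7.x zfs module builds against spl) so you can review them before running. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `dkms status` on stdin and print a
# "dkms --force install" command for every module that is only "added".
added_dkms_install_cmds() {
  awk -F', ' '
    /: added/ {
      ver = $2
      sub(/:.*/, "", ver)                    # "0.7.2: added" -> "0.7.2"
      cmd = "dkms --force install " $1 "/" ver
      if ($1 == "spl") spl = spl cmd "\n"; else rest = rest cmd "\n"
    }
    END { printf "%s%s", spl, rest }'        # spl commands first
}

# Usage: dkms status | added_dkms_install_cmds            # review
#        dkms status | added_dkms_install_cmds | sudo sh  # then run
```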

If you have already tried this more than three times without any luck, don’t waste your time. Move the ZFS disks to a different server, which should be able to recognize them. The original server can then access the data on the new server via NFS using the original path. That will minimize the impact of the change.

Keep in mind that the ZFS version is very important. A server with a newer ZFS version can read ZFS disks created with older ZFS versions; the reverse is not guaranteed. You can always check the ZFS versions by running the following:

#Get the version of the host:
sudo zfs upgrade -v
sudo zpool upgrade -v


#Get the version of the ZFS disks:
sudo zfs get version
sudo zpool get version

This is pretty much what I need to do on my 60 servers every month. If you are in a similar situation, I guarantee that you will become an expert at fixing this kind of mess after a few months. Good luck!


ZFS Cluster: A Network-Based ZFS Implementation

I have always liked the idea of building a ZFS cluster, i.e., something that has the robustness of ZFS with the capacity of a cluster. So I came up with a test environment for this prototype.

The idea is pretty simple. Typically, when we build a ZFS server, the members of the RAID are hard drives. In my experiment, I use files instead of hard drives, and these files live on network shares (mounted via NFS). Since I/O will be bottlenecked by the network, I use network bonding to increase the overall bandwidth.

The yellow servers are regular servers running ZFS with the NFS service. I use the following command to generate a simple file / placeholder for the ZFS pool:

#This will create an empty (sparse) 1TB file. You can think of it as a 1TB hard drive / placeholder.
truncate -s 1000G file.img

Make sure that the NFS service exports the directory containing file.img to the client (the blue server).
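truncate creates a sparse file: its apparent size is 1000G, but it takes almost no disk space until ZFS writes to it. Because running truncate again on a file that already backs a pool would resize it, a small guard does not hurt. A sketch, with a hypothetical function name; /storage/share/file.img is where the file would sit on a yellow server, given the /nfs/192-168-1-10x/file.img paths used on the blue server.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a sparse placeholder file for a file-based vdev.
# Refuses to touch an existing file, so a live vdev is never resized.
make_vdev_file() {
  local path="$1" size="$2"
  if [ -e "$path" ]; then
    echo "refusing to overwrite existing $path" >&2
    return 1
  fi
  truncate -s "$size" "$path"
}

# Usage on each yellow server:
#   make_vdev_file /storage/share/file.img 1000G
```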


The blue server will be the NFS client of the yellow servers, where I will use it to serve the data to other computers. It has the following features:

It has network bonding across three Ethernet adapters:

#cat /proc/net/bonding/bond0


Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: enp0s25
MII Status: up
MII Polling Interval (ms): 1
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: enp0s25
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:19:d1:b2:1e:0d
Slave queue ID: 0

Slave Interface: enp6s0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:18:4d:f0:12:7b
Slave queue ID: 0

Slave Interface: enp6s1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:22:3f:f6:98:03
Slave queue ID: 0

It mounts the shares from the yellow servers via NFS:

#df
192.168.1.101:/storage/share   25T  3.9T   21T  16%  /nfs/192-168-1-101
192.168.1.102:/storage/share   8.1T  205G  7.9T   3% /nfs/192-168-1-102
192.168.1.103:/storage/share   8.3T  4.4T  4.0T  52% /nfs/192-168-1-103

The ZFS has the following structure:

#sudo zpool status
        NAME                                  STATE     READ WRITE CKSUM
        storage                               ONLINE       0     0     0
          raidz1-0                            ONLINE       0     0     0
            /nfs/192-168-1-101/file.img       ONLINE       0     0     0
            /nfs/192-168-1-102/file.img       ONLINE       0     0     0
            /nfs/192-168-1-103/file.img       ONLINE       0     0     0

The same pool can be created with:

zpool create -f storage raidz /nfs/192-168-1-101/file.img \
                              /nfs/192-168-1-102/file.img \
                              /nfs/192-168-1-103/file.img
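Before running zpool create on file-based vdevs, it is worth checking that every NFS share is actually mounted: if a mount is missing, the path points into an empty local directory. A sketch that verifies each backing file exists and then prints (not runs) the command; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a "zpool create ... raidz" command for
# file-based vdevs, after checking that every backing file is reachable.
raidz_create_cmd() {
  local pool="$1" f
  shift
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing vdev file: $f (is the NFS share mounted?)" >&2
      return 1
    fi
  done
  echo "zpool create -f $pool raidz $*"
}

# Usage:
#   raidz_create_cmd storage /nfs/192-168-1-10{1,2,3}/file.img
```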

The network is the limiting factor of the system, so I don’t expect the I/O speed to go beyond 375MB/s (125MB/s x 3). Also, since this is a file-based ZFS pool (the ZFS on the blue server is built on files, not disks), the overall performance will be somewhat lower.

#Write speed
time dd if=/dev/zero of=/storage/data/file.out bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 3.6717 s, 285 MB/s
#Read speed
time dd if=/storage/data/file.out of=/dev/null bs=1M
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 3.9618 s, 265 MB/s

Both read and write speeds are roughly 70% to 76% of the maximum bandwidth (265/375 and 285/375), which is not bad at all.
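The efficiency figures are simple arithmetic: measured MB/s divided by (number of links x per-link MB/s). A tiny helper with a hypothetical name, assuming 125MB/s per gigabit link as in the text:

```shell
#!/usr/bin/env bash
# Hypothetical helper: measured throughput as a percentage of the bonded
# links' theoretical maximum. Arguments: measured MB/s, links, MB/s per link.
bond_efficiency() {
  awk -v mbps="$1" -v links="$2" -v per_link="$3" \
    'BEGIN { printf "%.1f\n", 100 * mbps / (links * per_link) }'
}

bond_efficiency 285 3 125   # write test above -> 76.0
bond_efficiency 265 3 125   # read test above  -> 70.7
```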

Next, I took one of the yellow servers offline to see what would happen:

#sudo zpool status
        NAME                                  STATE     READ WRITE CKSUM
        storage                               ONLINE       0     0     0
          raidz1-0                            ONLINE       0     0     0
            /nfs/192-168-1-101/file.img       ONLINE       0     0     0
            /nfs/192-168-1-102/file.img       ONLINE       0     0     0
            /nfs/192-168-1-103/file.img       UNAVAIL      0     0     0 cannot open

And the pool is still functioning, that’s pretty cool!


Here are some notes that will affect the overall performance:

  • The quality of the Ethernet cards matters, including PCIe vs. PCI, x1 vs. x16 lanes, total throughput, etc.
  • The network traffic. Is the switch busy?
  • How the servers are connected: one big switch, or multiple switches bridged together. If they are bridged, the link between the switches becomes the bottleneck, which is 125MB/s on a gigabit network.

Again, this is for experimental purposes only. If you decide to put this in a production environment, do it at your own risk. Have fun!


How to install ZFS on RHEL / CentOS 7

A friend of mine wants to try ZFS on CentOS 7, so I decided to write a guide for him. The following instructions have been well tested on CentOS 7.

Before you decide to put ZFS in a production use, you should be aware of the following:

  • ZFS was originally designed for Solaris and later ported to BSD. Because of legal and licensing issues, ZFS cannot be shipped with Linux.
  • Since ZFS is open source, some developers ported ZFS to Linux and made it run at the kernel level via DKMS. This works great as long as you don’t update the kernel. Otherwise, the ZFS module will not be loaded with the new kernel.
  • In a ZFS/Linux environment, it is a bad idea to update the system automatically.
  • For some odd reason, ZFS/Linux works well only on server-grade or gaming-grade computers. Do not run ZFS/Linux on entry-level computers.

Instructions

By default, ZFS is not available in the standard CentOS repository. We will need to include some 3rd party repositories here.

sudo rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sudo rpm -Uvh https://forensics.cert.org/cert-forensics-tools-release-el7.rpm
sudo rpm -Uvh http://download.zfsonlinux.org/epel/zfs-release.el7_6.noarch.rpm

sudo yum update -y
sudo yum groupinstall -y "Development Tools" "Development Libraries" "Additional Development"
sudo yum install -y kernel-devel kernel-headers

It is very likely that the system will install a new kernel. You may want to reboot the computer before installing the ZFS.

sudo reboot

Please make sure that the system does not update automatically. If you need to update the system, please exclude the kernel and related modules from the update.

sudo nano /etc/yum.conf
#Add the following line:
exclude=kernel*
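If you manage several machines, the same change can be made without an editor. A sketch with a hypothetical function name, assuming the stock CentOS 7 yum.conf, where the end of the file still belongs to the [main] section. It does nothing if kernel* is already excluded and leaves any other existing exclude= line alone, so you can merge the patterns by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add "exclude=kernel*" to a yum config file, once.
add_kernel_exclude() {
  local conf="${1:-/etc/yum.conf}"
  if grep -q '^exclude=.*kernel\*' "$conf"; then
    return 0                                   # already excluded
  fi
  if grep -q '^exclude=' "$conf"; then
    echo "edit by hand, $conf already has: $(grep '^exclude=' "$conf")" >&2
    return 1
  fi
  # Make sure the new line is not glued to an unterminated last line.
  if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
    echo >> "$conf"
  fi
  echo 'exclude=kernel*' >> "$conf"
}

# Usage (as root): add_kernel_exclude /etc/yum.conf
```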

Now you are on the latest kernel. Let’s install the ZFS:

sudo yum install -y zfs
sudo /sbin/modprobe zfs

Now you can create a simple striped ZFS pool. A striped pool gives you the best performance and zero data protection. When referencing the disks, we don’t want to use /dev/sd*, because those names can change between reboots; instead, we want to use the device ID directly, e.g., /dev/disk/by-id/wwn-0x8000c8004e8ac11a

ls -l /dev/disk/by-id/


lrwxrwxrwx 1 root root  9 Jan  3 21:49 wwn-0x8000c8004e8ac11a -> ../../sde
lrwxrwxrwx 1 root root  9 Jan  3 21:49 wwn-0x8000c8008ad0a22d -> ../../sdd
lrwxrwxrwx 1 root root  9 Jan  3 21:49 wwn-0x8000c8008b4f6338 -> ../../sda
lrwxrwxrwx 1 root root  9 Jan  3 21:49 wwn-0x8000c8008b52144c -> ../../sdc
lrwxrwxrwx 1 root root  9 Jan  3 21:49 wwn-0x8000c8008b59a553 -> ../../sdb

Once you have identified the hard disks, we can create a simple striped ZFS pool. This will create a ZFS pool mounted under /storage. You can replace storage with any name you like.

#We are going to create a ZFS pool with three disks. You can add more if you like. For a striped design, the more disks, the faster the I/O.
zpool create -f storage /dev/disk/by-id/device1 /dev/disk/by-id/device2 /dev/disk/by-id/device3 
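The by-id names can also be collected with a short script instead of copying them by hand. A sketch with a hypothetical function name: it keeps whole-disk wwn-* links (partitions end in -partN) and only prints the striped zpool create command. Be careful: it lists every wwn-* disk, including your system disk, so remove any disk you do not want in the pool before running the printed command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a striped "zpool create" command for all
# whole-disk wwn-* links. It lists the OS disk too: edit before running!
striped_pool_cmd() {
  local pool="$1" dir="${2:-/dev/disk/by-id}" link disks=()
  for link in "$dir"/wwn-*; do
    [ -L "$link" ] || continue               # skip the literal glob if no match
    case "$link" in *-part*) continue ;; esac
    disks+=("$link")
  done
  if [ "${#disks[@]}" -eq 0 ]; then
    echo "no wwn-* disks found in $dir" >&2
    return 1
  fi
  echo "zpool create -f $pool ${disks[*]}"
}

# Usage: striped_pool_cmd storage
```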

storage is like a big umbrella. Under this umbrella, we will create multiple “partitions” (ZFS calls them datasets) for storing data.

zfs create storage/mydata

If you have a fast CPU like an i7, you may want to turn on compression. This reduces the amount of data written to disk, and it can improve the overall performance.

sudo zfs set compression=lz4 storage

Finally we want to change the ownership and the permissions

#Assuming that you are part of the wheel group
sudo chown -R root:wheel /storage
sudo chmod -R g+rw /storage

Now, run df and you should be able to see the ZFS in your system.

#df -h
Filesystem                   Size  Used Avail Use% Mounted on
storage                      6.8T  128K  6.8T   1% /storage
storage/mydata                13T  6.1T  6.8T  48% /storage/mydata

You can monitor the health of the ZFS system.

#sudo zpool status



  pool: storage
 state: ONLINE
config:

        NAME                        STATE     READ WRITE CKSUM
        storage                     ONLINE       0     0     0
            wwn-0x8000c8004e8ac11a  ONLINE       0     0     0
            wwn-0x8000c8008ad0a22d  ONLINE       0     0     0
            wwn-0x8000c8008b4f6338  ONLINE       0     0     0
            wwn-0x8000c8008b52144c  ONLINE       0     0     0


errors: No known data errors

For some very odd reason, ZFS will not be loaded automatically. We want to make sure that ZFS is loaded after a reboot.

#sudo nano /etc/crontab

#Add the following:
@reboot         root    sleep 10; zpool import -a;
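If the module itself is not loaded at boot, zpool import -a on its own fails with "The ZFS modules are not loaded". A sketch of a script that cron could call at @reboot instead, loading the module first when /proc/modules does not list it. The function name, the script path in the crontab comment and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical boot-time helper: load the zfs module if it is missing, then
# import all pools. Optional argument: modules list to inspect.
# DRY_RUN=1 prints the commands instead of running them.
zfs_boot_import() {
  local modules="${1:-/proc/modules}" run=""
  [ "${DRY_RUN:-0}" = 1 ] && run="echo"
  if ! grep -q '^zfs ' "$modules" 2>/dev/null; then
    $run /sbin/modprobe zfs || return 1
  fi
  $run zpool import -a
}

# Runs only when called with "run", e.g. in /etc/crontab (hypothetical path):
#   @reboot  root  sleep 10; /usr/local/sbin/zfs-boot-import.sh run
if [ "${1:-}" = run ]; then
  zfs_boot_import
fi
```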

Now you can try to test the ZFS by running dd or copying a big file to the ZFS. If you are not happy with the configurations, you can always destroy it and re-create the ZFS again.

sudo zpool destroy storage


Have fun!


CentOS/RHEL 7 – No ZFS After Updating the Kernel

Let’s all agree on this fact: ZFS is foreign to Linux. It is not native. You can’t expect ZFS on Linux to run as smoothly as on FreeBSD or Solaris. Having used ZFS on Linux since 2013 (and ZFS on FreeBSD since 2009), I’ve noticed that ZFS does not like Linux (well, at least RHEL 7). Here are a few examples:

  • ZFS is not loaded at boot time. You will need to start it manually or load it via cron. Good luck if you have other services (like Apache, MySQL, NFS, or even users’ home directories) that depend on ZFS.
  • Every single time you update the kernel, ZFS will not work after the reboot without some manual work. What if the system runs updates automatically, and one day a power failure makes your server reboot into a new kernel? Your system will not be able to mount your ZFS volume. If you integrate ZFS with other services such as web, database or network drives, oh well, good luck, and I hope you catch this problem fast enough before receiving thousands of emails and calls from your end-users.
  • If you exclude the kernel from updates (/etc/yum.conf), you will eventually run into trouble, because tons of other packages require the latest kernel. In other words, running yum update -y will fail. You will need to run yum update --skip-broken, which means you will miss many of the latest packages. Here is an example:
    --> Finished Dependency Resolution
    Error: Package: hypervvssd-0-0.29.20160216git.el7.x86_64 (base)
               Requires: kernel >= 3.10.0-384.el7
               Installed: kernel-3.10.0-327.el7.x86_64 (@anaconda)
                   kernel = 3.10.0-327.el7
               Installed: kernel-3.10.0-327.22.2.el7.x86_64 (@updates)
                   kernel = 3.10.0-327.22.2.el7
    Error: Package: hypervfcopyd-0-0.29.20160216git.el7.x86_64 (base)
               Requires: kernel >= 3.10.0-384.el7
               Installed: kernel-3.10.0-327.el7.x86_64 (@anaconda)
                   kernel = 3.10.0-327.el7
               Installed: kernel-3.10.0-327.22.2.el7.x86_64 (@updates)
                   kernel = 3.10.0-327.22.2.el7
    Error: Package: hypervkvpd-0-0.29.20160216git.el7.x86_64 (base)
               Requires: kernel >= 3.10.0-384.el7
               Installed: kernel-3.10.0-327.el7.x86_64 (@anaconda)
                   kernel = 3.10.0-327.el7
               Installed: kernel-3.10.0-327.22.2.el7.x86_64 (@updates)
                   kernel = 3.10.0-327.22.2.el7
     You could try using --skip-broken to work around the problem
     You could try running: rpm -Va --nofiles --nodigest
    
  • If you are running a stable Linux distribution like RHEL 7, you can load a more recent kernel like 4.x by installing the kernel-ml package. However, don’t expect ZFS to work with version 4:
    Loading new spl-0.6.5.9 DKMS files...
    Building for 4.11.2-1.el7.elrepo.x86_64
    Building initial module for 4.11.2-1.el7.elrepo.x86_64
    configure: error: unknown
    Error! Bad return status for module build on kernel: 4.11.2-1.el7.elrepo.x86_64 (x86_64)
    Consult /var/lib/dkms/spl/0.6.5.9/build/make.log for more information.
    
    

Running ZFS on Linux is like putting a giraffe in the wild in Alaska. It is just not the right thing to do. Unfortunately, so many things are only available on Linux that we have to live with it. It is just like FUSE (Filesystem in Userspace): many people are hesitant to run their file systems in userspace instead of at the kernel level, but hey, look how many people are happy with GlusterFS, a distributed file system that lives on FUSE! Personally, I just think it is not the right thing to do, especially in an enterprise environment. Running a production file system in userspace, seriously?

Anyway, if you are running into trouble after upgrading your Linux kernel (and you almost had a heart attack when you think your data may be lost), you have two choices:

  1. Simply boot into the previous working kernel if you need to get your data back quickly. However, keep in mind that this creates two problems:
    • Since you already updated the system with the new kernel and new packages, the new packages probably will not work with the old kernel, and that may give you extra headaches.
    • Unless you manually change the default kernel in the boot loader config, you may run into the same trouble on the next boot.
  2. If you want a more “permanent” fix, you will need to rebuild the dkms ZFS and SPL modules. See below for the instructions. Keep in mind that you will have the same problem again when the kernel receives a new update.

You try to load ZFS and realize that it is no longer available:

#sudo zpool import
The ZFS modules are not loaded.
Try running '/sbin/modprobe zfs' as root to load them.

#sudo /sbin/modprobe zfs
modprobe: FATAL: Module zfs not found.

You may want to check the dkms status. Write down the version number. In my case, it is 0.6.5.9

#sudo dkms status
spl, 0.6.5.9, 3.10.0-327.28.3.el7.x86_64, x86_64: installed (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!)
spl, 0.6.5.9, 3.10.0-514.2.2.el7.x86_64, x86_64: installed (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!)
zfs, 0.6.5.9, 3.10.0-327.28.3.el7.x86_64, x86_64: installed (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!)

Before running the following commands, make sure that you know what you are doing.

#Make sure that you reboot to the kernel you want to fix.
#Find out what is the current kernel
uname -a
Linux 3.10.0-514.2.2.el7.x86_64 #1 SMP Tue Dec 6 23:06:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

#In my example, it is:
3.10.0-514.2.2.el7.x86_64

#Now, let's get into the fun part. We will remove them and reinstall them.
#Don't forget to match your version. In my case, it is 0.6.5.9
sudo dkms remove zfs/0.6.5.9 --all
sudo dkms remove spl/0.6.5.9 --all
sudo dkms --force install spl/0.6.5.9
sudo dkms --force install zfs/0.6.5.9

#or you can run these commands in one line, so that you don't need to wait:
sudo dkms remove zfs/0.6.5.9 --all; sudo dkms remove spl/0.6.5.9 --all; sudo dkms --force install spl/0.6.5.9; sudo dkms --force install zfs/0.6.5.9;

And we will verify the result.

#sudo dkms status
spl, 0.6.5.9, 3.10.0-514.2.2.el7.x86_64, x86_64: installed
zfs, 0.6.5.9, 3.10.0-514.2.2.el7.x86_64, x86_64: installed

Finally we can start the ZFS again.

sudo /sbin/modprobe zfs

Your ZFS pool should be back. You can verify it by rebooting your machine. Note that Linux may not mount the ZFS volumes automatically. You may want to mount them manually or via a cron job.

Here is how to mount the ZFS volumes manually.

sudo zpool import -a

You may want to remove all of the old kernels too.

sudo package-cleanup --oldkernels --count=1 -y


[Linux]/dev/sdb1: more filesystems detected. This should not happen,

I had a hard drive sitting around, and I decided to format it such that I could use it in my Linux CentOS box. When I decided to mount it, I got the following error message:

mount: /dev/sdb1: more filesystems detected. This should not happen,
       use -t  to explicitly specify the filesystem type or
       use wipefs(8) to clean up the device.

This message simply tells you that there are two or more file system signatures on the hard drive partition, and the system does not know which one to use for mounting. We can take a closer look to see what’s going on:

sudo wipefs /dev/sdb1


offset               type
----------------------------------------------------------------
0x2d1b0fa8923        zfs_member   [raid]
                     LABEL: storage
                     UUID:  12661834248699203227

0x951                xfs   [filesystem]
                     UUID:  90295123-2395-7456-8521-9A1EE963ac53

As you can see, we have two file systems here. The quickest fix is to zero out the beginning of the disk (the first 10MB here), which wipes the partition table and the signatures stored there:

sudo dd if=/dev/zero of=/dev/sdb bs=1M count=10

Then redo the partitioning and formatting:

sudo parted /dev/sdb
...
...
sudo mkfs.xfs /dev/sdb1
sudo mount /dev/sdb1 /mnt/
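Zeroing the first 10MB clears the signatures near the start of the disk, but ZFS keeps two of its four labels at the end of the device, so a zfs_member signature can survive (note how far into the partition the zfs_member offset is in the wipefs output above). As the mount error itself suggests, wipefs(8) can remove every signature it detects: --no-act shows what would be erased, and --all erases. A sketch that defaults to a dry run; the function name and the ERASE switch are hypothetical, and running it with ERASE=1 destroys the signatures on the target.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove stale file system / RAID signatures.
# Default is a dry run (wipefs --no-act); set ERASE=1 to actually erase.
wipe_signatures() {
  local dev="$1"
  if [ ! -e "$dev" ]; then
    echo "no such device: $dev" >&2
    return 1
  fi
  if [ "${ERASE:-0}" = 1 ]; then
    wipefs --all "$dev"
  else
    wipefs --no-act --all "$dev"
  fi
}

# Usage: sudo wipe_signatures /dev/sdb1            # show what would be erased
#        sudo ERASE=1 wipe_signatures /dev/sdb1    # erase all signatures
```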

That’s it!


modprobe: ERROR: could not insert ‘zfs’: Required key not available

Today I was trying to install ZFS on a CentOS 7 box. Typically, after rebooting the computer, the ZFS module is loaded. However, it was not loaded in my case.

Failed to load ZFS module stack.
Load the module manually by running 'insmod /zfs.ko' as root.

So I tried to load the module manually:

#sudo modprobe zfs
modprobe: ERROR: could not insert 'zfs': Required key not available.

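Turns out this is a newer machine with UEFI, and the error is caused by Secure Boot: when Secure Boot is enabled, the kernel refuses to load modules that are not signed with a trusted key, and the ZFS module built by DKMS is not signed. After I rebooted the machine, went into the BIOS menu and turned off Secure Boot, everything worked again.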
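Before digging into DKMS, you can check whether Secure Boot is the culprit. On systems with the mokutil package, mokutil --sb-state prints a line such as "SecureBoot enabled" or "SecureBoot disabled". A sketch with a hypothetical function name; it reads that text on stdin, so the parsing can be tested without mokutil.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if `mokutil --sb-state` output (on stdin)
# reports that Secure Boot is enabled.
secure_boot_enabled() {
  grep -q '^SecureBoot enabled'
}

# Usage:
#   if mokutil --sb-state 2>/dev/null | secure_boot_enabled; then
#     echo "Secure Boot is on: unsigned modules such as zfs will be refused"
#   fi
```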

Have fun with ZFS.


Dropbox on FreeBSD

I host my personal websites on a FreeBSD server. One of my websites is a photo album, which reads its content from a Dropbox folder. That Dropbox account is primarily used on a Mac, an iPhone and an iPad. I explored the possibility of setting up Dropbox on FreeBSD. Since Dropbox doesn’t support FreeBSD officially, I need to use 3rd party tools, most of which are based on the Dropbox developer API.

I tried several 3rd party tools and, as you might expect, none of them worked. The primary problem is synchronization: if my wife adds or deletes a photo in Dropbox, I expect the Dropbox folder on FreeBSD to be updated as well. Another problem is speed. The Dropbox API seems much slower than the native application. On the same network, it took a few hours to download the content (around 1GB of JPEG files) from Dropbox on FreeBSD, versus 10 minutes on a Mac/Windows/Linux machine using the native application.

So I came up with a few alternative solutions:

  1. Host my website on CentOS Linux. Since Dropbox supports Linux, I can read the Dropbox content without any problem.
  2. Push the Dropbox content from Mac/Linux to FreeBSD using rsync periodically (e.g., every 5 minutes, hourly, etc.). That way, FreeBSD will have access to the Dropbox files.
  3. Set up an NFS service on a Linux box with access to Dropbox, and let the FreeBSD machine mount the NFS share. This solution is okay if both machines are on the same network, but it may raise security concerns if they are connected over the public Internet.
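Option 2 could look roughly like the sketch below: an rsync push run from cron on the Mac/Linux machine. The paths, host name and function name are hypothetical. The --delete flag addresses the synchronization problem described above, since photos removed from Dropbox are also removed on the FreeBSD side.

```shell
#!/bin/sh
# Sketch for option 2: mirror a Dropbox folder to the FreeBSD web server.
# SRC, DEST and the host name are hypothetical; adjust them to your setup.
SRC="${SRC:-$HOME/Dropbox/Photos/}"
DEST="${DEST:-www@freebsd.example.com:/usr/local/www/photos/}"

sync_dropbox() {
    # -a        recurse and preserve permissions and timestamps
    # -z        compress data in transit
    # --delete  remove files on FreeBSD that were deleted from Dropbox
    # RSYNC lets you substitute another command for a dry run.
    ${RSYNC:-rsync} -az --delete "$SRC" "$DEST"
}

# To use it from cron, end the script with a call to sync_dropbox and add
# a crontab entry such as (every 5 minutes, path is hypothetical):
#   */5 * * * * /path/to/sync-dropbox.sh
```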

Another solution that I think may work is to install the native Dropbox application on FreeBSD. FreeBSD supports running Linux applications via Linux emulation. Back in the old days (FreeBSD 8), it was pretty easy to add Linux support to FreeBSD (one click in sysinstall). In recent releases, it has become harder because not many people want to run Linux binaries on FreeBSD. Based on my previous experience, I think it should work on the latest FreeBSD, but it may require some work.

Another crazy idea would be running Dropbox with Wine on FreeBSD. But this goes way beyond my original purpose, and I am not a big fan of Wine because it adds too many libraries to the system.

Running ZFS on Linux: Things you should know and be aware of

ZFS is a next-generation file system. Unfortunately, it is not shipped with Linux because of legal/licensing issues. Fortunately, it is possible to install it (ZFS on Linux) with a few commands. Since 2013, I have set up a number of Linux (CentOS/RHEL) servers with ZFS for use in a high-traffic production environment. They include high-end commercial-grade servers (Xeon-based with ECC memory), gaming-quality desktops (i7-based) and entry-level consumer-grade computers (i3). In this article, I will discuss what I have learned from this experience.

Warning on ZFS on Linux

ZFS on Linux is not a robust solution, because it has a very important (and practically impossible) requirement: the system must never be updated and rebooted. If you cannot meet this requirement (and obviously you can’t), be prepared to spend many hours fixing the problem and getting your data back. See how I fix the problem created by ZFS on Linux here. If you prefer a rock-solid and reliable setup, go with *BSD or Solaris.

Summary

Life is short. If you don’t want to spend time going through the entire article, here is my advice: use FreeBSD (or another *BSD) if possible. Using ZFS on Linux is like putting a giraffe in the Alaskan wilderness: it is not going to work. However, you may want to stick with one server operating system for various reasons. If you really want to run ZFS on Linux, here are some tips:

  • Use a commercial-grade server when possible. A bare-bones entry-level Dell PowerEdge T110 II (starting from US$300) is sufficient to run ZFS as a low-traffic, light-load, nightly backup server. Consumer-grade computers are not recommended for ZFS on Linux. If you really need to use one, get a computer with gaming-quality components and always back up the data to a different server.
  • The Linux kernel plays an important role for ZFS. Try to use version 3 (e.g., RHEL 7) when possible. Using ZFS with version 2.6 (e.g., RHEL 6) may cause unexpected problems on non-commercial-grade hardware. As of October 2019, I could not get version 4 (e.g., installed via kernel-ml, or CentOS 8) to work with ZFS on RHEL 7:
    Loading new spl-0.6.5.9 DKMS files...
    Building for 4.11.2-1.el7.elrepo.x86_64
    Building initial module for 4.11.2-1.el7.elrepo.x86_64
    configure: error: unknown
    Error! Bad return status for module build on kernel: 4.11.2-1.el7.elrepo.x86_64 (x86_64)
    Consult /var/lib/dkms/spl/0.6.5.9/build/make.log for more information.
    
  • Set up your ZFS pool with the hard drive identifier (e.g., /dev/disk/by-id/someid), not the generic device name (e.g., /dev/sda).
  • You may lose some storage space (less than 1%) compared to the same setup on FreeBSD, but the amount is trivial.
  • If you have already installed ZFS on Linux, try to exclude the kernel from system updates. Otherwise the system may not load ZFS after rebooting into a new kernel, and it will take some extra work to get ZFS running again.
  • Some Linux distributions, such as CentOS 7, will not load ZFS at boot time. You can work around this with a cron job. If you have other services (e.g., MySQL, NFS, Apache) that depend on ZFS, you will need to restart them afterwards.
  • Bookmark this ZFS emergency recovery guide. Trust me, you never know when your ZFS on Linux will decide to stop working.

Do not update the kernel automatically

I’ve written an article on how to rescue your ZFS file system after updating the kernel. Please click here for details.

ZFS is not native to Linux. ZFS on Linux is essentially a bunch of modules injected into the kernel, so that the kernel can load ZFS at boot time. This is a fantastic idea because it avoids the performance problem of ZFS/FUSE, which runs in user land and is very slow. However, there is a potential problem: this “injection” only happens when the ZFS module package (zfs-kmod) is installed or updated. During this process, the system downloads the latest copy of zfs-kmod and injects it into the currently running kernel. A kernel installed later does not get the module, so ZFS will be missing when you boot into it. See the problem here?

For the same reason, running the root file system (/) on ZFS in Linux is a very bad idea. You will not be able to access anything when ZFS is not available at the kernel level.

So there are four possible situations after running the update command. Is ZFS available after the reboot?

                            Kernel has new update   Kernel has no update
zfs-kmod has new update     Yes                     Yes
zfs-kmod has no update      No                      Yes

In general, if you really need to update the kernel, update the kernel first, reboot into the new kernel (ZFS will be missing), and then re-run the process so that the ZFS module is injected into the new kernel. Some people recommend uninstalling zfs-kmod and reinstalling it. Unless you have a very strong reason to use the latest kernel (or plenty of spare time), I don’t recommend doing it, because the whole process is a pain.
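Before rebooting into a newly installed kernel, it may help to check which installed kernels actually have the ZFS module. This is a minimal sketch with a hypothetical function name; it assumes the module layout shown later in this article (extra/zfs.ko or weak-updates/zfs.ko under /lib/modules/<kernel>/). Newer packages may use other paths or compressed (.ko.xz) modules, so treat a "MISSING" line as a hint, not a verdict.

```shell
#!/bin/sh
# Sketch: report which installed kernels have a ZFS module available.
# Assumes the layout /lib/modules/<kernel>/{extra,weak-updates}/zfs.ko.

zfs_module_report() {
    root="${1:-/lib/modules}"
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue          # glob matched nothing
        kver=$(basename "$dir")
        # -e follows symlinks, so a dangling weak-updates link counts as missing.
        if [ -e "${dir}extra/zfs.ko" ] || [ -e "${dir}weak-updates/zfs.ko" ]; then
            echo "$kver: zfs present"
        else
            echo "$kver: zfs MISSING"
        fi
    done
}

# Usage:
#   zfs_module_report
```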

Another option is to disable automatic updates and only update the system when there are new updates for both the kernel and zfs-kmod. Then you can update the kernel first, reboot, and update zfs-kmod after the reboot. However, keep in mind that you will eventually run into problems: many packages depend on the newer kernel, so when you try to update the system, yum will complain that you need to update the kernel before updating those packages. You can get around this by skipping the broken packages (yum update --skip-broken).

In my settings, I simply exclude the kernel from the update. That way I only need to work with one kernel, and I know that that particular kernel knows how to handle ZFS module.

sudo nano /etc/yum.conf 

exclude=kernel*
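If you prefer not to edit the file by hand, a small helper can append the line only when it is missing. This is a sketch with a hypothetical function name; it assumes [main] is the last (or only) section of /etc/yum.conf, so that an appended line lands in [main], and it refuses to touch a file that already has a different exclude= line. For a one-off run, `yum update --exclude='kernel*'` skips kernel packages without changing the configuration.

```shell
#!/bin/sh
# Sketch: add "exclude=kernel*" to yum.conf if it is not already there.
# Assumes [main] is the last section of the file (true for a stock file
# that only contains [main]). Run as root when pointing at /etc/yum.conf.

add_kernel_exclude() {
    conf="${1:-/etc/yum.conf}"
    if grep -q '^exclude=.*kernel\*' "$conf"; then
        echo "already excluded"
    elif grep -q '^exclude=' "$conf"; then
        # A second exclude= line would be ambiguous; merge it by hand instead.
        echo "existing exclude= line found; add kernel* to it by hand" >&2
        return 1
    else
        echo 'exclude=kernel*' >> "$conf"
        echo "added"
    fi
}
```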

In case you are running into trouble, i.e., ZFS is missing in the latest kernel, you can try doing the following:

Before running the following commands, make sure that you know what you are doing.

#Make sure that you reboot to the kernel you want to fix.
#Find out what is the current kernel
uname -a
Linux 3.10.0-514.2.2.el7.x86_64 #1 SMP Tue Dec 6 23:06:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

#In my example, it is:
3.10.0-514.2.2.el7.x86_64


#Basically we want to remove the following files:
ls -al /lib/modules/your_new_kernel/extra
-rw-r--r-- 1 root root 344K Dec 12 15:58 splat.ko
-rw-r--r-- 1 root root 167K Dec 12 15:58 spl.ko
-rw-r--r-- 1 root root  14K Dec 12 16:02 zavl.ko
-rw-r--r-- 1 root root  75K Dec 12 16:02 zcommon.ko
-rw-r--r-- 1 root root 2.2M Dec 12 16:02 zfs.ko
-rw-r--r-- 1 root root 130K Dec 12 16:02 znvpair.ko
-rw-r--r-- 1 root root  34K Dec 12 16:02 zpios.ko
-rw-r--r-- 1 root root 324K Dec 12 16:02 zunicode.ko

#If you have no extra modules installed other than ZFS and SPL, you can run the following:
sudo rm -Rf /lib/modules/*/extra/* 

#Otherwise just remove the files one by one.


#And we want to do the same thing to the weak-updates.
ls -al /lib/modules/your_new_kernel/weak-updates

drwxr-xr-x. 2 root root 4.0K Sep 16 10:58 .
drwxr-xr-x. 7 root root 4.0K Sep 16 10:58 ..
lrwxrwxrwx  1 root root   54 Sep 16 10:56 splat.ko -> /lib/modules/2.6.32-573.18.1.el6.x86_64/extra/splat.ko
lrwxrwxrwx  1 root root   52 Sep 16 10:56 spl.ko -> /lib/modules/2.6.32-573.18.1.el6.x86_64/extra/spl.ko
lrwxrwxrwx  1 root root   53 Feb 22  2016 zavl.ko -> /lib/modules/2.6.32-573.18.1.el6.x86_64/extra/zavl.ko
lrwxrwxrwx  1 root root   56 Feb 22  2016 zcommon.ko -> /lib/modules/2.6.32-573.18.1.el6.x86_64/extra/zcommon.ko
lrwxrwxrwx  1 root root   52 Sep 16 10:58 zfs.ko -> /lib/modules/2.6.32-573.18.1.el6.x86_64/extra/zfs.ko
lrwxrwxrwx  1 root root   56 Sep 16 10:58 znvpair.ko -> /lib/modules/2.6.32-573.18.1.el6.x86_64/extra/znvpair.ko
lrwxrwxrwx  1 root root   54 Feb 22  2016 zpios.ko -> /lib/modules/2.6.32-573.18.1.el6.x86_64/extra/zpios.ko
lrwxrwxrwx  1 root root   57 Feb 22  2016 zunicode.ko -> /lib/modules/2.6.32-573.18.1.el6.x86_64/extra/zunicode.ko



#If you have no extra modules installed other than ZFS and SPL, you can run the following:
sudo rm -Rf /lib/modules/*/weak-updates/*


#Otherwise just remove the files one by one.


#Now, let's get into the fun part. We will remove them and reinstall them.
#Don't forget to match your version.
sudo dkms remove zfs/0.6.5.8 --all
sudo dkms remove spl/0.6.5.8 --all
sudo dkms --force install spl/0.6.5.8
sudo dkms --force install zfs/0.6.5.8

Finally, verify the result:

#sudo dkms status
spl, 0.6.5.8, 3.10.0-514.2.2.el7.x86_64, x86_64: installed
zfs, 0.6.5.8, 3.10.0-514.2.2.el7.x86_64, x86_64: installed
zfs, 0.6.5.8, 3.10.0-327.28.3.el7.x86_64, x86_64: installed-weak from 3.10.0-514.2.2.el7.x86_64
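If you want to script this check (for example before rebooting), the sketch below scans dkms status output for a zfs entry that is installed, or installed-weak, for a given kernel. It assumes the older comma-separated format shown above ("zfs, 0.6.5.8, <kernel>, x86_64: installed"); newer DKMS releases format the module name differently. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed if dkms status (on stdin) shows zfs installed for a kernel.
# Assumes lines like: "zfs, 0.6.5.8, 3.10.0-514.2.2.el7.x86_64, x86_64: installed"

zfs_built_for() {
    kver="$1"
    # Fields split on ", ": $1 module, $2 version, $3 kernel, $4 "arch: state".
    # "installed-weak from ..." also counts, since the module is usable.
    awk -F', ' -v k="$kver" '
        $1 == "zfs" && $3 == k && $4 ~ /: installed(-weak.*)?$/ { found = 1 }
        END { exit !found }
    '
}

# Usage:
#   sudo dkms status | zfs_built_for "$(uname -r)" && echo "zfs OK"
```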

The Kernel Version Matters

The kernel version does matter. I would avoid kernel 2.6 or older unless you have professional-grade hardware, such as a Xeon-based server. Here are my results:

Hardware                                                   Linux 2.6   Linux 3   FreeBSD 9 & 10
Dell PowerEdge T110 II (Xeon E3-1240 V2, 8GB, US$250)      Stable      Stable    Stable
Dell PowerEdge T320 (Xeon E5-2430, 64GB, US$2,000)         Stable      Stable    Stable
Gaming-quality desktop (i7-4770, 32GB, US$900)             Unstable    Stable    Stable
Consumer-grade desktop (i3-540, 8GB, US$500)               Unstable    Stable    Stable

However, this doesn’t mean you should always use the latest kernel. Always keep a copy of the previous kernel before switching to the latest one, because you never know whether ZFS will work with it. For example, I had big trouble getting ZFS to work with 2.6.32-573.7.1.el6.x86_64, the latest kernel available on CentOS 6.7 as of Oct 26, 2015. I ended up switching the system back to 2.6.32-573.3.1.el6.x86_64 (the previous kernel). So always test the system before making the switch.
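To find the previous kernel to fall back to, you can sort the installed kernel versions and take the second-newest. This is a sketch with a hypothetical function name; it relies on GNU `sort -V`, which handles RHEL-style kernel versions like the ones above but may not match rpm’s own version comparison in every edge case.

```shell
#!/bin/sh
# Sketch: print the second-newest kernel from a list of versions on stdin.
# Requires GNU sort (-V, version sort).

previous_kernel() {
    sorted=$(sort -V)
    if [ "$(printf '%s\n' "$sorted" | wc -l)" -lt 2 ]; then
        echo "only one kernel installed; keep it" >&2
        return 1
    fi
    printf '%s\n' "$sorted" | tail -n 2 | head -n 1
}

# Usage (versions formatted like `uname -r`):
#   rpm -q kernel --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n' | previous_kernel
```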

The Hard Drive Identifier

Set up your ZFS pool with a unique, unchanging hard drive identifier (e.g., /dev/disk/by-id/wwn-0x1234c567890d0aaa), not the generic device name (e.g., /dev/sda). The generic device name may change when you reboot the system, which is a problem for ZFS.

For example, RHEL 7 first names the drives attached directly to the motherboard, including USB flash drives, SD cards, etc., and then the drives attached to the PCIe RAID card. If you boot the computer with a USB flash drive attached that was not present when you set up ZFS, this small change is enough to mess up your pool.

Here is an example:

History for 'storage':
zpool create -f storage raidz /dev/disk/by-id/wwn-0x5000c500206e46d4 \
                              /dev/disk/by-id/wwn-0x5000c500205eba0d \
                              /dev/disk/by-id/wwn-0x50014ee25a9074e2 \
                              /dev/disk/by-id/wwn-0x50024e9001c19fb2

So far I have only noticed this problem with low-end/consumer-grade motherboards. It is not a problem on FreeBSD, because FreeBSD is smart enough to re-map the old values.
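To build a zpool create command like the one above, you need the wwn-* name behind each /dev/sdX. The sketch below resolves the symlinks in /dev/disk/by-id and prints the wwn-* link that points at a given device; the function name is hypothetical, and it assumes your drives expose wwn-* links (some only have ata-* or scsi-* ones). For an existing pool that was created with /dev/sdX names, exporting it and re-importing with `zpool import -d /dev/disk/by-id` switches it to the by-id names; do this only while the pool is idle.

```shell
#!/bin/sh
# Sketch: print the /dev/disk/by-id/wwn-* link(s) that resolve to a device.
# Assumes GNU readlink (-f) and that the drive has a wwn-* link.

by_id_for() {
    dev=$(readlink -f "$1")
    dir="${2:-/dev/disk/by-id}"
    for link in "$dir"/wwn-*; do
        [ -e "$link" ] || continue         # skip if the glob matched nothing
        if [ "$(readlink -f "$link")" = "$dev" ]; then
            echo "$link"
        fi
    done
}

# Usage:
#   for d in sdb sdc sdd sde; do by_id_for /dev/$d; done
#   zpool export storage && zpool import -d /dev/disk/by-id storage
```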

The Stability

For some reason, ZFS can become unstable or even unavailable when the I/O load is heavy:

  pool: storage
 state: DEGRADED
  scan: none requested
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
config:

        NAME                        STATE     READ WRITE CKSUM
        storage                     DEGRADED     0     0     0
          raidz1-0                  DEGRADED     0     0     0
            wwn-0x5000c500206e46d4  ONLINE       0     0     0
            wwn-0x5000c500205eba0d  ONLINE       0     0     0
            wwn-0x50014ee25a9074e2  ONLINE       0     0     0
            wwn-0x50024e9001c19fb2  UNAVAIL      0     0     0

This kind of problem happens mainly on low-end consumer-grade computers running an older kernel. Once I upgraded the kernel, the problem went away, with no hardware change needed. Again, I have never experienced this kind of problem on FreeBSD since FreeBSD 9. The only explanation I can think of is that older Linux kernels do not handle ZFS on low-end computers very well.
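To catch a degraded pool early, you can filter zpool status output for entries that are not ONLINE, for example from a cron job that mails you when the output is non-empty. This is a sketch with a hypothetical function name; it assumes the config table layout shown above (name, state, then READ/WRITE/CKSUM counters). `zpool status -x` is a simpler built-in check that only reports unhealthy pools.

```shell
#!/bin/sh
# Sketch: print every pool/vdev/device from `zpool status` that is not ONLINE.
# Assumes config lines of the form: NAME STATE READ WRITE CKSUM [message].

pool_problems() {
    awk 'NF >= 5 && $2 ~ /^(DEGRADED|FAULTED|UNAVAIL|OFFLINE|REMOVED)$/ {
        print $1 ": " $2
    }'
}

# Usage:
#   zpool status storage | pool_problems
```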

Load ZFS at Boot

Some Linux variants, such as CentOS 7, will not load ZFS at boot (in my case, the kernel is 3.10.0-327.28.3.el7.x86_64), so I chose to import the pools via a cron job. What if ZFS holds files that some services require, e.g., your database or web server files? You will need to restart those services after loading ZFS. Here is an example:

sudo nano /etc/crontab

#Example 1: Load all available ZFS pools
@reboot         root    sleep 20; zpool import -a;

#Example 2: Load all ZFS pools first, then restart the Apache, MySQL and NFS services
@reboot         root    sleep 20; zpool import -a; sleep 15; systemctl restart httpd.service && systemctl restart mariadb.service && systemctl restart nfs-server;
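The fixed sleep values above are a guess at how long the disks take to appear. A variant that retries the import until the pool shows up might look like the sketch below; the function name, retry count, delay and script path are assumptions, and ZPOOL/DELAY can be overridden for testing.

```shell
#!/bin/sh
# Sketch: import a pool at boot, retrying instead of relying on a fixed sleep.
# Retry count and delay are assumptions; ZPOOL and DELAY can be overridden.

wait_for_pool() {
    pool="$1"
    tries="${2:-30}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Importing an already-imported pool simply fails; that is harmless here.
        ${ZPOOL:-zpool} import "$pool" >/dev/null 2>&1
        if ${ZPOOL:-zpool} list "$pool" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "${DELAY:-2}"
    done
    return 1
}

# Example /etc/crontab entry (script path is hypothetical):
#   @reboot root /usr/local/sbin/zfs-boot.sh
# where zfs-boot.sh defines this function and then runs:
#   wait_for_pool storage && systemctl restart httpd.service mariadb.service nfs-server
```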

Good luck!

–Derrick

CentOS 6: No Networking Connections After Upgrade

Today I rebooted my CentOS 6 server and realized that the network connection was lost after the upgrade. To be exact, the problem seems to have been caused by the new kernel, 2.6.32-573.1.1.el6.x86_64. It modified the network settings of a server configured with manual (static) settings; servers using DHCP are not affected. Here is how I fixed the problem (you will need physical access to the server):

I noticed that the adapter profile had been modified to something that doesn’t make sense. If you compare the network settings before and after the upgrade, you will notice the following difference:

#Before the upgrade
#cat /etc/sysconfig/network-scripts/ifcfg-em1  
PREFIX=24
#After the upgrade
#cat /etc/sysconfig/network-scripts/ifcfg-em1  
PREFIX=32

So I simply changed PREFIX back to 24 in the adapter settings and restarted the network service:

sudo service network restart

And the network connection is back!
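If you have several servers to fix, the edit can be scripted. This is a sketch with a hypothetical function name: it assumes, as in the example above, that the correct prefix is 24 and the upgrade wrote 32, so check your own netmask first, and copy the ifcfg file somewhere outside /etc/sysconfig/network-scripts before editing. It uses GNU sed -i.

```shell
#!/bin/sh
# Sketch: restore PREFIX in an ifcfg file after the upgrade changed it.
# Defaults assume the correct prefix is 24 and the bad one is 32.

fix_prefix() {
    file="$1"
    good="${2:-24}"
    bad="${3:-32}"
    # Only rewrite an exact "PREFIX=<bad>" line; other lines are untouched.
    sed -i "s/^PREFIX=${bad}\$/PREFIX=${good}/" "$file"
}

# Usage (as root):
#   fix_prefix /etc/sysconfig/network-scripts/ifcfg-em1
#   service network restart
```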

That’s it! I hope this tutorial saves you from a heart attack.

–Derrick
