Recently, I decided to upgrade a database server from RHEL 6 (CentOS 6) to RHEL 7 (CentOS 7), which involves switching from MySQL 5.5 to MariaDB 5.5. Our server hosts about 100 databases, when I was testing them individually, I didn’t see any problem. However, when I ran the back up all databases one by one using mysqldump (i.e., running mysqldump command for each database, one after one, 100 times), something funny happened. Here is the error message:


#The system was running a brunch of mysqldump commands, one by one (not via background)

Got error: 1016: "Can't open file: './db_my_database/tbl_mytable.frm' (errno: 24)" when using LOCK TAB                                                                                                   LES
mysqldump: Error: 'Out of resources when opening file '/var/tmp/#sql_2d6c_2.MAI' (Errcode: 24)' when trying to dump tablespaces
mysqldump: Error: 'Out of resources when opening file '/var/tmp/#sql_2d6c_2.MAI' (Errcode: 24)' when trying to dump tablespaces
mysqldump: Error: 'Out of resources when opening file '/var/tmp/#sql_2d6c_2.MAI' (Errcode: 24)' when trying to dump tablespaces
mysqldump: Error: 'Out of resources when opening file '/var/tmp/#sql_2d6c_2.MAI' (Errcode: 24)' when trying to dump tablespaces
mysqldump: Error: 'Out of resources when opening file '/var/tmp/#sql_2d6c_2.MAI' (Errcode: 24)' when trying to dump tablespaces
mysqldump: Error: 'Out of resources when opening file '/var/tmp/#sql_2d6c_2.MAI' (Errcode: 24)' when trying to dump tablespaces
mysqldump: Error: 'Out of resources when opening file '/var/tmp/#sql_2d6c_2.MAI' (Errcode: 24)' when trying to dump tablespaces
mysqldump: Error: 'Out of resources when opening file '/var/tmp/#sql_2d6c_2.MAI' (Errcode: 24)' when trying to dump tablespaces
mysqldump: Error: 'Out of resources when opening file '/var/tmp/#sql_2d6c_2.MAI' (Errcode: 24)' when trying to dump tablespaces

At the mean time, I tried to access the database via MySQL terminal,

MariaDB [(none)]> SHOW DATABASES;
ERROR 1018 (HY000): Can't read dir of '.' (errno: 24)

This error message means the MySQL cannot access the file. If you google the message, you will notice that there are tons of solutions, and almost every of them suggests you to increase the open_files_limit variable in my.cnf.

Therefore, I checked my configurations (/etc/my.cnf), and I noticed that the value was already set to 30000. I also checked the lsof command and I found something very interesting. Notice that I have 100 database, each of them contains about 60 tables. Each table has about 3 files. Depending on the timeout settings, if all database and tables are opened, the total number of opened file will be 100x60x3 = 18,000

sudo lsof -u mysql | wc
1045   25811 239248

This result suggests that at the time of crashing, the mysql user (the system user that run the MariaDB service) was accessing 1045 files at the same time.

So I was scratching my head. Why I already set the open_files_limit value to 30000 already, and the system crashed at 1045th files? I also verified the memory (command: free) and current process (command: top), and I didn’t find anything unusual. One last thing, I checked the open_files_limit value using MySQL terminal, and this is what I found:

MariaDB [(none)]> SHOW VARIABLES LIKE 'open_files_limit';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| open_files_limit | 1024  |
+------------------+-------+

It seems that MariaDB didn’t honor the open_files_limit I set in config file, instead it uses the default one, which isn’t right. So after some investigations, I’ve noticed that RHEL 7 set up some security stuffs, such that you will need to set the open_file_limit variable at the system level rather than the application level. In the other words, whatever you put in the /etc/my.cnf, it won’t go through the security check at RHEL.

Here is how to set the equivalent open_files_limit at the system level:

sudo mkdir -p /etc/systemd/system/mariadb.service.d/
sudo nano /etc/systemd/system/mariadb.service.d/limits.conf
#Add the following, for me, I like to set the open_files_limit to 30000:
[Service]
LimitNOFILE=30000
sudo systemctl daemon-reload
sudo systemctl restart mariadb

I tried to rerun the command again and that’s what I got:

MariaDB [(none)]> SHOW VARIABLES LIKE 'open_files_limit';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| open_files_limit | 30000 |
+------------------+-------+
1 row in set (0.00 sec)

That’s it! Did I save you from heart attack?

One of the biggest selling points of RHEL is the stability. When we upgraded from RHEL 6 to RHEL 7 (clean install), we expected that everything should work fine without too much modifications. Unfortunately, what I saw is a broken system. I really don’t expect that this happens in an enterprise class product.

Our sponsors: