How to Expand ZFS

Today I am going to share my story of how I expanded my ZFS storage. I have a giant ZFS pool, which consists of 8 x 2TB disks (RAIDZ2) and 4 x 1.5TB disks (RAIDZ1), with a total of 15TB of usable space. Since I was running out of space, I decided to upgrade the pool by replacing the disks one at a time. Here is the ZFS structure:

#zpool status
        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada8    ONLINE       0     0     0
          raidz1-1  ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0


#df
storage/data     14T     13T    1.3T    91%    /storage/

What I decided to do was to replace the 1.5TB disks (i.e., da[0-3]) with 3TB disks, one at a time. Basically, here are the steps you will typically find on the web:

  1. Power down the server
  2. Swap one 1.5TB disk for a 3TB disk
  3. Power on the server
  4. Tell ZFS to replace the disk (zpool replace)
  5. Resilver the entire pool
  6. Hope the resilver process does not return any error
  7. Repeat the above steps until all disks are replaced
  8. Turn on the auto-expand option and enjoy the extra space

Here are the corresponding commands. After replacing the first disk, you should see the following:

#zpool status
        NAME        STATE     READ WRITE CKSUM
        storage     DEGRADED     0     0     0
          raidz2-0  ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada8    ONLINE       0     0     0
          raidz1-1  DEGRADED     0     0     0
            da0     UNAVAIL      0     0     0  cannot open
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0

However, the pool will continue to work because it is a RAIDZ pool. So I told ZFS to replace the old 1.5TB disk with the new 3TB disk:

#sudo zpool replace storage da0
#zpool status
        NAME        STATE     READ WRITE CKSUM
        storage     DEGRADED     0     0     0
          raidz2-0  ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada8    ONLINE       0     0     0
          raidz1-1      DEGRADED     0     0     0
            replacing-2 DEGRADED     0     0     0
              da0/old   FAULTED      0     0     0  corrupted data
              da0       ONLINE       0     0     0  (resilvering)
            da1         ONLINE       0     0     0
            da2         ONLINE       0     0     0
            da3         ONLINE       0     0     0

Resilvering time depends on how much data is on the old disk and on your hardware (motherboard, SATA configuration, etc.). Resilvering a 1.5TB drive (with 1.3TB of data) took me about 15 hours. Personally, I recommend starting the process in the morning; that way you can check the progress and start the next disk in the evening, which speeds up the overall job.
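
You can check on the resilver at any time by running zpool status again; the scan line shows the progress and an estimated time to completion:

#Check the resilver progress
zpool status storage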

After the resilvering process is done, make sure that you check the error status. If some files went missing, you need to delete those files first and restore them from your backup. Here is an example of the error:

#sudo zpool status -v

errors: Permanent errors have been detected in the following files: 

/storage/data/aaa
/storage/data/bbb
/storage/data/ccc

Remember, you need to delete the file first, then put the file back from your backup. If you simply replace/overwrite the file, it will not clear the error. Check the status again afterward. If the error is still not gone, you may want to scrub the zpool to trigger the resilver process:

sudo zpool scrub storage

Or you can try to clear the error instead, which will trigger the resilver process automatically:

sudo zpool clear storage
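
Putting the recovery steps together, a hypothetical flow looks like this (the file name and the backup location are just examples; adjust them to your setup):

#Delete the damaged file first, then restore it from backup
rm /storage/data/aaa
#Restore from your backup (hypothetical rsync daemon path)
rsync -av backup_server::backup/data/aaa /storage/data/

#Scrub to verify, then check the status again
sudo zpool scrub storage
sudo zpool status -v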

I know what you are thinking: how can ZFS lose data even with RAIDZ and checksums enabled? I have no idea. That's why we need a backup on a different machine. Anyway, the resilver process will take another 15 hours.

After the error is cleared, repeat the steps to replace the remaining disks one by one, and make sure that the error is cleared after every replacement. For me, replacing four hard drives took exactly 5 days, or 120 hours in total. Yes, it is not a fun job.

After everything is completed with no errors or anything bad, you check the pool status and expect magic to happen. Unfortunately, you will see the same amount of space available. Here is what you will need to do:

#I still have 1.3TB space left.
storage/data     14T     13T    1.3T    91%    /storage/
sudo zpool set autoexpand=on storage

Locate the disks you have replaced; in my case, they are da0, da1, da2 and da3:

#zpool status
        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada8    ONLINE       0     0     0
          raidz1-1  ONLINE       0     0     0
            da0     ONLINE       0     0     0 --This
            da1     ONLINE       0     0     0 --This
            da2     ONLINE       0     0     0 --This
            da3     ONLINE       0     0     0 --And this
sudo zpool online -e storage da0
#If the extra space does not show up, repeat this for da1, da2 and da3

And check the space again…

#Now I have 5.3TB space available.
storage     18T     13T    5.3T    72%    /storage

FYI, here is the math behind the free space calculation. I had 4 x 1.5TB in a RAIDZ1 setup; after the upgrade, I have 4 x 3TB in the same RAIDZ1 setup. RAIDZ1 spends one disk's worth of space on parity, and roughly 10% is lost to overhead, so the increase in space is (3TB - 1.5TB) x (4 - 1) x 0.9 ≈ 4TB.

Enjoy the new space!

If resilvering sounds like too many hours of work, consider the old-school way. My old-school method is nothing new, but it is rock solid, reliable, takes less time, and most importantly, no data will be lost. Yes, you guessed it: back up the data to another server first, then rebuild the ZFS pool on the main server and copy the data back. Typically, copying 10TB of data via an rsync daemon over a gigabit network takes about 3 days, so it isn't too bad. The only downside of this solution is the downtime, which was about 3 days in my case. If you go with the ZFS disk replacement instead, the downtime is minimal, because the pool continues to work during the resilver process.

Further reading: How to Improve Rsync Performance

Hope my solutions help!

–Derrick

[FreeBSD] Shared object “libaprutil-1.so.5” not found, required by “httpd”

I upgraded Apache to 2.2.27 on my FreeBSD box via portmaster. The upgrade went very smoothly. After the upgrade, I decided to test Apache by restarting the server. Oh well, I got the following error message:

[FreeBSD]Shared object "libaprutil-1.so.5" not found, required by "httpd"

Looks like libaprutil-1.so.5 is missing. How about creating this file by soft-linking to a newer version, such as libaprutil-1.so.6, libaprutil-1.so.7, etc.? However, when I checked the lib directory, I didn't see anything like that:

#Listing of /usr/local/lib
-rw-r--r--  1 root  wheel   251k Jun 11 10:58 libaprutil-1.a
-rwxr-xr-x  1 root  wheel   961B Jun 11 10:58 libaprutil-1.la
lrwxr-xr-x  1 root  wheel    21B Jun 11 10:58 libaprutil-1.so -> libaprutil-1.so.0.5.3
lrwxr-xr-x  1 root  wheel    21B Jun 11 10:58 libaprutil-1.so.0 -> libaprutil-1.so.0.5.3
-rwxr-xr-x  1 root  wheel   148k Jun 11 10:58 libaprutil-1.so.0.5.3

Looks like it is not that simple to solve this problem. After a couple of trials and errors, I came up with a solution: first reinstall Apache, and then reinstall apr1.

cd /usr/ports/www/apache22
make

Notice that I compiled the port without installing it. That's because I wanted to check the library dependencies first:

cd /usr/ports/www/apache22/work/httpd-2.2.27
ldd ./httpd
./httpd:
        libm.so.5 => /lib/libm.so.5 (0x80087d000)
        libpcre.so.3 => /usr/local/lib/libpcre.so.3 (0x800a9e000)
        libaprutil-1.so.0 => /usr/local/lib/libaprutil-1.so.0 (0x800d02000)
        libdb-4.8.so.0 => /usr/local/lib/libdb-4.8.so.0 (0x800f27000)
        libgdbm.so.4 => /usr/local/lib/libgdbm.so.4 (0x801299000)
        libexpat.so.6 => /usr/local/lib/libexpat.so.6 (0x8014a4000)
        libiconv.so.3 => /usr/local/lib/libiconv.so.3 (0x8016c8000)
        libapr-1.so.0 => /usr/local/lib/libapr-1.so.0 (0x8019c4000)
        libcrypt.so.5 => /lib/libcrypt.so.5 (0x801bf5000)
        libthr.so.3 => /lib/libthr.so.3 (0x801e14000)
        libc.so.7 => /lib/libc.so.7 (0x802037000)
        libapr-1.so.5 => not found (0)
        libintl.so.9 => /usr/local/lib/libintl.so.9 (0x802392000)

So the freshly built httpd links against libaprutil-1.so.0, which is what we have in /usr/local/lib/. That's good. (Note the libapr-1.so.5 line, though; it will come back to haunt us in a minute.) Now we can hit the install button:

cd /usr/ports/www/apache22/
make reinstall

Now let's try to run Apache again. Looks like we get a different error message:

/usr/local/etc/rc.d/apache22 restart
Performing sanity check on apache22 configuration:
Shared object "libapr-1.so.5" not found, required by "libaprutil-1.so.0"

It's okay. Let's do the same thing with apr1:

cd /usr/ports/devel/apr1/
make reinstall
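
Before restarting Apache, you can optionally double-check that every shared library now resolves (the path below assumes the stock apache22 install location):

#No line should say "not found"
ldd /usr/local/sbin/httpd | grep apr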

Try to restart Apache again. The problem should be gone:

/usr/local/etc/rc.d/apache22 restart
Performing sanity check on apache22 configuration:
Syntax OK
Stopping apache22.
Waiting for PIDS: 1022.
Performing sanity check on apache22 configuration:
Syntax OK
Starting apache22.

Hope my solutions help!

–Derrick

[FreeBSD] PHP does not work after upgrading Apache to 2.2.27

I upgraded Apache to 2.2.27 on my FreeBSD box via portmaster. The upgrade went very smoothly. After the upgrade, I found that Apache no longer rendered PHP pages correctly. In other words, it displayed the source code of the PHP files instead of executing them.

Before you start doing anything, please make sure that your website is not accessible from the public. For example, most web applications include password information in the source code, which becomes accessible to the public if the PHP engine fails. You may want to restrict public access during the fix. The easiest way is to set up a .htaccess file and restrict access by IP address.
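
For example, a minimal .htaccess in Apache 2.2 syntax that locks the site down to a single address might look like this (replace the IP with your own):

#Allow only one IP address during the fix
Order deny,allow
Deny from all
Allow from 192.168.1.100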

Let's come back to the fix. For some reason, the Apache/PHP teams decided to have some fun with FreeBSD sysadmins, because they think FreeBSD sysadmins have plenty of spare time. Here is what they decided to do:

Originally, we installed Apache first, then PHP. When we installed PHP, we picked an option so that PHP would install the PHP engine (the Apache module) used by Apache. In the recent release, they decided to remove the engine from the standard PHP package. So if you simply use portmaster to upgrade your PHP (in my case, PHP 5.4 and 5.5), the engine will be missing. That's why Apache fails to render the PHP files.

Here is what you will need to do. I suggest you finish reading the following before starting the fix. Trust me, it may save you a couple of hours.

First, make sure that Apache is working fine.

If you are using PHP 5.4, make sure that you install this port: /usr/ports/www/mod_php5. If you are using PHP 5.5, install this one instead: /usr/ports/www/mod_php55. Make sure that you select the ZTS option. You can do so by running:

sudo make config

Notice that installing this port will reinstall the PHP package (/usr/ports/lang/php5 or /usr/ports/lang/php55). Make sure that the ZTS option is selected in the PHP package as well.

Now try to restart the Apache server. That should make Apache render the PHP files instead of dumping their source:

#sudo /usr/local/etc/rc.d/apache22 stop
#sudo /usr/local/etc/rc.d/apache22 start

Test the PHP files again. If you see the PHP result instead of the PHP source code, congratulations! You are 20% done.

Now go back to the command line and run the following:

#List the PHP extensions installed by PHP 
php -m

#Check that PHP starts cleanly (extensions that fail to load will print warnings here)
php -v

For some reason, I noticed that many extensions went missing during the reinstallation, such as session, json, etc. Let's install them back:

#For PHP 5.4
cd /usr/ports/lang/php5-extensions

#For PHP 5.5
cd /usr/ports/lang/php55-extensions


sudo make reinstall clean

Don't forget the pecl and related packages as well. After that, restart your Apache server and clean up the PHP extension configuration:

nano /usr/local/etc/php/extensions.ini

You will see a mess. Clean up the duplicated extensions, then run php -m and php -v again.
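
After the cleanup, each extension should appear exactly once, something like this (your list of extensions will differ):

extension=session.so
extension=json.so
extension=mysql.so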

During the fix, I noticed that everything worked fine except for the MySQL package. Interestingly, when I ran php -m, mysql was listed. However, when I ran phpinfo(), it was missing. I found that this problem only happens with PHP 5.4, not PHP 5.5. Therefore, I decided to remove PHP 5.4 and load PHP 5.5 instead.

Simply repeat the installation and restart the server. If possible, try to reboot the machine.

So here is the summary:

  1. Reinstall PHP, extensions and PECL packages (sudo make deinstall, then sudo make reinstall).
  2. Restart Apache (sudo /usr/local/etc/rc.d/apache22 stop; sudo /usr/local/etc/rc.d/apache22 start).
  3. Verify the installed extensions with php -m, php -v and phpinfo().
  4. Verify the result by loading the pages on the web.
  5. If a page fails, find out which extension is missing by checking the error log, which is typically available in /var/log/, as shown below.
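
A quick way to spot the failing extension is to watch the tail of the Apache error log (the exact file name depends on your configuration):

#Show the most recent errors
tail -n 50 /var/log/httpd-error.log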

Hope my solutions help!

–Derrick

XAMPP – PHP Fatal error: Class ‘ZipArchive’ not found

I just finished writing a program in PHP today. When I moved it to another environment, I found that the script didn't work. After some investigation, I found that the program stopped running because of the following error:

PHP Fatal error:  Class 'ZipArchive' not found in ...

The error is obvious: a PHP extension is missing. Therefore, I ran the following to see which modules were being loaded by PHP:

#Make sure that the php is living in /opt/lampp/
#which php
/opt/lampp/bin/php

#php -m
[PHP Modules]
...
...
xsl
zlib

Okay, the problem is clear: zip is missing.

For some (stupid) reason, XAMPP (1.8.1) removed the zip support from the package. I don't understand why, because it was available in the older version (1.7.3). Removing a popular extension creates trouble and backward compatibility issues. Anyway, XAMPP is not a professional-grade product; it is not designed for production use.

Before we go further, please make sure that your PHP is 5.4.7, because this tutorial is version-specific:

#php -v

PHP 5.4.7

Here are the details:

#As always, we want to switch to root user first
sudo su

#Switch to the extension directory
cd /opt/lampp/lib/php/extensions/no-debug-non-zts-20100525

#Download the missing zip.so
#This zip.so was extracted from a 32-bit Linux environment, and was compiled using PHP 5.4.7 32-bit i686
#It will not work with a different PHP version.
wget http://icesquare.com/download/zip.so

#Modify the permissions
chmod a+x zip.so
chown root:root zip.so

Now, we will need to modify the php.ini

#nano /opt/lampp/etc/php.ini

#Look for the zip.so and remove the comment

#From
;extension="zip.so"

#To
extension="zip.so"

Next, we will need to verify that the extension is being loaded by php:

#php -m
[PHP Modules]
...
...
xsl
zip  <------------ This one
zlib

If everything looks good, we will need to restart Apache:

#/opt/lampp/lampp reloadapache

That's it! Now the error is gone. You can also verify it by calling phpinfo() in your code; it should display the details of the zip extension.
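
You can also run a quick sanity check from the command line; this one-liner should print bool(true) once the extension loads:

/opt/lampp/bin/php -r 'var_dump(class_exists("ZipArchive"));'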

My last word: stay away from XAMPP when possible. It uses 32-bit stone-age technology, is full of security-related bugs (because they only release one or two versions every year), and is packaged in a very unprofessional way (e.g., breaking backward compatibility). Consider moving to native Apache + PHP + MySQL. Believe me, you will see at least a 10% performance improvement.

–Derrick

install: /usr/ports/…/doc/: Inappropriate file type or format

While trying to upgrade both of my FreeBSD servers (9 & 10) today, I got the following error:

install: /usr/ports/…/doc/: Inappropriate file type or format

Actually, here is the complete error message:

===>  Staging for memcached-1.4.17_1
===>   memcached-1.4.17_1 depends on shared library: libevent-2.0.so - found
===>   Generating temporary packing list
/usr/bin/make  install-recursive
Making install in doc
/usr/bin/make  install-am
test -z "/usr/local/man/man1" || /bin/mkdir -p "/usr/ports/databases/memcached/work/stage/usr/local/man/man1"
 install  -o root -g wheel -m 444 memcached.1 '/usr/ports/databases/memcached/work/stage/usr/local/man/man1'
test -z "/usr/local/bin" || /bin/mkdir -p "/usr/ports/databases/memcached/work/stage/usr/local/bin"
  install  -s -o root -g wheel -m 555 memcached '/usr/ports/databases/memcached/work/stage/usr/local/bin'
test -z "/usr/local/include/memcached" || /bin/mkdir -p "/usr/ports/databases/memcached/work/stage/usr/local/include/memcached"
 install  -o root -g wheel -m 444 protocol_binary.h '/usr/ports/databases/memcached/work/stage/usr/local/include/memcached'
install  -o root -g wheel -m 555 /usr/ports/databases/memcached/work/memcached-1.4.17/scripts/memcached-tool /usr/ports/databases/memcached/work/stage/usr/local/bin
install  -o root -g wheel -m 444 /usr/ports/databases/memcached/work/memcached-1.4.17/doc/ /usr/ports/databases/memcached/work/stage/usr/local/man/man1
install: /usr/ports/databases/memcached/work/memcached-1.4.17/doc/: Inappropriate file type or format
*** [post-install] Error code 71

Stop in /usr/ports/databases/memcached.
*** [install] Error code 1

Stop in /usr/ports/databases/memcached.

Initially, I thought the problem was caused by a syntax error in the code, so I tried the following:

#Shut down the server
sudo /usr/local/etc/rc.d/memcached stop

#Remove Memcached from the system
sudo make deinstall

#Clean up the package
sudo make clean

#Compile the package
sudo make

#Install the package
sudo make install

Unfortunately, it still gave the same error message. I noticed that the message had something to do with the documentation, so I decided to remove the documentation option in the port configuration, i.e.,

#Enter the package setup
sudo make config

#Remove the "doc" option.

Unfortunately, it still gave the same response. I then tried to investigate the cause of the error, so I copied the failing command and re-ran it:

install  -o root -g wheel -m 444 /usr/ports/databases/memcached/work/memcached-1.4.17/doc/ /usr/ports/databases/memcached/work/stage/usr/local/man/man1

which returned a similar error message:

install: /usr/ports/databases/memcached/work/memcached-1.4.17/doc/: No such file or directory

So I came up with an idea: since I don't really need the documentation, why not disable the documentation install altogether?

cd /usr/ports/databases/memcached

sudo nano Makefile

Change the following from:

post-install:
        ${INSTALL_SCRIPT} ${WRKSRC}/scripts/memcached-tool ${STAGEDIR}${PREFIX}/bin
        ${INSTALL_MAN} ${WRKSRC}/doc/${MAN1} ${STAGEDIR}${MAN1PREFIX}/man/man1
        @${MKDIR} ${STAGEDIR}${DOCSDIR}

To:

post-install:
        ${INSTALL_SCRIPT} ${WRKSRC}/scripts/memcached-tool ${STAGEDIR}${PREFIX}/bin
#        ${INSTALL_MAN} ${WRKSRC}/doc/${MAN1} ${STAGEDIR}${MAN1PREFIX}/man/man1
        @${MKDIR} ${STAGEDIR}${DOCSDIR}

That's right, I just want to comment out the installation of the man/documentation part. (Notice that ${MAN1} apparently expands to nothing here, which is why install ends up being handed the doc/ directory itself.) Now, let's try to reinstall the package again:

#sudo make install

Guess what, it works! Have fun!

–Derrick

The easiest way to improve the performance of MySQL server on FreeBSD

There are many different ways to improve MySQL performance. In general, they break down into two categories: server side and client side.

On the server side, we can optimize the database and table structure, such as indexing the columns. On the client side, we can optimize the queries to minimize the workload, or cache and share the results so that the traffic to the server is minimized. However, these methods are only doable if you have access to the source code and understand the logic of the software. If you are a system administrator, you may not want to touch the source code, because you never know what will happen after the modifications. Plus, your modifications may be overwritten in the next update.

I am going to show you a quick and easy alternative. First, I am assuming that you build MySQL from source. In other words, this article will not work if you installed MySQL through pkg_add, yum, apt-get, etc.

My solution is very simple: compile the MySQL server with the static build option enabled.

By default, when we compile the software from source, it is not built statically. According to the MySQL documentation, a statically linked binary can be about 13% faster than a dynamically linked one. Here is an example of how to build MySQL with the static option enabled:

cd /usr/ports/databases/mysql56-server
make BUILD_OPTIMIZED=yes BUILD_STATIC=yes
make install clean

This works the first time, but it may be a problem in the long run. For example, I use portsnap to update the ports tree, and I use portmaster to upgrade applications. By default, portmaster will use the default options to rebuild the port. In other words, MySQL will no longer be built statically.

To solve this problem, I need to make the static build options the default. First of all, include the following in /etc/make.conf:

sudo nano /etc/make.conf

WITH_CHARSET=utf8
WITH_COLLATION=utf8_general_ci
BUILD_OPTIMIZED=yes
BUILD_STATIC=yes

Then update the ports tree and rebuild the ports; the system will pick up the new settings:

sudo portsnap fetch update
sudo portmaster -Day
#Don't forget to restart the MySQL server.
sudo /usr/local/etc/rc.d/mysql-server restart

Now your MySQL server and other applications are built statically.
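
To confirm that the new binary is really statically linked, you can check it with ldd (the path below assumes the default install location of the mysql56-server port):

#A statically linked binary will be reported as not a dynamic executable
ldd /usr/local/libexec/mysqld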


–Derrick

How to Stress Test ZFS System

If you have set up a ZFS system, you may want to stress test it before putting it into a production environment. There are many ways to stress test a system. The most common one is to fill the entire pool using dd. However, I think scrubbing the entire pool is the best.
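
If you do want to try the dd approach first, a rough sketch looks like this (adjust the path and the count to suit your pool; random data keeps compression from skewing the test):

#Write about 100GB of incompressible data to the pool
dd if=/dev/random of=/storage/data/stress.bin bs=1M count=100000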

In case you are not familiar with scrubbing: basically, it is a ZFS operation that verifies data integrity. The system goes through every single block and performs checksum calculations, parity checks, etc. While scrubbing the entire pool, the system generates a lot of I/O traffic.

First, please make sure that your ZFS pool is filled with some data. Then we scrub the system:

sudo zpool scrub storage

Afterward, simply run the following command to check the status:

#sudo zpool status -v
  pool: storage
 state: ONLINE
  scan: scrub in progress since Sun Jan 26 19:51:03 2014
        36.6G scanned out of 14.4T at 128M/s, 32h38m to go
        0 repaired, 0.25% done

Depending on the size of your pool, it may take a few hours to a few days to finish the entire process.

So how is scrubbing related to stability? Let's take a look at the following example. Recently I set up a ZFS system based on 6 hard drives. During the initial setup, everything was fine; it gave no errors when loading the data. However, after I scrubbed the system, something bad happened. During the process, the system disconnected two hard drives, which made the entire pool unreadable (RAIDZ1 can only survive one disk failure). I felt very lucky that this didn't happen in a production environment. Here is the result:

sudo zpool status
  pool: storage
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:

        NAME                      STATE     READ WRITE CKSUM
        storage                   UNAVAIL      0     0     0
          raidz1-0                UNAVAIL      0     0     0
            ada1                  ONLINE       0     0     0
            ada4                  ONLINE       0     0     0
            ada2                  ONLINE       0     0     0
            ada3                  ONLINE       0     0     0
            9977105323546742323   UNAVAIL      0     0     0  was /dev/ada1
            12612291712221009835  UNAVAIL      0     0     0  was /dev/ada0

After some investigation, I found that the error had nothing to do with the hard drives (no bad sectors, bad cables, etc.). It turned out to be bad memory. See? You never know which component in your ZFS box is bad until you stress test it.

Happy stress-testing your ZFS system.

–Derrick

ZFS Compression: lz4 VS lzjb

ZFS offers a new compression method in the latest version: lz4, which claims to be better than lzjb. Since lzjb is pretty good already, I was curious to find out how much better lz4 would be. According to my tests, lz4 performs better than lzjb in terms of both space saving and I/O speed, though not by much.

lz4 VS lzjb: Space Saving

Long story short, here is what I did. I set up two servers with brand-new ZFS pools. One server was set to lzjb, and the other was set to lz4. Both servers have exactly the same hardware, software, operating system, etc. Both were loaded with the same set of data (10TB) via rsync and rsyncd. These are real data that I use every day, including office documents (PDF, Word, Excel), photos (RAW and JPEG), music (mp3), video (mpeg), zip files, source code, binary applications, databases (MySQL, Redis), webserver files (PHP, HTML, CSS), etc. Notice that some of these files are already compressed (such as jpeg and zip), and the file types are not evenly distributed (e.g., 40% of the files are jpeg, 25% are docs, 10% are zip, 5% are something else). I wanted to run the test in a realistic scenario.

Also, all ZFS settings were set BEFORE loading the data. This ensures that all data on the same server share the same compression settings. Below is a summary of the used space. Keep in mind that the space saving test is independent of the hardware, such as the CPU type, memory, and speed of the disks. It is just like running the same command to compress the same set of files on two systems: we expect the results to have the same size, and the only difference will be the time spent compressing the data.

#ZFS with lzjb
df
Filesystem    512-blocks        Used     Avail Capacity  Mounted on
storage/data 23007987368 22996620466  11366902   100%    /storage/data

#Used space: 22996620466 blocks = 10965.6 GB
#ZFS with lz4
df
Filesystem    512-blocks        Used      Avail Capacity  Mounted on
storage/data 31527470077 22942284913 8585185164    73%    /storage/data

#Used space: 22942284913 blocks = 10939 GB

I found that for 10TB of data, the server with lzjb used 26GB more space than the one with lz4, which translates to 0.23% (26GB out of 10TB). The difference is quite small and not significant. However, when your ZFS pool is nearly full, or you are too busy to upgrade your system, the extra saving may matter a lot.

lz4 VS lzjb: Time Saving

I was also curious to know how much time I would save by switching from lzjb to lz4, so I did another simple test. First, I generated 1GB of random data and wrote it to the lzjb pool. After that, I destroyed the zpool and rebuilt it with lz4. This eliminates other factors such as hardware and network traffic, because both tests were done on the same system.

#ZFS with lzjb
time dd if=/dev/random of=/storage/data/file.out bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 23.725955 secs (44195313 bytes/sec)

real    0m24.144s
user    0m0.024s
sys     0m18.326s
#ZFS with lz4
time dd if=/dev/random of=/storage/data/file.out bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 22.589257 secs (46419234 bytes/sec)

real    0m22.802s
user    0m0.016s
sys     0m18.273s

P.S. The /dev/random in FreeBSD is nothing like the one in Linux. It is very fast.

By just switching from lzjb to lz4, the time was reduced from 24.144 to 22.802 seconds, which translates to a 5.5% improvement (1.342s out of 24.144s).

This result also agrees with my overall experience: the improvement is small and barely noticeable. In fact, I hardly noticed any performance difference after switching from lzjb to lz4, and my workload primarily consists of transferring files between Windows and a FreeBSD-based ZFS system via Samba on a gigabit network. The I/O speeds are about the same. However, if you are running a busy server with lots of traffic, a 5% improvement is worth something.

Anyway, here is my recommendation: if lz4 is available in your ZFS version, use it. You can't go wrong with it:

sudo zfs set compression=lz4 myzpool
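
Afterward, you can check which compression method each dataset uses and how much space it is actually saving:

#compressratio reports the achieved compression ratio
zfs get compression,compressratio mypool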

–Derrick

ZFS Performance Boost/Improvement: How I push the I/O speed to 126MBps

This article is part of my main ZFS tutorial: How to Improve ZFS Performance. That article covers everything you need. If you already have the basic knowledge, or you just want to know how I pushed the I/O speed to 120+ MBps, you can skip that article and read this one.

Long story short, here is the result of my iostat:

sudo zpool iostat -v

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
storage     7.85T  12.1T      1  1.08K  2.60K   126M  <--- 126 MBps!
  raidz2    5.51T  8.99T      1    753  2.20K  86.9M
    ada0        -      -      0    142    102  14.5M
    ada1        -      -      0    143    102  14.5M
    ada3        -      -      0    146    307  14.5M
    ada4        -      -      0    145    409  14.5M
    ada5        -      -      0    144    204  14.5M
    ada6        -      -      0    143    204  14.5M
    ada7        -      -      0    172    409  14.5M
    ada8        -      -      0    171    511  14.5M
  raidz1    2.34T  3.10T      0    349    409  39.5M
    da0         -      -      0    169    204  13.2M
    da1         -      -      0    176      0  13.2M
    da2         -      -      0    168    102  13.2M
    da3         -      -      0    173    102  13.2M
----------  -----  -----  -----  -----  -----  -----

I measured the speed while transferring 10TB of data from one FreeBSD-based ZFS server to another FreeBSD-based ZFS server, over a consumer-level gigabit network. Basically, every component I used is consumer-level. The hard drives are standard PATA and SATA drives (no lightning-fast SSDs here). The data I transferred is real data (rather than all zeros or ones generated through dd).

First of all, here are the settings I used. Both server and client have similar configurations.

sudo zpool history
History for 'storage':
2014-01-22.20:28:59 zpool create -f storage raidz2 /dev/ada0 /dev/ada1 /dev/ada3 /dev/ada4 /dev/ada5 /dev/ada6 /dev/ada7 /dev/ada8 raidz /dev/da0 /dev/da1 /dev/da2 /dev/da3
2014-01-22.20:29:07 zfs create storage/data
2014-01-22.20:29:15 zfs set compression=lz4 storage
2014-01-22.20:29:19 zfs set compression=lz4 storage/data
2014-01-22.20:30:19 zfs set atime=off storage
#where the ad* and da* are nothing more than standard SATA drives:
#ZFS Drives
#8 x 2TB standard SATA hard drives connected to the motherboard
#4 x 1.5TB standard SATA hard drives connected to a PCI-e RAID card

da0: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da1: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da2: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da3: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 512bytes)
ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada3: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada4: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada4: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada5: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada5: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada6: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada6: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada7: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada7: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada8: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada8: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)


#System Drive
#A PATA 200GB drive from 2005
ada2: maxtor STM3200820A 3.AAE ATA-7 device
ada2: 100.000MB/s transfers (UDMA5, PIO 8192bytes)
ada2: 190782MB (390721968 512 byte sectors: 16H 63S/T 16383C)

Here are the corresponding hardware components:

#CPU
hw.machine: amd64
hw.model: Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz
hw.ncpu: 8
hw.machine_arch: amd64


#Memory - 3 x 2GB
real memory  = 6442450944 (6144 MB)
avail memory = 6144352256 (5859 MB)

#Network - Gigabit network on the motherboard / PCI-e
#If your gigabit network card is PCI, that will be your bottleneck.
re0: realtek 8168/8111 B/C/CP/D/DP/E/F PCIe Gigabit Ethernet

Other than that, I didn't specify any special settings in my OS. Everything else is default:

#uname -a
FreeBSD 9.2-RELEASE-p3 FreeBSD 9.2-RELEASE-p3 #0: Sat Jan 11 03:25:02 UTC 2014     [email protected]:/usr/obj/usr/src/sys/GENERIC  amd64

I transferred the 10TB of data through rsync and rsyncd. First of all, I set up rsync as a daemon on the server. Keep in mind that rsync as a daemon (service) is not the same as plain rsync. If you want to know how I set up rsyncd, please visit How to Improve rsync Performance.

After you have set up rsyncd on the server, run the following on the client:

rsync -av server_ip::rsync_share_name /my_directory/

#Example
rsync -av 192.168.1.101::storage /mydata/

Notice that I didn't enable the compression option. That's because my files are pretty much compressed already (jpeg, zip, tar, 7z, etc.). If your file types are different, you may want to enable compression, i.e.,

rsync -avz server_name::rsync_share_name /my_directory/

Give it a try and see whether it improves the I/O speed.
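
While the transfer is running, you can watch the throughput on the receiving side; the trailing number is the sampling interval in seconds:

sudo zpool iostat -v storage 5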

Here are a few things I have learned about improving ZFS performance.

Keep the ZFS structure small and simple. In particular, keep the number of disks in each vdev small. Previously, I set up one giant pool with a 14-disk RAIDZ in a single vdev. That was a bad idea; a RAIDZ vdev works best with a fairly small number of disks. That's why I split my disks into two groups:

#Group 1 - RAIDZ2
#8 x 2TB standard SATA hard drives connected to the motherboard

#Group2 - RAIDZ
#4 x 1.5TB standard SATA hard drives connected to a PCI-e RAID card

zpool create -f storage raidz2 <group 1 disks> raidz <group 2 disks>

Keep the ZFS settings clean. Enable only the things you need, and disable the junk. In my case, I enable the best compression (lz4) and disable access time updates:

2014-01-22.20:29:15 zfs set compression=lz4 storage
2014-01-22.20:30:19 zfs set atime=off storage

Keep the ZFS pool fresh. I see better performance on a brand-new pool than on one that is a few years old, even when both contain the same data. Unfortunately, there is no easy way to clean up a pool. The only way is to destroy the current one, create a new one, and bring the data back. Typically, it takes about 2 days to transfer 10TB of data over a gigabit LAN, so it is really not too bad to spend about 5 days rebuilding your ZFS pool.

That's it. Again, if you want to learn more tricks, please visit How to Improve ZFS Performance for more information.

–Derrick

How to Remote Desktop To Linux From Windows Using XRDP

If you are looking for ways to remote desktop to Linux from Windows, and you are sick of VNC, you are in the right place.

I have been looking for a solution to do something very simple: remotely access the desktop of my Linux servers from a Windows machine. Of course, it must be at a usable level.

I've tried different applications before, including XMing, XManager (X11 forwarding), VNC, 2X and NoMachine, ranging from open source to commercial. Unfortunately, they offer nothing close to the Windows-to-Windows Remote Desktop experience.

Why is XMing bad?

XMing allows you to run a remote application (e.g., Firefox) on your desktop. It works great as long as you keep the session open. Once you close your session, there is no way to go back to it. For example, suppose I am browsing a website using Firefox on the remote server via XMing. If I close XMing (without closing the browser), there is no way for me to go back to that page later. Of course, in reality, it doesn't make much sense to run Firefox on a remote server; I am really referring to software that takes days to run, such as R, Matlab, etc.

By the way, XMing works at the application level, not the desktop level.

Why is VNC bad?

VNC is nearly perfect. It offers something similar to Windows Remote Desktop, i.e., access at the desktop level. If I am working on something, I can get my previous session back even after my connection is closed. There are only two disadvantages: it is slow, and it is not built for multiple users.

VNC uses a stone-age technique to deliver the remote desktop experience: screen capture. When something changes on the remote server, it takes a screenshot and sends it to the client. You can imagine how slow that is over the Internet. Although newer versions use a better algorithm that works at the graphics level and only sends the modified areas to the client instead of the whole screen, it is still far from usable. Microsoft Remote Desktop just blows VNC far, far away.

Another thing I don't like about VNC is that it works at the machine/server level, not the user level. VNC uses its own authorization: when you set up a VNC service, you define a password, and that's it. In other words, if I open a desktop on the server as derrick, everybody who enters the correct password will be able to see my desktop content via VNC. Unless I go to the server physically and switch to a different user, there is no way to protect the content of my desktop.

The good thing about VNC is that it is cross-platform. It works on OS X, Windows, Linux, etc. Wait a minute, Java is cross-platform too. Do you want to use it?

Why is XManager bad?

XManager is better than VNC. It uses a special protocol such that speed is not an issue. However, it suffers from the same session problem as XMing. If I remote desktop to the server using XManager and close my session later, there is no way for me to go back to my previous session. In other words, I lose all opened applications and content.

And no, the sessions are still running silently on the server; they are just orphans. XManager doesn't let you connect to an old session when you establish a new connection.

Solution: xrdp

I am so happy that I finally found something useful and usable: xrdp. Here is a step-by-step tutorial on how to install it on Fedora/CentOS Linux. The instructions are similar for Debian/Ubuntu. The setup should take no more than 5 minutes.

#Install xrdp
sudo yum install -y xrdp

If your Linux is too old, or your default repository does not have xrdp available, you may want to run the following:

#Don't forget to use the right (i686 or x86_64) architecture.

sudo rpm -Uvh  http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
sudo rpm -Uvh  http://rpms.famillecollet.com/enterprise/remi-release-6.rpm
sudo yum update -y

sudo yum install -y xrdp

Next, we want to start the xrdp service:

sudo /etc/rc.d/init.d/xrdp start

You may want to enable this service at boot:

sudo chkconfig  xrdp on

Don’t forget to open the port 3389 in your firewall:

sudo nano /etc/sysconfig/iptables

-A INPUT -m state --state NEW -m tcp -p tcp --dport 3389 -j ACCEPT

Don’t forget to restart the firewall:

sudo service iptables stop
sudo service iptables start
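
You can confirm that xrdp is listening on the RDP port before you try to connect:

#xrdp should show up on port 3389
sudo netstat -tlnp | grep 3389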

Now, run Remote Desktop on Windows and enter the IP address of the server to see what happens.

–Derrick