ZFS: Compression VS Deduplication (Dedup) in Simple English

Last Edited: Jan 17, 2021

Many people confuse ZFS compression with ZFS deduplication because the two features seem so similar: both are designed to reduce the amount of space your data takes up in the pool. Let me explain the difference between them in simple English.

1. This is what your data looks like originally (assuming only one unique file):

2. This is what your data looks like after being stored in a ZFS pool with compression enabled.

3. This is what your data looks like after being stored in a ZFS pool with deduplication enabled.

4. Let's say we are storing three identical files:

5. ZFS: Compression Only

6. ZFS: Deduplication Only

7. ZFS: Compression + Deduplication
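To make items 4 to 7 concrete, here is a toy Python model that stores three identical files under each setting and counts the bytes written. It is only a sketch, not how ZFS is actually implemented: it assumes every block is 128K, uses `zlib` as a stand-in for ZFS's real compressors such as lz4, and the function names are made up for this illustration.

```python
import hashlib
import zlib

# Assumption: every block is 128K, mimicking ZFS's default recordsize.
BLOCK_SIZE = 128 * 1024


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a file into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def bytes_stored(files: list[bytes], compress: bool, dedup: bool) -> int:
    """Count how many bytes a toy pool writes to disk for the given files."""
    seen: set[bytes] = set()  # checksums of blocks already on "disk"
    total = 0
    for data in files:
        for block in split_blocks(data):
            if dedup:
                checksum = hashlib.sha256(block).digest()
                if checksum in seen:
                    continue  # identical content already stored: only a reference is kept (not counted)
                seen.add(checksum)
            if compress:
                # zlib stands in for lz4/gzip/zstd; keep the raw block if compression doesn't help
                total += min(len(zlib.compress(block)), len(block))
            else:
                total += len(block)
    return total


if __name__ == "__main__":
    one_file = b"hello ZFS " * 50_000  # 500,000 bytes of very compressible data
    three_files = [one_file, one_file, one_file]
    for compress, dedup, label in [
        (False, False, "no compression, no dedup"),
        (True, False, "compression only"),
        (False, True, "deduplication only"),
        (True, True, "compression + deduplication"),
    ]:
        print(f"{label:30} {bytes_stored(three_files, compress, dedup):>9,} bytes")
```

With compression only, each of the three copies still costs its own compressed size. With deduplication only, the three copies cost the same as one uncompressed copy. Combining both leaves a single compressed copy.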

The biggest difference between deduplication and compression is the scope. Compression works inside each block: ZFS shrinks every block it writes on its own and never compares it with data that is already stored. For example, if you have three identical files, ZFS will compress and store each of them, so you end up with three compressed copies. Deduplication works across blocks. A block is simply the basic unit of ZFS storage; its size can range from as small as 512 bytes up to the dataset's recordsize, which is 128K by default. Imagine ZFS needs to store a big file. It divides the file into multiple chunks, and each chunk is stored in a block. Deduplication remembers the content of each block by its checksum and avoids storing the same content again, even when it belongs to a different file. In other words, deduplication looks for identical small pieces (think of them as molecules) anywhere in the pool, while compression only squeezes each piece on its own.
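The block-level idea can be sketched in a few lines of Python. This is a simplified model, not ZFS's real dedup table: it assumes fixed 4K blocks (a hypothetical size chosen for readability), uses SHA-256 from `hashlib` as the checksum, and the class `ToyDedupPool` is made up for this example. It stores a file, then stores an edited copy with a single byte changed, and counts how many new blocks each write needs.

```python
import hashlib
import random

# Hypothetical small block size so the example stays readable.
BLOCK_SIZE = 4096


class ToyDedupPool:
    """Toy dedup table: each unique block is stored once and looked up by its checksum."""

    def __init__(self, block_size: int = BLOCK_SIZE) -> None:
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}     # checksum -> the one stored copy
        self.refcount: dict[str, int] = {}     # checksum -> how many file blocks point to it
        self.files: dict[str, list[str]] = {}  # file name -> ordered block checksums

    def write(self, name: str, data: bytes) -> int:
        """Store a new file (names assumed unique) and return how many blocks were physically written."""
        written = 0
        pointers = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            checksum = hashlib.sha256(block).hexdigest()
            if checksum not in self.blocks:
                self.blocks[checksum] = block  # first time this content is seen
                written += 1
            self.refcount[checksum] = self.refcount.get(checksum, 0) + 1
            pointers.append(checksum)
        self.files[name] = pointers
        return written

    def read(self, name: str) -> bytes:
        """Rebuild a file by following its block pointers."""
        return b"".join(self.blocks[c] for c in self.files[name])


if __name__ == "__main__":
    pool = ToyDedupPool()
    v1 = random.Random(0).randbytes(4 * BLOCK_SIZE)  # 4 blocks of random data
    v2 = bytearray(v1)
    v2[5000] ^= 0xFF  # change one byte inside the second block
    print("v1.dat new blocks:", pool.write("v1.dat", v1))
    print("v2.dat new blocks:", pool.write("v2.dat", bytes(v2)))
    print("unique blocks in pool:", len(pool.blocks))
```

Only the block containing the changed byte is new, so the edited copy costs one block instead of four. This is also why deduplication only helps with data that is identical block for block: with fixed-size blocks, shifting the content by a single byte is enough to make the checksums stop matching.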

One of the reasons drugs are usually tested on mice is that some mouse genes are 99% identical to human genes. Imagine we need to store mouse genes in a database. We only need to store each mouse gene once. Later, if we need to store human genes in the same database, we can reference the matching pieces of the mouse genes rather than storing another copy of them.

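Of course, enabling both compression and deduplication can save a lot of space, provided your data compresses well and contains many duplicate blocks. However, deduplication comes with a very high price tag. ZFS keeps a table of the checksums of every unique block (the dedup table), and that table needs to fit in memory. If you would like to enable deduplication, make sure that you have at least 2GB of memory per 1TB of storage. For example, if your ZFS pool is 10TB, you need 20GB of memory installed in your system. Otherwise, ZFS has to keep reading the dedup table from disk, and you will experience a huge performance hit.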
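The 10TB example is simple arithmetic, but the real memory cost depends on how many unique blocks the dedup table has to track. The sketch below compares the 2GB-per-TB rule above with a rough per-block estimate. The 320 bytes per table entry is an assumption (a figure commonly quoted for in-memory dedup table entries; the actual size varies between ZFS versions), and the function names are hypothetical.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4


def rule_of_thumb_gb(pool_tb: float, gb_per_tb: float = 2.0) -> float:
    """Rule of thumb from this article: about 2GB of RAM per 1TB of pool."""
    return pool_tb * gb_per_tb


def ddt_estimate_bytes(unique_bytes: int, block_bytes: int, entry_bytes: int = 320) -> int:
    """Rough dedup-table size: one entry per unique block.

    entry_bytes=320 is an assumed, commonly quoted size per in-memory entry.
    """
    unique_blocks = -(-unique_bytes // block_bytes)  # ceiling division
    return unique_blocks * entry_bytes


if __name__ == "__main__":
    print(f"rule of thumb, 10TB pool: {rule_of_thumb_gb(10):.0f} GB")
    for block_kib in (128, 8):
        est = ddt_estimate_bytes(10 * TIB, block_kib * 1024)
        print(f"per-block estimate, 10TiB of unique data in {block_kib}K blocks: {est / GIB:.0f} GiB")
```

With 128K blocks the per-block estimate comes to 25GiB, but with small 8K blocks it grows to 400GiB, because there are 16 times as many blocks to track. The smaller your blocks, the more memory deduplication needs.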

I hope this article helps you understand the difference between compression and deduplication.

–Derrick
