ZFS compared to EXT4 filesystem

  • 4 May 2021

Hi Team, do we have any pointers on the advantages of ZFS over the ext4 filesystem, and on why customers should use it to store Repo data?


1 reply

Hi Jerome,

Compared with traditional filesystems such as ext3 and ext4, ZFS provides a number of advantages. The following is a list of the most notable ones:

Guaranteed Data Integrity

ZFS combines volume management and the filesystem in a single layer, which lets it protect stored data end to end. The core mechanism is CoW (Copy on Write). In a traditional filesystem, when a block of data is changed it is overwritten in place on the disk, so the data can be lost or corrupted if the machine crashes or loses power during the operation. ZFS instead writes the changed data to a new location and only switches over to it once the write is complete, so the previous consistent version survives a crash. ZFS also stores a checksum for every block and verifies it on every read, which detects silent corruption such as bit rot. When the pool has redundancy, a corrupted block can be repaired from a good copy.
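The copy-on-write and checksum idea can be sketched with a toy in-memory model. This is only an illustration of the principle, not how ZFS is implemented: the class name `ToyCowStore`, its methods, and the use of SHA-256 are assumptions made for the example.

```python
import hashlib


class ToyCowStore:
    """Toy model (hypothetical): copy-on-write with a checksummed pointer."""

    def __init__(self):
        self.blocks = {}     # block address -> bytes
        self.pointer = None  # (address, checksum) of the live block
        self.next_addr = 0

    def write(self, data: bytes) -> None:
        # New data goes to a fresh address; the old block is left untouched.
        addr = self.next_addr
        self.next_addr += 1
        self.blocks[addr] = data
        checksum = hashlib.sha256(data).hexdigest()
        # The pointer is switched only after the new block is fully written.
        self.pointer = (addr, checksum)

    def read(self) -> bytes:
        addr, checksum = self.pointer
        data = self.blocks[addr]
        # Verify the stored checksum on every read to catch silent corruption.
        if hashlib.sha256(data).hexdigest() != checksum:
            raise IOError("checksum mismatch: block is corrupted")
        return data


store = ToyCowStore()
store.write(b"version 1")
store.write(b"version 2")
live = store.read()            # the newest fully written version
old_copy = store.blocks[0]     # the previous version was never overwritten
```

If a crash happened in the middle of the second `write`, the pointer would still reference the first block, which is the property the paragraph above describes.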

Highly Scalable

ZFS is a 128-bit filesystem, and its capacity limits are far beyond those of regular filesystems. A single file can be as large as 16 EiB = 16 * 2^60 bytes, which is as much as roughly 3,000,000 6 TB HDDs, and the theoretical limit for a storage pool is larger still. A configured ZFS pool can easily be grown to accommodate an increasing need for storage.
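The 16 * 2^60 byte figure can be checked with a few lines of arithmetic. The sketch assumes a "6 TB" drive holds 6 * 10^12 bytes, as drive vendors count it; the variable names are just for illustration.

```python
EIB = 2**60   # bytes in one exbibyte
TB = 10**12   # bytes in one (decimal) terabyte, as drive vendors count it

limit_bytes = 16 * EIB   # 16 EiB, which equals 2**64 bytes
drive_bytes = 6 * TB     # one 6 TB HDD

# Whole drives that fit into the limit: a little over three million.
drives_needed = limit_bytes // drive_bytes
print(drives_needed)
```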

Integrated Software RAID

ZFS has an integrated software RAID implementation called RAID-Z. RAID-Z is similar to RAID 5 in that it provides striping with distributed parity. ZFS also implements its own flavour of RAID 6, called RAID-Z2. RAID 6 is similar to RAID 5, but keeps two parity blocks per stripe and can lose two devices without destroying the array. The advantage that RAID-Z has over other hardware or software implementations is that its integration with ZFS avoids the so-called write hole problem. The write hole is the case where an interruption causes inconsistencies between the data in the RAID array and its parity, because multiple devices cannot be updated atomically.
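How single parity lets an array survive one lost device can be shown with plain XOR. This is a minimal sketch of the general RAID 5 style principle only; the actual RAID-Z on-disk layout is different, and the helper `xor_blocks` and the sample data are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, one per (hypothetical) data device.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the second device: XOR of the survivors and the parity
# gives back the missing block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

Here `rebuilt` equals the lost block `b"BBBB"`. The write hole appears when a data block and its parity are updated in two separate steps and a crash lands between them, leaving the parity stale.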

Compression

ZFS uses variable block sizes when compression is enabled, so if a block of data is compressible, it will be compressed. Consider that in some situations you may have significantly faster CPU and memory than IO throughput, in which case it may in fact be faster to read and write compressed data, because you are reducing the amount of IO that has to go through the channel. So compression isn't just about saving disk space.
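The effect on IO volume is easy to see with repetitive data such as logs. The sketch below uses Python's `zlib` purely as a stand-in compressor, and the log line is invented; the algorithm ZFS actually applies depends on the dataset's compression setting, and real ratios depend on the data.

```python
import zlib

# A made-up, highly repetitive log payload (assumption for illustration).
log_line = b"2021-05-04 12:00:00 INFO request served in 12 ms\n"
raw = log_line * 1000

compressed = zlib.compress(raw)
restored = zlib.decompress(compressed)

# Fraction of the original bytes that would actually have to hit the disk.
ratio = len(compressed) / len(raw)
print(f"{len(raw)} bytes raw, {len(compressed)} bytes compressed")
```

Fewer bytes written and read means less time spent waiting on the disk, at the cost of some CPU time for compressing and decompressing.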

Why store Repo data?

In my personal experience, log data stored on ZFS takes up to 40% less space than on an ext4 filesystem, which reduces disk utilization significantly. Along with that, the software RAID keeps the logs in the repo available if a disk fails. So it's wise to use ZFS over ext4 for the repo data.
