Mathias' Blog

Occasional musings and technical posts…

10 Mar 2023

OpenZFS is Awesome!

I’ve been a long-time OpenZFS user, and for the past couple of years it’s been pretty much the only file system I’ve used on all of my systems. Packages are available for Debian, and it’s pretty easy to install Debian on an OpenZFS root file system.

Prior to OpenZFS, I had used btrfs for quite a while. While it works (and indeed, once caught an instance of bit rot!), over time I found it annoying to have to manage a four-layer stack of technology for my storage: RAID+cryptsetup+LVM+btrfs. OpenZFS combines all of those into one solution, which makes it more complicated to understand but also more resilient, since there are no opaque layers underlying the file system. Plus, the tools for OpenZFS just seem to be more refined and usable.

OpenZFS’ big selling point is safeguarding your data – detecting and mitigating errors so you don’t lose anything. It’s one thing to appreciate that in theory; it’s another to experience it in real life. And that finally happened to me a couple of weeks ago. During the weekly scrub of one of my storage pools, I received an email notification that a drive had faulted. Running zpool status showed the following:
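If you want the same setup, a weekly scrub plus email alerts can be sketched roughly like this – the cron schedule and email address are illustrative, and the zed.rc variables are from the stock OpenZFS ZED config (double-check your distribution's defaults):

```shell
# /etc/cron.d/zfs-scrub (illustrative): scrub the pool early every Monday.
# 0 1 * * 1   root   /usr/sbin/zpool scrub olorin-storage-zfs

# /etc/zfs/zed.rc: the ZFS Event Daemon (zed) sends the fault emails.
# ZED_EMAIL_ADDR="admin@example.com"   # where notifications go (illustrative address)
# ZED_EMAIL_PROG="mail"               # any sendmail-compatible mailer
# ZED_NOTIFY_VERBOSE=1                # also notify on successful scrub completion
```

With zed running, events like a device faulting or a scrub finishing trigger a notification without any polling on your part.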

gibmat@olorin:~$ zpool status
  pool: olorin-storage-zfs
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
	Sufficient replicas exist for the pool to continue functioning in a
	degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
	repaired.
  scan: scrub in progress since Mon Feb 20 00:23:43 2023
	5.82T scanned at 621M/s, 5.76T issued at 616M/s, 6.50T total
	400K repaired, 88.62% done, 00:20:59 to go
config:

	NAME                                            STATE     READ WRITE CKSUM
	olorin-storage-zfs                              DEGRADED     0     0     0
	  raidz1-0                                      DEGRADED     0     0     0
	    ata-MB2000EAZNL_WCAY00128972                ONLINE       0     0     0
	    ata-WDC_WD2003FYYS-02W0B1_WD-WCAY01421610   ONLINE       0     0     0
	    ata-WDC_WD2003FYYS-02W0B1_WD-WCAY01399588   FAULTED     16     0     0  too many errors
	    ata-WDC_WD2003FYYS-02W0B1_WD-WCAY01440119   ONLINE       0     0     0
	    ata-WDC_WD3000FYYZ-01UL1B2_WD-WCC131089985  ONLINE       0     0     0
	    ata-WDC_WD2003FYYS-02W0B1_WD-WCAY01442756   ONLINE       0     0     0

errors: No known data errors

Yikes! One of my drives was experiencing read errors, but OpenZFS was able to transparently correct the affected 400K of data! To be fair, this particular drive was almost nine and a half years old and had experienced a couple of transitory read/write errors before, so a more severe failure like this wasn’t totally unexpected.

After the scrub completed, I took the faulted drive offline, shut down the system, and physically replaced the bad drive with a spare that I had available. Upon rebooting, I initiated a resilver of the pool with the new drive:
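Taking the drive offline before powering down is a single command – this is a sketch using the faulted device's name from the status output above:

```shell
# Tell the pool to stop using the faulted drive; data stays available
# via the remaining raidz1 members until the replacement resilvers.
sudo zpool offline olorin-storage-zfs ata-WDC_WD2003FYYS-02W0B1_WD-WCAY01399588
```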

gibmat@olorin:~$ sudo zpool replace olorin-storage-zfs ata-WDC_WD2003FYYS-02W0B1_WD-WCAY01399588 /dev/disk/by-id/ata-WDC_WD3000FYYZ-01UL1B3_WD-WMC130E9RXL5
gibmat@olorin:~$ zpool status
  pool: olorin-storage-zfs
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Feb 20 03:43:47 2023
	1.08T scanned at 2.84G/s, 232G issued at 611M/s, 6.50T total
	38.5G resilvered, 3.48% done, 02:59:21 to go
config:

	NAME                                              STATE     READ WRITE CKSUM
	olorin-storage-zfs                                DEGRADED     0     0     0
	  raidz1-0                                        DEGRADED     0     0     0
	    ata-MB2000EAZNL_WCAY00128972                  ONLINE       0     0     0
	    ata-WDC_WD2003FYYS-02W0B1_WD-WCAY01421610     ONLINE       0     0     0
	    replacing-2                                   DEGRADED     0     0     0
	      ata-WDC_WD2003FYYS-02W0B1_WD-WCAY01399588   OFFLINE      0     0     0
	      ata-WDC_WD3000FYYZ-01UL1B3_WD-WMC130E9RXL5  ONLINE       0     0     0  (resilvering)
	    ata-WDC_WD2003FYYS-02W0B1_WD-WCAY01440119     ONLINE       0     0     0
	    ata-WDC_WD3000FYYZ-01UL1B2_WD-WCC131089985    ONLINE       0     0     0
	    ata-WDC_WD2003FYYS-02W0B1_WD-WCAY01442756     ONLINE       0     0     0

errors: No known data errors

After about three hours, the resilver finished successfully and my pool was once again in a healthy state. Thanks to OpenZFS, my data was protected against a failing drive and I was able to quickly and easily replace it with a good drive.
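A quick way to confirm that healthy state, for anyone following along:

```shell
# -x shows only pools with problems; with everything resilvered it
# simply reports "all pools are healthy".
zpool status -x
```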

And remember – RAID isn’t a backup! But OpenZFS snapshots make it trivial to send an atomic, incremental backup to another machine:

gibmat@olorin:~$ zfs snapshot olorin-storage-zfs/my-dataset@now
gibmat@olorin:~$ zfs send -I olorin-storage-zfs/my-dataset@previous olorin-storage-zfs/my-dataset@now | ssh remote-host zfs recv remote-storage-zfs/my-dataset

Done!

Plus, if you have an encrypted dataset, you can send encrypted snapshots (-w) to a remote, untrusted host without exposing the encryption key. This makes it trivial to back up to a remote cloud host and know that your data will remain private!
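A raw encrypted send might look like this – the dataset and host names here are illustrative:

```shell
# -w (--raw) sends the blocks exactly as stored on disk, still encrypted,
# so the receiving host never sees plaintext and never needs the key.
zfs snapshot olorin-storage-zfs/secrets@backup
zfs send -w olorin-storage-zfs/secrets@backup | ssh cloud-host zfs recv cloud-pool/secrets
```

The remote copy can even be scrubbed and resilvered on the far side without ever loading the encryption key there.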
