OpenZFS is Awesome!
I’ve been a long-time OpenZFS user, and for the past couple of years it’s been pretty much the only file system I’ve used on all of my systems. Packages are available for Debian, and it’s pretty easy to install Debian on an OpenZFS root file system.
Prior to OpenZFS, I had used btrfs for quite a while. While it works (and indeed, once caught an instance of bit rot!), over time I found it annoying to have to manage a four-layer stack of technology for my storage: RAID+cryptsetup+LVM+btrfs. OpenZFS combines all of those into one solution, which makes it more complicated to understand but also more resilient, since there are no opaque layers underlying the file system. Plus, the tools for OpenZFS just seem to be more refined and usable.
OpenZFS’ big selling point is safeguarding your data – detecting and mitigating errors so you don’t lose any data. It’s one thing to appreciate that in theory; it’s quite another to experience it in real life. And that finally happened to me a couple of weeks ago. During the weekly scrub of one of my storage pools, I received an email notification that a drive had faulted. Running zpool status showed the following:
gibmat@olorin:~$ zpool status
  pool: olorin-storage-zfs
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub in progress since Mon Feb 20 00:23:43 2023
        5.82T scanned at 621M/s, 5.76T issued at 616M/s, 6.50T total
        400K repaired, 88.62% done, 00:20:59 to go
config:

        NAME                                           STATE     READ WRITE CKSUM
        olorin-storage-zfs                             DEGRADED     0     0     0
          raidz1-0                                     DEGRADED     0     0     0
            ata-MB2000EAZNL_WCAY00128972               ONLINE       0     0     0
            ata-WDC_WD2003FYYS-02W0B1_WD-WCAY01421610  ONLINE       0     0     0
            ata-WDC_WD2003FYYS-02W0B1_WD-WCAY01399588  FAULTED     16     0     0  too many errors
            ata-WDC_WD2003FYYS-02W0B1_WD-WCAY01440119  ONLINE       0     0     0
            ata-WDC_WD3000FYYZ-01UL1B2_WD-WCC131089985 ONLINE       0     0     0
            ata-WDC_WD2003FYYS-02W0B1_WD-WCAY01442756  ONLINE       0     0     0

errors: No known data errors
Yikes! One of my drives was experiencing read errors, but OpenZFS was able to transparently repair the affected 400K of data. To be fair, this particular drive was almost nine and a half years old and had experienced a couple of transitory read/write errors before, so a more severe failure like this wasn’t totally unexpected.
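Taking a failing disk out of active service is a single command. A minimal sketch, using my pool name and the faulted drive’s id from the output above:

```shell
# Take the faulted drive out of service so the pool stops issuing I/O to it;
# the pool keeps running (degraded) on the remaining raidz1 members.
sudo zpool offline olorin-storage-zfs ata-WDC_WD2003FYYS-02W0B1_WD-WCAY01399588
```

After the physical swap, zpool replace points the pool at the new disk and kicks off the resilver, as shown below.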
After the scrub completed, I took the faulted drive offline, shut down the system, and physically replaced the bad drive with a spare that I had available. Upon rebooting, I initiated a resilver of the pool onto the new drive:
gibmat@olorin:~$ sudo zpool replace olorin-storage-zfs ata-WDC_WD2003FYYS-02W0B1_WD-WCAY01399588 /dev/disk/by-id/ata-WDC_WD3000FYYZ-01UL1B3_WD-WMC130E9RXL5
gibmat@olorin:~$ zpool status
  pool: olorin-storage-zfs
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Feb 20 03:43:47 2023
        1.08T scanned at 2.84G/s, 232G issued at 611M/s, 6.50T total
        38.5G resilvered, 3.48% done, 02:59:21 to go
config:

        NAME                                             STATE     READ WRITE CKSUM
        olorin-storage-zfs                               DEGRADED     0     0     0
          raidz1-0                                       DEGRADED     0     0     0
            ata-MB2000EAZNL_WCAY00128972                 ONLINE       0     0     0
            ata-WDC_WD2003FYYS-02W0B1_WD-WCAY01421610    ONLINE       0     0     0
            replacing-2                                  DEGRADED     0     0     0
              ata-WDC_WD2003FYYS-02W0B1_WD-WCAY01399588  OFFLINE      0     0     0
              ata-WDC_WD3000FYYZ-01UL1B3_WD-WMC130E9RXL5 ONLINE       0     0     0  (resilvering)
            ata-WDC_WD2003FYYS-02W0B1_WD-WCAY01440119    ONLINE       0     0     0
            ata-WDC_WD3000FYYZ-01UL1B2_WD-WCC131089985   ONLINE       0     0     0
            ata-WDC_WD2003FYYS-02W0B1_WD-WCAY01442756    ONLINE       0     0     0

errors: No known data errors
After about three hours, the resilver finished successfully and my pool was once again in a healthy state. Thanks to OpenZFS, my data was protected against a failing drive and I was able to quickly and easily replace it with a good drive.
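The weekly scrubs that caught the fault in the first place now keep verifying the new drive. A scrub can also be started and monitored by hand; a sketch against my pool:

```shell
# Kick off a scrub; the command returns immediately and the scrub runs
# in the background, reading every block and verifying its checksum.
sudo zpool scrub olorin-storage-zfs

# Check on progress; the "scan:" line reports throughput, percent done, and ETA.
zpool status olorin-storage-zfs
```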
And remember: RAID isn’t a backup! But OpenZFS snapshots make it trivial to send an atomic backup to another machine:

zfs snapshot olorin-storage-zfs/my-dataset@now
zfs send -I olorin-storage-zfs/my-dataset@previous olorin-storage-zfs/my-dataset@now | ssh remote-host zfs recv remote-storage-zfs/my-dataset

Done!
Plus, if you have an encrypted dataset, you can send raw encrypted snapshots (zfs send -w) to a remote, untrusted host without ever exposing the encryption key. This makes it trivial to back up to a remote cloud host and know that your data will remain private!
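As a sketch (the dataset, snapshot, and host names here are illustrative):

```shell
# -w (--raw) streams the dataset's blocks exactly as stored on disk,
# still encrypted; the receiving host never sees the key or any plaintext.
zfs snapshot olorin-storage-zfs/my-dataset@now
zfs send -w olorin-storage-zfs/my-dataset@now | \
    ssh cloud-host zfs recv backup-pool/my-dataset
```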