situ: Backing up ZFS file systems

Saturday, 17 January 2009

This is one of the good things ZFS has brought us. Backing up a file system is a ubiquitous problem, even on your home PC, if you're wise and care about your data. Like many things in ZFS, due to the telescoping nature of this file system (to use the words of ZFS's father, Jeff Bonwick), backing up is tightly connected to other ZFS concepts: in this case, snapshots and clones.

Snapshotting

ZFS lets the administrator take inexpensive snapshots of a mounted file system. Snapshots are just what their name implies: a photo of a ZFS file system at a given point in time. From that moment on, the file system and its snapshot begin to diverge, and the space required by the snapshot will roughly be the space occupied by the differences between the two. If you delete a 1 GB file from a snapshotted file system, for example, the space used by that file is charged to the snapshot, which obviously must keep track of it because the file existed when the snapshot was created. So far, so good (and easy). Creating a snapshot is also incredibly easy: provided that you have a role with the required privileges, you just issue the following command:

$ pfexec zfs snapshot zpool-name/filesystem-name@snapshot-name

Now you have a photo of the zpool-name/filesystem-name ZFS file system at a given point in time. You can check that it exists by issuing:

$ zfs list -t snapshot

which at the moment, on my machine, gives me:

$ zfs list -t snapshot
NAME                                USED  AVAIL  REFER  MOUNTPOINT
rpool/export/home/enrico@20081231  71.3M      -  14.9G  -
[...]

This means that the ZFS file system which hosts my home directory has been snapshotted and there's a snapshot named 20081231.
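
A date-style name such as 20081231 lends itself to scripting. What follows is only a minimal sketch, assuming a POSIX shell: the function names dated_snapshot_name and snapshot_today are hypothetical helpers of mine, and the script prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: build a date-stamped snapshot name, such as
# rpool/export/home/enrico@20081231, for the file system given as $1.
dated_snapshot_name() {
    printf '%s@%s\n' "$1" "$(date +%Y%m%d)"
}

# Dry run: print the command that would take today's snapshot.
# Remove the leading "echo" to actually execute it.
snapshot_today() {
    echo pfexec zfs snapshot "$(dated_snapshot_name "$1")"
}

snapshot_today rpool/export/home/enrico
```

Names in the YYYYMMDD form sort chronologically, which keeps the output of zfs list -t snapshot easy to read.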

Cloning

Cloning is pretty much like snapshotting, with the difference that the result of the operation is another writable ZFS file system, obviously mounted at another mount point, which can be used like any other file system. A clone is always created from an existing snapshot. Like snapshots, the clone and the originating file system will begin to diverge, and the differences will begin to occupy space in the clone. The official ZFS administration documentation has detailed and complete information about this topic.
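
As a rough illustration: zfs clone takes a snapshot as its source, so cloning is a two-step affair, and the clone has to live in the same pool as that snapshot. The sketch below only prints the two commands. The function name clone_commands and the name zpool-name/clone-name are placeholders I made up.

```shell
#!/bin/sh
# Dry run: print the commands needed to clone a file system.
#   $1 = source file system, $2 = snapshot name, $3 = name of the new clone
# A clone is created from a snapshot, hence the first command.
clone_commands() {
    fs=$1
    snap=$2
    clone=$3
    echo "pfexec zfs snapshot $fs@$snap"
    echo "pfexec zfs clone $fs@$snap $clone"
}

clone_commands zpool-name/filesystem-name snapshot-name zpool-name/clone-name
```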

Backing up

This isn't really what the documentation calls it: it just refers to ZFS send and receive operations. As we've seen, we've got a means to snapshot a file system: there's no need to unmount it or to run the risk of getting an inconsistent set of data because a modification occurred during the operation. This alone is worth switching to ZFS, in my opinion. But there's more: a snapshot can be dumped (serialized) to a file with a simple command:

$ pfexec zfs send zpool-name/filesystem-name@snapshot-name > dump-file-name

This file contains the entire ZFS file system as it was when the snapshot was taken: files and all the rest of the metadata. Everything. The good thing is that you can receive a ZFS file system back just by doing:

$ pfexec zfs receive another-zpool-name/another-filesystem-name < dump-file-name

This operation creates the another-filesystem-name file system on the pool another-zpool-name (it can even be the same zpool you generated the dump from), and a snapshot called snapshot-name will also be created. In the case of full dumps, the destination file system must not exist and will be created for you. Easy. A full backup with just two lines, a bit of patience and sufficient disk space.
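
If disk space for the dump is a concern, one possible variation is to compress the stream on its way to disk. zfs send writes a plain byte stream to standard output and zfs receive reads one from standard input, so any filter fits in between. This is a minimal sketch, assuming gzip is installed. The functions save_stream and load_stream are hypothetical helpers, not part of ZFS.

```shell
#!/bin/sh
# Hypothetical helper: compress whatever arrives on stdin into the file $1.
save_stream() {
    gzip -c > "$1"
}

# Hypothetical helper: decompress the file $1 to stdout.
load_stream() {
    gzip -dc < "$1"
}

# Intended use, with the same placeholder names as in the text (not run here):
#   pfexec zfs send zpool-name/filesystem-name@snapshot-name | save_stream dump-file-name.gz
#   load_stream dump-file-name.gz | pfexec zfs receive another-zpool-name/another-filesystem-name
```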

There are the usual variations on the theme. You don't really need to store the dump in a file: you could just pipe send into receive and do it in one line, with no need for extra storage for the dump file:

# zfs send zpool-name/filesystem-name@snapshot-name | zfs receive another-zpool-name/another-filesystem-name

And if you want to send it to another machine, no problem at all:

# zfs send zpool-name/filesystem-name@snapshot-name | ssh anothermachine zfs receive another-zpool-name/another-filesystem-name

Incredibly simple. ZFS is really revolutionary.

Incremental backups

ZFS, obviously, lets you do incremental sends and receives with the -i option, which sends only the differences between one snapshot and another. These differences will be loaded and applied on the receiving side: in this case, obviously, the source snapshot must already exist there. You start with a full send and then you go on with increments. It's the way I'm backing up our machines and it's fast, economical and reliable. A great reason to switch to ZFS, let alone Solaris.
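
To make the shape of an incremental send concrete, here is a small sketch that prints the pipeline rather than running it. The function name incremental_send_command is hypothetical, and the snapshot name 20090117 is a made-up example of a later snapshot. The older snapshot (20081231 here) is assumed to have already reached the destination through an earlier full send.

```shell
#!/bin/sh
# Dry run: print an incremental send/receive pipeline.
#   $1 = source file system     $2 = older snapshot (already on the receiver)
#   $3 = newer snapshot         $4 = destination file system
incremental_send_command() {
    src_fs=$1
    old_snap=$2
    new_snap=$3
    dst_fs=$4
    # -i sends only the differences between the older and the newer snapshot
    echo "zfs send -i $src_fs@$old_snap $src_fs@$new_snap | zfs receive $dst_fs"
}

incremental_send_command zpool-name/filesystem-name 20081231 20090117 \
    another-zpool-name/another-filesystem-name
```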
