8. June 2023

backups@home

I was asked on Mastodon how my backup at home works. As it would be hard to get this into one toot, and I wanted to write it down anyway, I decided to blog about it.

Nevertheless, here is the

tl;dr

I use snapshots against accidental deletes and access to older versions of a file, and (incremental) backups to other computers against broken disks and filesystems. And then there is version control.

Goals

Nobody wants backup. Everybody wants restore.
Michael Nagorsnik (SUN)

Backup is not a goal. It is a necessary means to be able to restore, and that is what I want.

I don’t want to lose data

This is what people think about first when they talk about backups: these precious pictures from the last holidays with granddad are gone because the disk throws errors, and we urgently want them back.

Go back in time

You may know this: all my nice Firefox config and all these open tabs are gone because I played around with too many add-ons, messed things up, everything crashed, and now the config is borked.

Or while working on this document/image/… I managed to delete important parts (the paragraph I thought I’d no longer need but which turned out to hold important information, the image layer I accidentally hit with the eraser tool and which has ugly holes now, …) without noticing. I want the old version back (at least to copy these parts into the new one).

So I would like to have access to a version of the config, the document, … as it was an hour or a day earlier.

It’s about my data

My main concern is data I created or that is otherwise related to me: pictures I shot, texts I wrote (including emails etc.) or got from friends, my personal configs, and so on. My computer is my toolbox for most of my work, and I produce data in all kinds of ways.

Losing this could either mean that something valuable is gone that I cannot get back (like the pictures with grandma) or that I have to put in extra work to recreate it.

I am lazy

Doing backups is work that does not reward me as long as everything is fine, so manual backups do not work too well for me; the less effort they take, the better.

I want doing my backups to be as simple as possible. I don’t want to have to think about it or keep track of what is where. It should be either fully automatic or at least a single command that does what I want.

Strategy

So let’s have a nice, simple backup and recovery system that just works.

Our setup at home

Our home network consists of a home server that holds our data and runs a media server (KODI) attached to the TV. For working, playing, audio books and so on we use several laptops, smartphones etc.

All devices can access the home server through ssh/sftp (I did set up access via SAMBA and all the other things a NAS normally provides, but we found out we simply don’t need them; ssh with public/private key access is more comfortable anyway).
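For reference, key-based access like this is set up in a minute. A minimal sketch, assuming a hypothetical host name "homeserver" and an ed25519 key (none of this is my actual configuration):

    # generate a key pair on the client
    ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519

    # copy the public key to the home server once
    ssh-copy-id -i ~/.ssh/id_ed25519.pub user@homeserver

    # from then on ssh and sftp work without passwords
    sftp homeserver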

Then there is a dedicated backup server in another room, just a rather old tower that inherits the old HDDs from the home server whenever I upgrade the latter, and it is turned off by default. And finally I have access to external storage space via ssh.

All devices run Linux with btrfs as the filesystem, split into several subvolumes; the home server and the backup server use a btrfs RAID 1 configuration. This means we have a copy-on-write filesystem with checksums and snapshots.
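As a rough sketch of how such a layout is created with the standard btrfs tools (device names, mount point and subvolume names are made up for illustration):

    # mirror data and metadata over two disks (btrfs RAID 1)
    mkfs.btrfs -d raid1 -m raid1 /dev/sda /dev/sdb

    # mount the filesystem and split it into independently snapshottable subvolumes
    mount /dev/sda /mnt
    btrfs subvolume create /mnt/@home
    btrfs subvolume create /mnt/@media
    btrfs subvolume create /mnt/@cache    # caches live outside the snapshotted subvolumes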

Snapshots

The laptops and the home server have snapper running. It automatically creates lightweight filesystem snapshots and also cleans up older ones. A single snapshot does not take much space, as it basically only holds the differences and the corresponding metadata, and creating one takes no noticeable time.

On the laptops the home directories (subvolumes) are snapshotted every hour; the home server has a similar config for the media data. The system subvolumes get automatic snapshots on software updates. I took care that cache and temp directories are on separate subvolumes and therefore excluded from the snapshots.

The automatic cleanup keeps hourly snapshots for the last 10 hours, daily snapshots for the last 10 days, monthly snapshots for the last 10 months, and yearly snapshots. This means I can access the version of a file from two hours ago as well as the one from five days back. I don’t need this very often, but sometimes it has saved my butt.
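The retention policy is just a handful of values in the per-subvolume snapper config; an excerpt as a sketch (the config name and the exact yearly limit are assumptions):

    # /etc/snapper/configs/home (excerpt)
    TIMELINE_CREATE="yes"        # take a snapshot every hour
    TIMELINE_CLEANUP="yes"       # let snapper prune old snapshots automatically
    TIMELINE_LIMIT_HOURLY="10"
    TIMELINE_LIMIT_DAILY="10"
    TIMELINE_LIMIT_MONTHLY="10"
    TIMELINE_LIMIT_YEARLY="10"   # yearly snapshots are kept as well (number is a guess)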

Sometimes (about once a year) I manually delete older snapshots to free some space, and when I accidentally put huge files in my home directory that I really don’t need any longer, I manually remove them from the snapshots to get the space back. That is roughly all the housekeeping I need.
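The manual part is equally unspectacular; a sketch with a made-up config name and snapshot range:

    # list snapshots of the "home" config with numbers and timestamps
    snapper -c home list

    # drop a range of old snapshots to get the space back
    snapper -c home delete 1500-1600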

Version control

Similar to snapshots, but not the same, are version control systems (VCS). Most of my projects are checked into git repositories, with all the advantages of version control (and my git usage has improved a lot since I started to use magit for this).

This works well for my program code, LaTeX, Markdown, (Inkscape) SVG or even Scribus files, but not so much for binaries, bitmap images or the Word, Excel or even PowerPoint files (or their LibreOffice/FreeOffice counterparts) I am sometimes forced to work with.

Specific servers

One advantage of a VCS is that I can access the repositories from different devices. The repositories that are not public or on the servers at work live on the home server. And as I also want to access my address book and calendars from different devices (laptop, smartphone, …), the home server also holds address books and calendars (via Radicale, serving CalDAV and CardDAV) as well as an IMAP mail server that stores the email archive. This means live copies of this data are mirrored to this server all the time.
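The Radicale side of this needs very little configuration. A sketch of a possible config, with paths, port and authentication details that are purely illustrative (not my actual setup):

    # /etc/radicale/config (sketch)
    [server]
    hosts = 0.0.0.0:5232

    [auth]
    type = htpasswd
    htpasswd_filename = /etc/radicale/users
    htpasswd_encryption = bcrypt

    [storage]
    filesystem_folder = /var/lib/radicale/collections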

Backup

Snapshots (and VCS) are nice to have and guard me against accidental changes or deletion, but they do not help against broken filesystems or hardware. This is what backups are for.

The home server runs on btrfs RAID 1, which means that all data and metadata are mirrored on two physical devices all the time, so it would survive the breakdown of one HDD. Nevertheless this would not help against filesystem or kernel bugs (or faulty RAM) that can shred the filesystem, and a power surge or a burning server can trash all storage devices at once. The same holds for ransomware or viruses that encrypt or delete whole disks.

Laptop → home server

Our daily work happens mainly on our laptops. The projects under version control are in their upstream repositories, and mails, calendars etc. are mirrored, but all the other files are not. So the first line of backup is a small rsync script that creates a new snapshot on the home server and then synchronizes all changed files from my laptop’s home directory to the home server. This also means that I have a backup history on the home server from all the rsync backups run in the past; in other words, a new backup does not overwrite an old one.
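A minimal sketch of how such a script can look; the host name, paths and the exclude are assumptions rather than my actual script, and it assumes sufficient permissions for the snapshot call on the server:

    #!/bin/sh
    set -e

    SERVER=homeserver
    DEST=/srv/backup/laptop              # a btrfs subvolume on the home server

    # freeze the previous state as a read-only snapshot before overwriting it
    ssh "$SERVER" "btrfs subvolume snapshot -r $DEST $DEST-$(date +%F_%H%M)"

    # then sync the laptop home directory into the live backup subvolume;
    # only changed parts of changed files are transferred
    rsync -aHx --delete --exclude='.cache/' "$HOME/" "$SERVER:$DEST/"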

Rsync has to go through all the files to check for changes, but it is rather efficient because it only transfers the changed parts.

Home server → backup server

Every night the backup server wakes up (you can set a wakeup time in the BIOS of most computers), runs automatic security updates and then starts to pull all the changes from the home server via ssh, based on btrfs send/receive. As this only transfers the differences and gets them directly from the filesystem itself, it is much faster than rsync: there is no need to go through all the files at all, which is highly efficient. After a few minutes the backup server shuts down again.

Before shutting down it sends a mail listing the subvolumes it backed up, the runtime, and the space left on the backup device. If there were any problems, the mail subject starts with “ERROR”. If the backup server did not wake up in the night, the home server sends me a mail a few hours later saying that the backup is missing.
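Stripped of the bookkeeping, the nightly job boils down to one incremental receive per subvolume plus the status mail. A sketch under the assumption of a single subvolume, with made-up names and paths (the restricted ssh key that triggers the matching send on the home server is described below):

    #!/bin/bash
    set -o pipefail

    LOG=$(mktemp)
    SUBJECT="backup OK $(date +%F)"

    # pull the incremental btrfs stream; the forced command behind the
    # restricted key runs the matching "btrfs send -p <parent> <snapshot>"
    if ! ssh backup@homeserver | btrfs receive /backup/home/ 2>"$LOG"; then
        SUBJECT="ERROR backup $(date +%F)"
    fi

    # append the remaining space, mail the report, power down again
    df -h /backup >>"$LOG"
    mail -s "$SUBJECT" me@example.org <"$LOG"    # whatever local mailer is set up
    systemctl poweroff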

Malware/Ransomware

One important design choice is that the backup server pulls the backup from the home server (tunneled through ssh, with a command="my_backup_send_script" entry in the authorized_keys file for the backup public key) and not the other way around. As the home server is somewhat exposed (through ssh and partly through the IMAPS, CalDAV and CardDAV servers), it could become the target of an automated ransomware attack.
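On the home server this restriction is a single line in the authorized_keys file; roughly like this, with a shortened placeholder key and an assumed script path:

    # ~/.ssh/authorized_keys on the home server (one line)
    command="/usr/local/bin/my_backup_send_script",restrict ssh-ed25519 AAAA... backup@backupserver

Whatever the client asks for, the forced command decides what actually gets sent.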

And if there were malicious software on the server, it would also follow any connection to other servers and continue to delete or encrypt things there too. Turning the direction around makes it much harder to even access the backup server.

The assumption here is that I could be the accidental target of an automated attack; I don’t think I am important enough for someone to target me specifically.

If some ransomware starts to encrypt my files on disk, they change. This means the difference between yesterday and today is larger and the backup will run for hours, so this would be an indicator that something is odd.

Home server → external with restic

Having a backup on a different computer in another room is fine, but if the flat burns down or a thief steals all the hardware, it is gone. So I decided to do an extra backup to an external server using restic (a rough sketch of such a run comes after the list below). It has nearly all the features I was looking for:

  • client side encryption: this means the data is encrypted at my end and only the result is sent to the server
  • only sends the changes, this is especially important for huge backups over slow connections
  • works over ssh (as well as several other protocols); there is no additional software needed on the server, I only need restic on the client
  • stores snapshot-like versions on top of each other
  • I can restore single files if I need to; I can even mount a remote restic repository locally and browse through the different versions of all the files
  • I can set several passphrases for the same repository
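Under the assumption of an sftp-reachable host and a made-up repository path, the whole thing boils down to a handful of restic commands; this is a sketch, not my actual setup:

    # one-time: create the encrypted repository on the external server
    restic -r sftp:backup@external.example.org:/srv/restic/home init

    # regularly: back up the data; only changed chunks leave the house
    restic -r sftp:backup@external.example.org:/srv/restic/home backup /srv/data

    # occasionally: thin out old snapshots according to a retention policy
    restic -r sftp:backup@external.example.org:/srv/restic/home forget --keep-daily 7 --keep-monthly 12 --prune

    # add a second passphrase, browse old versions via a local mount
    restic -r sftp:backup@external.example.org:/srv/restic/home key add
    restic -r sftp:backup@external.example.org:/srv/restic/home mount /mnt/restic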

So I have several snapshots of my home server (and therefore all my data) encrypted on a remote server.

The external backup is obviously a push rather than a pull, but it guards against a different scenario.

But…

Old HDDs?

Currently my local backup server inherits all its HDDs from other sources, e.g. from upgrading my home server. The disks have to spin up once every night to receive the data. I expect them to fail some night, but hopefully not all at the same time.

One failing disk would be covered by the btrfs RAID: I could just add a newer one and would be fine. I already tried this procedure when upgrading the home server: I added a new HDD, told the RAID to use it and release the smaller one, and when this was done I did the same for the other disk.
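The swap itself is only two commands per disk; a sketch with made-up device names and mount point:

    # add the new, larger disk to the existing RAID 1
    btrfs device add /dev/sdc /srv/data

    # remove the old one; btrfs relocates its data to the remaining devices first
    btrfs device remove /dev/sda /srv/data

    # then the same two steps for the second old disk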

So to lose data, more than one disk of the backup server and more than one disk of the home server would need to break at the same time (and I would still have the data on the laptops and a maybe slightly older external backup).

btrfs

The btrfs filesystem had a bad reputation early in its development as unreliable, because things would break. But it has been very stable for many years now; I am using it across servers and do not really run into problems. The btrfs checksums allow me to at least recognize problems early.
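One common way to put those checksums to work (a general btrfs practice, not something specific to my setup) is a regular scrub that reads everything back and verifies it:

    # read all data and metadata and verify it against the checksums;
    # with RAID 1 a bad copy can be repaired from the good mirror
    btrfs scrub start /srv/data
    btrfs scrub status /srv/data

    # per-device error counters collected by the kernel
    btrfs device stats /srv/data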

There is a write-hole risk with RAID 5/6 if the system crashes at the wrong time, but I don’t use these RAID levels.

This text is too long

I agree. I should streamline things, maybe add some graphics. But this will do for now.