Coolermaster case on the right

Φόβος (phobos) is the name of my NAS, backup and home server. The current hardware setup is described at phobos; this page aims to describe the rationale, setup, software and usage, mostly of backup.

Software

Debian GNU/Linux for the operating system.

I don't use RAID (which is not backup).

Remote decryption

I use full-disk encryption which necessitates supplying a passphrase when the machine is booted. Since this is a headless box, some additional work is needed to permit supplying this passphrase over the network. Luckily most of the work is done already by installing the dropbear-initramfs Debian package and reconfiguring keys and authorized_keys files in /etc/initramfs-tools afterwards. This means I can SSH into the pre-boot environment. From there, I just need to run cryptroot-unlock and supply the decryption passphrase.
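A minimal sketch of the client side of that unlock step, assuming dropbear-initramfs is already configured on the server. The function name, the SSH override variable and the hostname are made up for this example.

```shell
#!/bin/bash
# Sketch: unlock the root filesystem of a headless box from another machine.
# Assumes your public key is already in the initramfs authorized_keys file
# and that the initramfs was rebuilt afterwards (update-initramfs -u).

SSH="${SSH:-ssh}"   # overridable, e.g. SSH=echo for a dry run

unlock_remote() {
    local host=$1   # hypothetical name or address of the pre-boot environment
    # -t allocates a terminal so cryptroot-unlock can prompt for the passphrase.
    "$SSH" -t "root@${host}" cryptroot-unlock
}

# Example (hypothetical hostname):
#   unlock_remote phobos.example
```

The pre-boot dropbear usually presents a different host key from the fully booted system, so a separate known_hosts entry or Host alias on the client may be needed.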

Notifications

LED

blinkenlights!

I wanted to add a means of notifying me of events on the machine. I bought a Blinkstick Nano, a tiny USB stick with an LED on each side. I've hooked calls to change the light colour into the success/failure paths for the systemd jobs that drive my backups. Further details here: 1, 2, 3.

The light defaults to off. When an interactive job is in progress, it turns blue. When the job completes, the light changes to either green or red depending on success or failure. In the case of external drives, green means it is safe to remove the drive.

When a non-interactive, scheduled job fails, the light turns red. I usually notice this next morning.

Email

Following some instructions on the Arch wiki, I configured systemd to mail me if a job fails. A template service is referenced via OnFailure=status-email-user@%n.service. The template unit:

[Unit]
Description=status email for %I

[Service]
Type=oneshot
ExecStart=-/usr/local/bin/systemd-email %i
ExecStart=/usr/local/bin/blinkstick --index 1 --limit 50 --set-color red
Group=systemd-journal
DynamicUser=yes

where /usr/local/bin/systemd-email is simply

systemctl status --full "$1" | mail -s "unit $1 failed" root

Backup software

I back up remote hosts to locations on the NAS via rsync.

I back up local files (including the backups of remote hosts) using Borg (v1.4).

Each month I back up the local Borg repository to one of a pair of external drives, alternating between them (again using rsync). The drives are kept out of my house.

Remote hosts

Borg works using a push model: a backup client needs to have permission to connect to the Borg backup server, and self-initiates its backups. I prefer to pull from my backup server and have the clients trust it: that way, a compromised external host is less likely to give an attacker access to my NAS. The schemes for achieving this in Borg (ssh tunnels) are awkward, so instead I do things in two phases: a simple server-initiated rsync of clients to a holding space in the NAS, followed by local-only Borg backups of the holding spaces.

TODO: At the moment, nothing except scheduling prevents the rsync jobs from running at the same time as the borg ones. I should add the necessary exclusions (After=, I think)
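One possible way to get that exclusion without ordering directives is a shared lock file taken with flock(1). This is only a sketch of that alternative, not what is deployed here; the lock file path and function names are made up.

```shell
#!/bin/bash
# Hypothetical lock file shared by every job that must not overlap.
LOCKFILE="${LOCKFILE:-/run/lock/phobos-backup.lock}"

# Run a command while holding the lock; wait if another job has it.
run_exclusive() {
    flock "$LOCKFILE" "$@"
}

# Same, but give up immediately (non-zero exit) if the lock is already held.
try_exclusive() {
    flock --nonblock "$LOCKFILE" "$@"
}

# Example:
#   run_exclusive rsync -a remote.example:/home/ /backup/holding/remote/
```

Each rsync and borgmatic job would then be wrapped in the same way, so whichever starts second simply waits for the first to finish.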

Borg / Borgmatic

The main backups are via Borg, but I use borgmatic as a convenience wrapper and that's what I schedule (via systemd timer) to fire at 3am nightly. The script calls borgmatic --prune --create, which has several different backups configured (currently 9). These run serially.

I deliberately omit --check from borgmatic; otherwise it would run a borg check for each of these nine configurations. Instead I run borg check separately, currently scheduled once a week.

It typically takes 1h15m for the backups to run and 5h30m for the check stage to run.

backup mount namespace

To protect against an accidental (or malicious) process corrupting the backup filesystems, I want to limit access to them to only the processes that need it.

My first stab at solving this was mount on demand backups. This reduced the risk to the time that the backup jobs were running, but didn't eliminate it.

I've since moved to creating a persistent mount namespace for the purpose. For now the best description of this is at mount namespaces.
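As a rough illustration of the technique (not the actual script used here), a persistent mount namespace can be pinned to a file with unshare(1) and entered later with nsenter(1). The directory /run/backupns, the function names and the DRY_RUN switch are assumptions for this sketch.

```shell
#!/bin/bash
# Sketch of a "create the namespace if necessary" helper.
# NSDIR is a hypothetical location for the namespace reference file.
NSDIR="${NSDIR:-/run/backupns}"
DRY_RUN="${DRY_RUN:-}"

run() {
    # With DRY_RUN set, print the command instead of executing it.
    if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

make_backup_ns() {
    # An existing namespace shows up as a mount point on the reference file.
    if mountpoint -q "$NSDIR/mnt" 2>/dev/null; then
        return 0
    fi
    run mkdir -p "$NSDIR"
    # The reference file must live on a mount whose propagation is not
    # shared, so bind-mount the directory onto itself and make it private.
    run mount --bind "$NSDIR" "$NSDIR"
    run mount --make-private "$NSDIR"
    run touch "$NSDIR/mnt"
    # unshare bind-mounts the new mount namespace onto the file, so the
    # namespace persists after the command (here: true) exits.
    run unshare --mount="$NSDIR/mnt" true
}

# Later, as root, run a command inside the namespace, for example:
#   nsenter --mount=/run/backupns/mnt mount /backup
```

Anything mounted from inside the namespace is invisible to processes outside it, which is the property wanted for the backup filesystems.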

Third-strand external

Every month I back up the Borg backup repository to one of two external hard drives that live off-site the rest of the time. This requires an exclusive lock on the Borg repository, so I cannot do it while the Borg backup or check jobs are running.

In recent times this has taken anything between 2h20m and 5h.

preparing the drive

This is mostly just a standard dm-crypt/cryptsetup/LUKS encrypted device, with a normal filesystem sitting on top: basically, the most common way to encrypt a drive in Linux. See places like the cryptsetup docs for how to set something like that up. The key things here are

  1. set up a decryption key file as well as (or instead of) a passphrase and store that somewhere on the filesystem of the NAS
  2. back up the LUKS header, as the cryptsetup documentation stresses you should
  3. make a note of the underlying drive UUID: it's needed for the WantedBy line in the backup service file. (look in /dev/disk/by-uuid before and after inserting it and see what was added)
  4. label the filesystem on top of the encrypted device extbackup
  5. set up a /etc/crypttab line with all the info needed to decrypt, but mark it noauto
  6. set up a /etc/fstab line with all the info needed to mount, keying off the label, but mark it noauto
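To make steps 5 and 6 concrete, here is a small sketch that prints what the two entries might look like. The key file path, the ext4 filesystem type and the UUID are assumptions for illustration; the extbackup name and the /extbackup mount point come from the rest of this page.

```shell
#!/bin/bash
# Print example /etc/crypttab and /etc/fstab entries for the external drive.

crypttab_line() {
    local uuid=$1 keyfile=$2
    # Fields: mapping name, source device, key file, options.
    printf 'extbackup UUID=%s %s luks,noauto\n' "$uuid" "$keyfile"
}

fstab_line() {
    # Keyed off the filesystem label; noauto keeps it from mounting at boot.
    printf 'LABEL=extbackup /extbackup ext4 noauto 0 0\n'
}

# Hypothetical values, for illustration only.
crypttab_line "0a1b2c3d-1111-2222-3333-444455556666" /root/extbackup.key
fstab_line

# Related commands for steps 1, 2 and 4 (run as root; /dev/sdX and the
# file names are placeholders):
#   cryptsetup luksAddKey /dev/sdX /root/extbackup.key
#   cryptsetup luksHeaderBackup /dev/sdX --header-backup-file /root/extbackup-header.img
#   e2label /dev/mapper/extbackup extbackup
```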

How

As much as possible I lean on systemd and its ability to trigger actions based on events. Here's what happens when one of my external backup drives is plugged in:

  1. systemd instantiates a corresponding device unit, named dev-disk-by\x2duuid-$UUID.device, where $UUID is the UUID of the device with each dash replaced by \x2d, as systemd's unit name escaping requires.

  2. The backup job is a systemd service which has a WantedBy relationship on the device unit, so when the device appears, systemd starts the backup service.

  3. The backup service has Requires and After relationships on systemd-cryptsetup@extbackup.service, a service originally created by systemd's cryptsetup generator, which I altered to add StopWhenUnneeded=true (see footnote 1). The encrypted device is therefore unlocked (but not mounted).

  4. The backup service has ExecStartPre=/home/jon/bin/mkbackupns, which creates (if necessary) the backup mount namespace.

  5. The backup service then executes the main backup script (described in the next section), wrapped in /usr/bin/nsenter so that it runs inside the backup mount namespace.
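The escaping in step 1 can be reproduced by hand. The sketch below handles only the dashes in a UUID; systemd-escape --path --suffix=device /dev/disk/by-uuid/$UUID covers the general case. The function name and the UUID are made up for this example.

```shell
#!/bin/bash
# Build the device unit name for a filesystem UUID.
device_unit_name() {
    local uuid=$1 escaped
    # Replace every "-" in the UUID with the literal text "\x2d".
    escaped=$(printf '%s' "$uuid" | sed 's/-/\\x2d/g')
    printf 'dev-disk-by\\x2duuid-%s.device\n' "$escaped"
}

# Hypothetical UUID, for illustration only.
device_unit_name "0a1b2c3d-1111-2222-3333-444455556666"
# prints: dev-disk-by\x2duuid-0a1b2c3d\x2d1111\x2d2222\x2d3333\x2d444455556666.device
```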

Once the backup script has finished:

  1. systemd deactivates the backup-exthdd.service,

  2. and, since it is not wanted by an active service it also stops the systemd-cryptsetup service, locking the device.
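Putting the relationships above together, a unit along these lines would behave as described. This is a reconstruction, not the real backup-exthdd.service: Type=oneshot, the Description text, the namespace file /run/backupns/mnt and the script path /usr/local/bin/phobos-backup-monthly are all assumptions.

```shell
#!/bin/bash
# Print a sketch of backup-exthdd.service for a given device unit name.
backup_unit() {
    local device_unit=$1   # e.g. dev-disk-by\x2duuid-....device
    cat <<EOF
[Unit]
Description=Monthly backup to external drive
Requires=systemd-cryptsetup@extbackup.service
After=systemd-cryptsetup@extbackup.service
OnFailure=status-email-user@%n.service blinkstick-fail.service

[Service]
Type=oneshot
ExecStartPre=/home/jon/bin/mkbackupns
ExecStart=/usr/bin/nsenter --mount=/run/backupns/mnt /usr/local/bin/phobos-backup-monthly

[Install]
WantedBy=${device_unit}
EOF
}

# Example with a hypothetical UUID:
#   backup_unit 'dev-disk-by\x2duuid-0a1b2c3d\x2d1111\x2d2222\x2d3333\x2d444455556666.device'
```

The WantedBy line only takes effect once the unit has been enabled with systemctl enable, which creates the symlink that ties the service to the device unit.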

The backup script

The backup script itself is not complex:

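#!/bin/bash
# Exit on any error, on unset variables, and on failures inside pipelines.
set -euo pipefail

echo starting phobos-backup-monthly "($*)"
blinkstick --index 1 --limit 10 --set-color yellow # 'working'

# Mount the external filesystem via its (noauto) /etc/fstab entry.
mount /extbackup

# Hold the Borg repository lock for the whole rsync run, so that no
# borg job can modify the repository while it is being copied.
borg with-lock /backup/borg rsync -a --delete \
    --max-alloc=4GiB \
    /backup/ /extbackup/

umount /extbackup

echo phobos-backup-monthly "($*)" finished
blinkstick --index 1 --limit 10 --set-color green # 'success'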

In stages:

  1. set the blinkstick to the working colour (yellow)
  2. mount the filesystem from the external drive (I can't rely on systemd's .mount unit type to do this)
  3. call rsync to sync the internal backups to the external drive...
  4. ...wrapped in borg with-lock to ensure we are mutually exclusive with any borg jobs
  5. unmount the filesystem
  6. set the blinkstick to the success colour (green)

  7. Finally, I notice the LED is green, remove the drive, and take it to its off-site home.
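With set -e, a failing rsync ends the script before the unmount step. One possible hardening, not part of the script as deployed, is to unmount from an EXIT trap. The function name and the mounted flag below are made up for this sketch.

```shell
#!/bin/bash
# Sketch only: the mount/rsync/umount steps run in a subshell with an
# EXIT trap, so /extbackup is unmounted on failure as well as on success.

backup_to_external() (
    set -euo pipefail
    mounted=0

    cleanup() {
        # Only unmount if the mount step actually succeeded.
        if [ "$mounted" -eq 1 ]; then
            umount /extbackup
        fi
    }
    trap cleanup EXIT

    mount /extbackup
    mounted=1

    borg with-lock /backup/borg rsync -a --delete \
        --max-alloc=4GiB \
        /backup/ /extbackup/
)
```

The function still returns rsync's failure status, so the OnFailure= handling keeps working as before.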

If anything goes wrong, all my custom systemd units have, as a matter of course,

OnFailure=status-email-user@%n.service blinkstick-fail.service

Schedule

I keep forgetting when the backup jobs typically run (a running job rules out doing certain other things at the same time), so here's a rough sketch.

Time  Borg                        Rsync                          External (example)
0000
0100
0200
0300  borgmatic backups start
0400  backups…
0415  borgmatic backups finish
0500
0600                              backup-luv starts & finishes
0700  weekly borg check starts
0800  check…
0900  check…
1000  check…
1100  check…
1200  weekly borg check finishes
1300
1400
1500                              sync-nuhc starts & finishes
1600
1700
1800                                                             ext starts
1900                                                             running…
2000                              backup-chew starts & finishes  running…
2100                                                             running…
2200                                                             running…
2300                                                             ext finishes

Footnotes


  1. I customized mine a while ago by copying the generated service file to a static file, but nowadays I think you could run systemctl edit systemd-cryptsetup@extbackup.service to add StopWhenUnneeded=true in an override file and not need the rest.