jmtd → log → mount-on-demand backups

Last week, someone posted a request for help on the popular Server Fault Q&A site: they had apparently accidentally deleted their entire web hosting business, and all their backups. The post (now itself deleted) was a reasonably obvious fake, but mainstream media reported on it anyway, and then life imitated art and 123-reg went and did actually delete all their hosted VMs, and their backups.

I was chatting to some friends from $job-2 and we had a brief smug moment that we had never done anything this bad, before moving on to incredulity that we had never done anything this bad in the 5 years or so we were running the University web servers. Some time later I realised that my personal backups were at risk from something like this because I have a permanently mounted /backup partition on my home NAS. I decided to fix it.

I already use Systemd to manage mounting the /backup partition (via a backup.mount file) and its dependencies. I'll skip the finer details of that for now.

I planned to define some new Systemd units for each backup job which was previously scheduled via Cron in order that I could mark them as depending on the /backup mount. I needed to adjust that mount definition by adding StopWhenUnneeded=true. This ensures that /backup will be unmounted when it is not in use by another job, and not at risk of a stray rm -rf.

The backup jobs are all simple shell scripts that convert quite easily into services. An example:

backup-home.service:

[Unit]
Requires=backup.mount
After=backup.mount

[Service]
User=backupuser
Group=backupuser
ExecStart=/home/backupuser/bin/phobos-backup-home

To schedule this, I also need to create a timer:

backup-home.timer:

[Timer]
OnCalendar=*-*-* 04:01:00

[Install]
WantedBy=timers.target

To enable the timer, you have to both enable and start it:

systemctl enable backup-home.timer
systemctl start backup-home.timer

I created service and timer units for each of my cron jobs.

The other big difference to driving these from Cron is that by default I won't get any emails if the jobs generate output - in particular, if they fail. I definitely do want mail if things fail. The Arch Wiki has an interesting proposed solution to this which I took a look at. It's a bit clunky, and my initial experiments with a derivation from this (using mail(1) not sendmail(1)) have not yet generated any mail.

Pros and Cons

The Systemd timespec is more intuitive than Cron's. It's a shame you need a minimum of three more lines of boilerplate for the simplest of timers. I think WantedBy=timers.target should probably be an implicit default for all .timer type units. Here I think clarity suffers in the name of consistency.

With timers, start doesn't kick-off the job, it really means "enable" in the context of timers, which is clumsy considering the existing enable verb, which seems almost superfluous, but is necessary for consistency, ~~since Systemd units need to be enabled before they can be started~~ As Simon points out in the comments, this is not true. Rather, "enable" is needed for the timer to be active upon subsequent boots, but won't enable it in the current boot. "Start" will enable it for the current boot, but not for subsequent ones.

Since I need a .service and a .unit file for each active line in my crontab, that's a lot of small files (twice as many as the number of jobs being defined) and they're all stored in system-wide folder because of the dependency on the necessarily system-level units defining the mount.

It's easy to forget the After= line for the backup services. On the one hand, it's a shame that After= doesn't imply Require=, so you don't need both; or alternatively there was a convenience option that did both. On the other hand, there are already too many Systemd options and adding more conjoined ones would just make it even more complicated.

It's a shame I couldn't use user-level units to achieve this, but they could not depend on the system-level ones, nor activate /backup. This is a sensible default, since you don't want any user to be able to start any service on-demand, but some way of enabling it for these situations would be good. I ruled out systemd.automount because a stray rm -rf would trigger the mount which defeats the whole exercise. Apparently this might be something you solve with Polkit, as the Arch Wiki explains, which looks like it has XML disease.

I need to get mail-on-error working reliably.