I've been using Borg backup for a couple of years and it has seemingly worked very well for me. One difference I really appreciate from my previous arrangement (rdiff-backup) is the freedom to move large files or file hierarchies around (including between different filesystems) without provoking large backup incrementals.
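Part of why moving files is so cheap is that Borg deduplicates by content rather than by path, whereas rdiff-backup tracks files by path, so a move looks like a deletion plus a brand-new file. The toy sketch below is not Borg's implementation. It assumes fixed 4 KiB chunks and plain SHA-256, whereas Borg actually uses content-defined chunking and a keyed ID hash, and `ChunkStore` and `split_chunks` are hypothetical names. It only illustrates why a moved file adds no new chunks.

```python
import hashlib
import random
from typing import Dict, Iterator

CHUNK_SIZE = 4096  # toy fixed-size chunks; Borg really uses content-defined chunking


def split_chunks(data: bytes) -> Iterator[bytes]:
    """Yield consecutive fixed-size slices of data (the last may be shorter)."""
    for i in range(0, len(data), CHUNK_SIZE):
        yield data[i:i + CHUNK_SIZE]


class ChunkStore:
    """Toy content-addressed store: each distinct chunk is kept once, keyed by its hash."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}  # chunk ID -> chunk bytes

    def backup(self, snapshot: Dict[str, bytes]) -> int:
        """Store a {path: contents} snapshot and return how many new chunks were written.

        Paths are ignored when deciding what to store: only content matters.
        """
        new = 0
        for contents in snapshot.values():
            for chunk in split_chunks(contents):
                cid = hashlib.sha256(chunk).hexdigest()
                if cid not in self.chunks:
                    self.chunks[cid] = chunk
                    new += 1
        return new


store = ChunkStore()
video = random.Random(42).randbytes(100_000)  # stand-in for a large file
first = store.backup({"/home/jon/video.mkv": video})
second = store.backup({"/srv/media/video.mkv": video})  # same bytes, new path
print(first, second)
```

The first backup stores every chunk of the file; the second, after the "move", stores none, because every chunk ID is already present.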

About a week ago I had my first real problem with Borg: backups started to fail with the following complaints:

Creating archive at "/backup/borg::{hostname}-home-jon-{now:%Y-%m-%dT%H:%M:%S.%f}"
segment 61916 not found, but listed in compaction data
[ further, similar lines ]
Local Exception
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4690, in main
    exit_code = archiver.run(args)
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4622, in run
    return set_ec(func(args))
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 177, in wrapper
    return method(self, args, repository=repository, **kwargs)
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 595, in do_create
    create_inner(archive, cache)
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 560, in create_inner
    archive.save(comment=args.comment, timestamp=args.timestamp)
  File "/usr/lib/python3/dist-packages/borg/archive.py", line 530, in save
    self.repository.commit()
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 475, in commit
    self.compact_segments()
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 835, in compact_segments
    assert segments[segment] == 0, 'Corrupted segment reference count - corrupted index or hints'
AssertionError: Corrupted segment reference count - corrupted index or hints

At about the same time, I had managed to fill the backup host's root filesystem, and I thought the two issues must be related. Although all the files Borg backs up, and the repository it writes to, are on different partitions, the client side of Borg does keep a cache in /root/.cache/borg. My first idea was that this cache had been corrupted by an aborted write, but zapping it did not cure the problem.
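Where that cache lives depends on the environment; running Borg as root, with HOME=/root, is presumably why mine was under /root/.cache/borg. The function below is a simplified sketch, assuming the precedence BORG_CACHE_DIR, then $XDG_CACHE_HOME/borg, then ~/.cache/borg. `borg_cache_dir` is a hypothetical name, and the environment-variable section of the Borg documentation for your version is the authority.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def borg_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Approximate where Borg keeps its client-side cache.

    Simplified precedence (an assumption; check the Borg docs for your version):
    BORG_CACHE_DIR, then $XDG_CACHE_HOME/borg, then ~/.cache/borg.
    """
    env = dict(os.environ) if env is None else env
    if env.get("BORG_CACHE_DIR"):
        return Path(env["BORG_CACHE_DIR"])
    if env.get("XDG_CACHE_HOME"):
        return Path(env["XDG_CACHE_HOME"]) / "borg"
    home = env.get("HOME") or str(Path.home())
    return Path(home) / ".cache" / "borg"


if __name__ == "__main__":
    # Print the directory the current environment would point Borg at.
    print(borg_cache_dir())
```

Running this in the same environment as the backup job (for example, under the same cron entry) would show which directory to clear.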

It occurred to me that I run Borg via the convenience wrapper Borgmatic, and that the wrapper might be what was failing, but a short investigation ruled that out.

Various attempts at running borg check or borg check --repair didn't help either. The underlying filesystem (XFS) passed a filesystem check, there weren't any obvious complaints about I/O errors from the kernel, and nothing was reported in the HDD's SMART data.

What did work, in the end, was removing the file matching $BORG_REPO/*hint* and trying again. Although that file is read from and written to the backup partition, not the root one, it seems that filling the root partition caused Borg (1.1.16-3) to corrupt it.
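For anyone who hits the same assertion, here is a cautious sketch of that workaround in Python. It moves the matching files aside rather than deleting them, so they can be put back if this doesn't help. `set_aside_hints` and the quarantine directory are hypothetical names of my own, not part of Borg. The sketch assumes nothing else (no borg or borgmatic run) is using the repository at the time, and that the quarantine directory lives outside the repository.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def set_aside_hints(repo: Path, quarantine: Path) -> List[Path]:
    """Move files matching <repo>/*hint* into quarantine instead of deleting them.

    Hypothetical helper: only run it while nothing else is using the repository.
    Returns the new locations of the moved files.
    """
    quarantine.mkdir(parents=True, exist_ok=True)
    moved = []
    # Materialise the glob before moving anything out of the directory.
    for hint in sorted(repo.glob("*hint*")):
        if hint.is_file():
            target = quarantine / hint.name
            shutil.move(str(hint), str(target))
            moved.append(target)
    return moved


if __name__ == "__main__":
    # Demonstrate on a throwaway directory, not a real repository.
    scratch = Path(tempfile.mkdtemp())
    demo_repo = scratch / "repo"
    demo_repo.mkdir()
    for name in ("config", "hints.42", "index.42"):
        (demo_repo / name).write_bytes(b"")
    print([p.name for p in set_aside_hints(demo_repo, scratch / "hints-aside")])
```

Moving rather than deleting costs nothing and keeps the option of restoring the original file if the problem turns out to lie elsewhere.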

Everything seems fine since then. I have recently started trying to verify backups semi-automatically on a monthly basis, on a machine independent from the NAS; all the tests I have written so far have passed.


Comments

this seems like a pretty critical bug that should never happen. have you considered reporting this upstream?
Comment by anarcat

I have considered it, and I probably should. I'm often a little reluctant until I've established whether the problem might be given short shrift for being in an old (stable) version. It seems I'm running 1.1.16, and that's their "oldstable" branch, which was at least updated this year. There's a 1.2 (stable) branch and now a 2.0 beta. I'll probably report it.
Comment by jon