Borg corrupted hints file
I've been using Borg backup for a couple of years and it has seemingly worked very well for me. One difference I really appreciate from my previous arrangement (rdiff-backup) is the freedom to move large files or file hierarchies around (including between different filesystems) without provoking large backup incrementals.
About a week ago I had my first real problem with Borg: backups started to fail with the following complaints:
Creating archive at "/backup/borg::{hostname}-home-jon-{now:%Y-%m-%dT%H:%M:%S.%f}"
segment 61916 not found, but listed in compaction data
[ further, similar lines ]
Local Exception
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4690, in main
    exit_code = archiver.run(args)
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4622, in run
    return set_ec(func(args))
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 177, in wrapper
    return method(self, args, repository=repository, **kwargs)
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 595, in do_create
    create_inner(archive, cache)
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 560, in create_inner
    archive.save(comment=args.comment, timestamp=args.timestamp)
  File "/usr/lib/python3/dist-packages/borg/archive.py", line 530, in save
    self.repository.commit()
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 475, in commit
    self.compact_segments()
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 835, in compact_segments
    assert segments[segment] == 0, 'Corrupted segment reference count - corrupted index or hints'
AssertionError: Corrupted segment reference count - corrupted index or hints
At about the same time I had managed to fill the backup host's root filesystem.
I thought the two issues must be related. Although all the files Borg is
backing up, and the backup repository it writes to, are located on different
partitions, Borg's client-side of things does maintain some caching in
/root/.cache/borg. My first idea was that this must have been corrupted by
an aborted write, but zapping it did not cure the above problem.
It occurred to me that I run Borg via the convenience wrapper Borgmatic, and it was possible that was failing, but after a short investigation I ruled that out.
Various attempts at running borg check or borg check --repair didn't help
either. The underlying filesystem (XFS) passed a filesystem check. There weren't
any obvious complaints about IO errors from the kernel or anything reported in
the HDD's SMART data.
What did work, in the end, was removing the file matching $BORG_REPO/*hint*
and trying again. Although this file is read and written on the backup partition,
it seems filling the root partition caused Borg (1.1.16-3) to corrupt it.
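For anyone wanting to try the same thing more cautiously, here is a minimal sketch that moves the matching files aside rather than deleting them. Moving instead of deleting is an assumption made for safety, not what was done here. The function name move_hints_aside and the quarantine directory are hypothetical, and the sketch assumes no Borg process is using the repository while it runs.

```python
import glob
import os
import shutil


def move_hints_aside(repo_path, quarantine_dir):
    """Move files matching <repo_path>/*hint* into quarantine_dir.

    Returns the list of file names that were moved. Hypothetical helper:
    it moves the files so they can be restored, instead of deleting them.
    """
    os.makedirs(quarantine_dir, exist_ok=True)
    pattern = os.path.join(glob.escape(repo_path), "*hint*")
    moved = []
    for path in sorted(glob.glob(pattern)):
        if os.path.isfile(path):  # skip directories that happen to match
            name = os.path.basename(path)
            shutil.move(path, os.path.join(quarantine_dir, name))
            moved.append(name)
    return moved
```

Calling move_hints_aside(os.environ["BORG_REPO"], "/some/spare/dir") and then re-running the backup would mirror the steps above, with the old file kept around in case it is needed for a bug report.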
Everything seems fine following that. I have recently started trying to semi-automatically verify backups on a monthly basis, on a machine independent from the NAS; all the tests I have written so far have passed.