I haven't written much yet about what I've been up to at work. Right now, I'm making changes to the sources of a set of Docker images. These changes should not alter the resulting images at all: they are just a re-organisation of the way in which the images are built.

I've been using the btrfs storage driver for Docker, which makes comparing image filesystems from the host machine very easy, as all the image filesystems are subvolumes. I use a bash script like the following to make sure I haven't broken anything:

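# The old and new image IDs (or names) are passed as the two arguments.
oldid="$1"; newid="$2"
# Succeeds if the argument is already a full 64-character hex ID.
id_in_canonical_form() {
    echo "$1" | grep -qE '^[a-f0-9]{64}$'
}
# Resolve a short ID or image name to the full ID.
canonicalize_id() {
    docker inspect --format '{{ .Id }}' "$1"
}
id_in_canonical_form "$oldid" || oldid="$(canonicalize_id "$oldid")"
id_in_canonical_form "$newid" || newid="$(canonicalize_id "$newid")"
# Each image filesystem is a subvolume under this directory.
cd "/var/lib/docker/btrfs/subvolumes" || exit 1
# List mode, owner, group, size and path for every file, without timestamps.
sumpath() {
    cd "$1" && find . -printf "%M %4U %4G %16s %h/%f\n" | sort
}
# Compare file contents first, then file metadata.
diff -ruN "$oldid" "$newid"
diff -u <(sumpath "$oldid") <(sumpath "$newid")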

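Using -printf with an explicit format means the listing contains only each file's mode, owner, group, size and path, so I can ignore changes in the timestamps on files, which is something I am not interested in.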
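
To see the effect in isolation, here is a minimal sketch that builds two throwaway directories whose only difference is a file's modification time and compares their listings. The function name manifest and the temporary directory layout are made up for illustration, and it assumes GNU find and GNU touch.

```bash
# Hypothetical helper: the same find -printf listing as sumpath, run in a
# subshell so the cd does not affect the caller.
manifest() {
    (cd "$1" && find . -printf "%M %4U %4G %16s %h/%f\n" | sort)
}

# Two throwaway trees with identical content.
demo="$(mktemp -d)"
mkdir "$demo/a" "$demo/b"
echo hello > "$demo/a/file"
echo hello > "$demo/b/file"

# Make the two files differ only in their modification time.
touch -d '2001-01-01 00:00:00' "$demo/b/file"

# The listings carry no timestamps, so they should still be identical.
if [ "$(manifest "$demo/a")" = "$(manifest "$demo/b")" ]; then
    result="listings match"
else
    result="listings differ"
fi
echo "$result"
rm -rf "$demo"
```

Adding a timestamp directive such as %t to the format string would make the two listings differ, which is exactly the noise the script avoids.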

If it is available in your environment, Lars Wirzenius' tool Summain generates manifests that include a file checksum and could be very useful for this use-case.
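
If Summain is not to hand, a rough stand-in for a checksum manifest can be put together from GNU find and sha256sum. This is only a sketch: it does not reproduce Summain's output format or options, and the function name checksum_manifest is made up for illustration.

```bash
# Hypothetical helper: one "checksum  path" line per regular file, sorted by
# path so that two trees can be compared line by line.
checksum_manifest() {
    (cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2)
}

# Possible use alongside the script above, from the subvolumes directory:
#   diff -u <(checksum_manifest "$oldid") <(checksum_manifest "$newid")
```

Unlike the size column in the find -printf listing, a checksum also shows up a file whose content changed while its size stayed the same.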


Comments

comment 1
Sounds like a job for diffoscope...
Comment by lamby