I wrote a quick proof-of-concept script that can traverse my email (in local Maildir format) and detach all attachments: saving them to another location and replacing them with a text attachment which documents where they've gone.

This reduces my work mailbox size substantially and allows for more space-efficient storage of the attachments (utilising all 8 bits of the byte, at the very least, but with the potential for compression to be added to the mix). Crucially, it also maintains the link between email and attachment, since the way I expect to find an attachment in future is by first finding the email it arrived with.
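
To put a rough number on the "8 bits" point: binary attachments are normally base64-encoded inside a mail, which carries only 6 bits of data per stored byte and adds a line break every 76 characters. The small Python check below is only an illustration (the helper name `base64_overhead` and the sample data are made up), but it suggests the encoded form is about a third larger than the raw bytes, before any compression is considered.

```python
import base64


def base64_overhead(raw: bytes) -> float:
    """Return the fractional size increase of MIME-style base64 over raw bytes."""
    encoded = base64.encodebytes(raw)  # inserts a newline after every 76 output characters
    return len(encoded) / len(raw) - 1


sample = bytes(range(256)) * 4096  # 1 MiB stand-in for a binary attachment
print(f"{base64_overhead(sample):.1%} larger when base64-encoded")
```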

For this to be future-proof, the location where the attachment is saved needs to be as well. The best scheme I can think of involves dedicating a sub-domain to the problem and using a URI layout which is guaranteed[1] to be unique for the attachment, such as a hash-sum:

https://file.example.com/sha1/509c2fe2eba509e93987c3024a74d74583c274bd
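
As a minimal sketch of how the detach step and the hash-based naming might fit together (this is not the proof-of-concept script itself), the Python below uses only the standard `email`, `mailbox` and `hashlib` modules. The function names, the local `store_dir` directory and the `X-Detached-SHA1` marker header are assumptions made for illustration; the base URL is the example one above. Nested attachments such as forwarded messages are deliberately left alone.

```python
import hashlib
import mailbox
from email import message_from_bytes, policy
from pathlib import Path

BASE_URL = "https://file.example.com/sha1/"  # example host from the text above
MARKER = "X-Detached-SHA1"                   # hypothetical header marking placeholders


def detach_attachments(msg, store_dir, base_url=BASE_URL):
    """Move the attachments of one message into store_dir, keyed by SHA-1.

    Each attachment is replaced in place by a small text attachment that
    records the URL. Returns the list of URLs written.
    """
    store = Path(store_dir)
    store.mkdir(parents=True, exist_ok=True)
    urls = []
    for part in list(msg.iter_attachments()):
        if part[MARKER] is not None:
            continue  # already a placeholder from an earlier run
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if payload is None:
            continue  # multipart or message/rfc822 attachment: not handled here
        digest = hashlib.sha1(payload).hexdigest()
        (store / digest).write_bytes(payload)  # identical content maps to one file
        url = base_url + digest
        name = part.get_filename() or "unnamed"
        note = (f"Attachment {name!r} ({part.get_content_type()}, "
                f"{len(payload)} bytes) was detached to:\n{url}\n")
        # set_content() clears the old payload and Content-* headers first.
        part.set_content(note, disposition="attachment",
                         filename=name + ".detached.txt")
        part[MARKER] = digest
        urls.append(url)
    return urls


def detach_maildir(maildir_path, store_dir):
    """Run detach_attachments over every message in a Maildir."""
    box = mailbox.Maildir(maildir_path, factory=None, create=False)
    for key in box.keys():
        msg = message_from_bytes(box.get_bytes(key), policy=policy.default)
        if detach_attachments(msg, store_dir):
            box[key] = msg  # rewrites the message file under the same key
    box.close()
```

Because the file name is derived from the content, the same attachment received in several mails is stored only once, and a retrieved file can be checked against the hash in its URL. Anything like this is best tried on a copy of the mailbox first.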

I could finish up my script, set up my own personal infrastructure for this and convert my mailbox with a few more hours' work. Before I do, I have two questions:

  • Is this already available in an open source tool: am I re-inventing the wheel needlessly?
  • Would anyone else find this useful?

  1. within the practical limitations of hashing algorithms, at least.