Mail archiving
I wrote a quick proof-of-concept script that can traverse my email (in local Maildir format) and detach all attachments: saving them to another location and replacing them with a text attachment which documents where they've gone.
This reduces my work mailbox size substantially and allows for more space-efficient storage of the attachments: stored as raw files they use all 8 bits of every byte, rather than a 7-bit-safe transfer encoding such as base64, and compression could be added to the mix. It still maintains the link between email and attachment, however, which is crucial if the way you will find an attachment in the future is by finding the email.
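To make the idea concrete, here is a minimal sketch of how such a script might look using Python's standard `mailbox` and `email` modules. It is not the proof-of-concept itself: the function names, the `save` callback and the `X-Detached-To` marker header are all assumptions made for illustration.

```python
import email
import mailbox
from email import policy


def detach_attachments(msg, save):
    """Replace every attachment in msg with a text note; return the locations.

    save(data, filename) is a caller-supplied (hypothetical) function that
    stores the bytes somewhere and returns a string saying where.
    """
    locations = []
    # Snapshot the parts first, since some of them are rewritten as we go.
    for part in list(msg.walk()):
        if part.is_multipart() or not part.is_attachment():
            continue
        if part.get("X-Detached-To"):
            continue  # hypothetical marker: this part is already a note
        data = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if not data:
            continue
        name = part.get_filename() or "attachment"
        location = save(data, name)
        note = (
            f"The attachment {name!r} ({len(data)} bytes) has been detached.\n"
            f"It is now stored at: {location}\n"
        )
        # set_content() discards the old payload and its Content-* headers.
        part.set_content(note, disposition="attachment", filename=name + ".txt")
        part["X-Detached-To"] = location
        locations.append(location)
    return locations


def process_maildir(maildir_path, save):
    """Detach attachments from every message in a Maildir, in place."""
    box = mailbox.Maildir(maildir_path, factory=None, create=False)
    try:
        for key in box.keys():
            # Re-parse with the modern policy so parts are EmailMessage objects.
            msg = email.message_from_bytes(box.get_bytes(key), policy=policy.default)
            if detach_attachments(msg, save):
                box[key] = msg  # rewrite only the messages that changed
    finally:
        box.close()
```

The marker header is there so that running the script twice does not detach the replacement notes themselves. A real run should be tried on a copy of the mailbox first, since the messages are rewritten in place.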
For this to be future proof, the path where the attachment is saved needs to be future proof too. The best scheme I can think of involves dedicating a sub-domain to the problem and using a URI scheme which is guaranteed[1] to be unique for the attachment, such as a hash-sum:
https://file.example.com/sha1/509c2fe2eba509e93987c3024a74d74583c274bd
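A rough sketch of how such a URL could be derived and the file stored under it, assuming SHA-1 as in the example above. The function names and the on-disk layout (a `sha1/` directory under some web root) are hypothetical.

```python
import hashlib
from pathlib import Path

BASE_URL = "https://file.example.com/sha1/"  # the example sub-domain from above


def attachment_url(data, base_url=BASE_URL):
    """Return the content-addressed URL for an attachment's raw bytes."""
    return base_url + hashlib.sha1(data).hexdigest()


def store_attachment(data, web_root, base_url=BASE_URL):
    """Write data to <web_root>/sha1/<digest> and return its URL.

    web_root is assumed to be the directory served by the sub-domain.
    """
    digest = hashlib.sha1(data).hexdigest()
    path = Path(web_root) / "sha1" / digest
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():  # identical content is only ever stored once
        path.write_bytes(data)
    return base_url + digest
```

A side effect of naming files by their hash is de-duplication: the same attachment sent in ten emails ends up as one file.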
I could finish up my script, set up my own personal infrastructure for this and convert my mailbox with a few hours' more work. Before I do, I have two questions:
- Is this already available in an open source tool: am I re-inventing the wheel needlessly?
- Would anyone else find this useful?
[1] Within the practical limitations of hashing algorithms, at least.
Comments
Actually, magnet links seem to be most suitable for this: http://en.wikipedia.org/wiki/Magnet_URI_scheme
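For illustration, a magnet link for an attachment could be built roughly like this. It is a sketch that assumes the `urn:sha1` form of the `xt` parameter (a Base32-encoded SHA-1) together with the optional `xl` (length) and `dn` (display name) parameters described on that page; the function name is made up.

```python
import base64
import hashlib
from urllib.parse import quote


def magnet_link(data, filename=None):
    """Build a magnet URI that identifies data by its SHA-1 hash."""
    # urn:sha1 uses the Base32 form of the 20-byte digest (32 characters).
    digest = base64.b32encode(hashlib.sha1(data).digest()).decode("ascii")
    uri = f"magnet:?xt=urn:sha1:{digest}&xl={len(data)}"
    if filename:
        uri += "&dn=" + quote(filename, safe="")
    return uri
```

Unlike the https URL, this names the content without naming a server, so it does not depend on any particular domain staying alive.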
I think it would be useful to myself, and others.
Although it isn't directly analogous I suspect there are things already that do that - the most obvious thing that springs to mind is those systems which parse and insert mails into databases, for example:
http://www.dbmail.org/
For more human-readable filenames, you could use something like:
https://file.example.com/$MESSAGE_ID/$ATTACHMENT_NUMBER-$FILENAME
The Message-ID is already "nearly unique", and in the unlikely event of a collision you could just check for the existence of the $MESSAGE_ID subdirectory and add a suffix "-01" (or "-02", "-03", ...) to $MESSAGE_ID. You could also make $ATTACHMENT_NUMBER a zero-padded two-digit number so that the files sort correctly.
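Something along these lines, as a sketch only: the function names are invented, and percent-encoding the Message-ID and filename is an extra assumption so that they are safe to use in a path or URL.

```python
from pathlib import Path
from urllib.parse import quote


def attachment_dir(root, message_id):
    """Pick the directory for a message, adding -01, -02, ... on a collision."""
    # Drop the surrounding <> and encode anything unsafe (including "/").
    mid = quote(message_id.strip().strip("<>"), safe="@")
    candidate = Path(root) / mid
    n = 0
    while candidate.exists():
        n += 1
        candidate = Path(root) / f"{mid}-{n:02d}"
    return candidate


def attachment_name(number, filename):
    """Return e.g. '03-report.pdf' so attachments sort in their original order."""
    return f"{number:02d}-{quote(filename, safe='')}"
```

Note that an existing directory is treated as a collision, so processing the same message twice would also produce a "-01" directory; a real script would need some way to tell the two cases apart.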