Here's the mail archiving script I was talking about: http://github.com/jmtd/detachment. Sorry about the name-pun.

There's still a LOT of work to do, but it's useful enough to deploy, with modifications, now. If you set up a way of pushing stuff in a local directory to a remote webserver (ideally behind an SSL certificate and with access control configured) and you have a resolvable hostname for that site, then you can get at your files from most mail/web client combinations, including smartphones.

Here are some stats from running it on a sample mailbox:

Before

$ du -sh ncl/archive/2010
116M    ncl/archive/2010

After

$ du -sh ncl/archive/2010 bucket
41M     ncl/archive/2010
51M     bucket

Or approximately 20% smaller in total (116M down to 92M). That's the on-disk usage for the mail plus the attachments, which are now saved to disk as separate files and no longer base64-encoded.
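To give a feel for why un-encoding the attachments saves space at all, here is a minimal sketch of the base64 overhead. The 3000-byte payload is a made-up stand-in for an attachment, not something taken from the sample mailbox, and real messages carry extra MIME headers on top of this.

```python
import base64

# Hypothetical 3000-byte binary attachment (size chosen only for illustration).
raw = bytes(3000)

# MIME-style base64: 3 input bytes become 4 output characters, and a newline
# is added after every 76 characters of output.
encoded = base64.encodebytes(raw)

overhead = len(encoded) / len(raw)
print(f"{len(raw)} bytes raw, {len(encoded)} bytes encoded ({overhead:.2f}x)")
```

So an attachment takes roughly a third more space inside a message than it does as a plain file.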

The above folder was a local Maildir copy of a folder on an Exchange mail system, synchronised via offlineimap. Here are the sizes reported by Exchange for the same folder:

  • Before: 83991KB (82M)
  • After: 25435KB (25M)

Or approximately 70% smaller. (That's the mail alone; the attachments are no longer stored on the Exchange server, so they are not counted.)
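Both percentages can be rechecked from the figures quoted above. A quick sketch (the helper name percent_smaller is mine, not part of the script):

```python
def percent_smaller(before, after):
    """Return the reduction from before to after, as a percentage of before."""
    return 100.0 * (before - after) / before

# On-disk usage in MB from the du output: mail (41M) plus the bucket of attachments (51M).
disk = percent_smaller(116, 41 + 51)

# Folder sizes in KB as reported by Exchange.
exchange = percent_smaller(83991, 25435)

print(f"disk: {disk:.1f}%  exchange: {exchange:.1f}%")
```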

Lots of caveats:

  • If you are using offlineimap, you will need to do more than just run this script, or offlineimap may not notice that the messages have changed. Stripping out the U=\d+,FMD5=[0-9a-f]{32} component of filenames seemed to do the trick.
  • If you are using Exchange, a copy of the original messages will be retained in your Deleted Items folder. Beware if you are close to your mail quota.
  • If you are using Exchange via IMAP, then syncing the results of such an operation might be very slow. The above operation took well over an hour to sync, and I'm only a few hops away from the Exchange server via 100M links.
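For the offlineimap caveat, the filename clean-up could look something like the sketch below. The regular expression is the one quoted above; the optional leading comma, the helper names and the example filename are my assumptions for illustration, and none of this is part of the script itself. Try it on a copy of the Maildir first.

```python
import re
from pathlib import Path

# Pattern from the caveat above. The optional leading comma is an assumption
# about how the tag is joined onto the rest of the filename.
OFFLINEIMAP_TAG = re.compile(r",?U=\d+,FMD5=[0-9a-f]{32}")

def strip_offlineimap_tag(filename):
    """Return filename with the U=...,FMD5=... component removed."""
    return OFFLINEIMAP_TAG.sub("", filename)

def strip_tags_in(directory):
    """Rename every file in directory whose name carries the tag."""
    for path in Path(directory).iterdir():
        new_name = strip_offlineimap_tag(path.name)
        if new_name != path.name:
            path.rename(path.with_name(new_name))

# Hypothetical Maildir filename, made up for illustration.
example = "1291234567_0.4242.host,U=17,FMD5=" + "0123456789abcdef" * 2 + ":2,S"
print(strip_offlineimap_tag(example))
```

You would call strip_tags_in on the cur and new subdirectories of each folder the script has touched, then run offlineimap again.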

Comments

comment 1
Thanks a lot for releasing this script; I gave it a test on a trash mailbox or two and as you suggested it really did save a lot of space.
Steve Kemp,