procmail versus exim filters
I’ve been using Procmail to filter mail for a long time. Reading Antoine’s blog post "procmail considered harmful", I felt motivated (and shamed) into migrating to something else. Luckily, Enrico has shared a detailed roadmap for moving to Sieve, in particular Dovecot's Sieve implementation (which provides "pipe" and "filter" extensions).
My MTA is Exim, and for my first foray into this, I didn't want to change that [1]. Exim provides two filtering languages for users: an implementation of Sieve, and its own filter language.
Requirements
A good first step is to look at what I'm using Procmail for:
1. I invoke external mail filters: processes which read the mail and emit a possibly altered mail (headers added, etc.). In particular, crm114 (which has worked remarkably well for me) to classify mail as spam or not, and dsafilter, to mark up Debian Security Advisories
2. I file messages into different folders depending on the outcome of the above filters
3. I drop ("killfile") mail from some sender addresses (persistent pests on mailing lists); mails containing certain hosts in the References header (as an imperfect way of dropping mailing list threads which are replies to someone I've killfiled); mail encoded in a character set for a language I can't read (Russian, Korean, etc.); and mail matching several other simple static rules
4. I move mailing list mail into folders, semi-automatically (see list filtering)
5. I strip "tagged" subjects for some mailing lists: i.e., incoming mail has subjects like "[cs-historic-committee] help moving several tons of IBM360", and I don't want the "[cs-historic-committee]" bit.
6. I file a copy of some messages into a folder, the name of which is partly derived from the current calendar year
Exim Filters
I want to continue to do (1), which rules out Exim's implementation of Sieve,
which does not support invoking external programs. Exim's own filter language
has a pipe function that might do what I need, so let's look at how to
achieve the above with Exim Filters.
autolists
Here's an autolist recipe for Debian's mailing lists, in Exim filter language. Contrast with the Procmail in list filtering:
    # File Debian list mail into a folder named after the captured list name ($1)
    if $header_list-id matches "(debian.*)\.lists\.debian\.org"
    then
        save Maildir/l/$1/
        finish
    endif
Hands down, the exim filter is nicer (although some of the rules on escape characters in exim filters, not demonstrated here, are byzantine).
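On those escaping rules: as far as I understand Exim's filter documentation, a backslash inside a double-quoted string is consumed by the quote processing before the regular expression is compiled, so the "\." in the recipe above probably reaches the regex engine as a plain "." (which still matches, just more loosely). A stricter variant is sketched below. It assumes the documented \N ... \N markers, which copy their contents through string expansion unchanged, and an unquoted pattern (possible here because the pattern contains no white space). The trailing colon on the header name is optional but makes the end of the name explicit.

```
# Sketch only: same autolist rule, with the dots matched literally.
# \N ... \N stops the string expander from interpreting the backslashes.
if $header_list-id: matches \N(debian.*)\.lists\.debian\.org\N
then
    save Maildir/l/$1/
    finish
endif
```

If the pattern had to be quoted (for example because it contained a space), each backslash would need doubling, which is where things start to get byzantine.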
killfile
An ideal chunk of configuration for kill-filing a list of addresses is light on boilerplate, and easy to add more addresses to in the future. This is the best I could come up with:
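    # Stop processing (and so drop the mail) if any listed address or host
    # appears in the reply address or in the References header
    if foranyaddress "someone@example.org,\
    another@example.net,\
    especially-bad.example.com,\
    "
       ($reply_address contains $thisaddress
        or $header_references contains $thisaddress)
    then finish endif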
I won't bother sharing the equivalent Procmail but it's pretty comparable: the exim filter is no great improvement.
It would be lovely if the list of addresses could be stored elsewhere, such as a simple text file, one line per address, or even a database. Exim's own configuration language (distinct from this filter language) has some nice mechanisms for reading lists of things like addresses from files or databases. Sadly it seems the filter language lacks anything similar.
external filters
With Procmail, I pass the mail to an external program, and then read the output of that program back, as the new content of the mail, which continues to be filtered: subsequent filter rules inspect the headers to see what the outcome of the filter was (is it spam?) and to decide what to do accordingly. Crucially, we also check the return status of the filter, to handle the case when it fails.
With Exim filters, we can use pipe to invoke an external program:
    pipe "$home/mail/mailreaver.crm -u $home/mail/"
However, this is not a filter: the mail is sent to the external program, and the exim filter's job is complete. We can't write further filter rules to continue to process the mail: the external program would have to do that; and we have no way of handling errors.
Here's Exim's documentation on what happens when the external command fails:
Most non-zero codes are treated by Exim as indicating a failure of the pipe. This is treated as a delivery failure, causing the message to be returned to its sender.
That is definitely not what I want: if the filter broke (even temporarily), Exim would seemingly generate a bounce to the sender address, which could be anything, and I wouldn't have a copy of the message.
The documentation goes on to say that some shell return codes (defaulting to 73 and 75) cause Exim to treat it as a temporary error, spool the mail and retry later on. That's a much better behaviour for my use-case. Having said that, on the rare occasions I've broken the filter, the thing which made me notice most quickly was spam hitting my inbox, which my Procmail recipe achieves.
removing subject tagging
Here, Exim's filter language comes unstuck. There is no way to add or alter
headers for a message in a user filter. Exim uses the same filter language
for system-wide message filtering,
and in that context, it has some extra functions: headers add <string>,
headers remove <string>, but (for reasons I don't know) these are not
available for user filters.
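For illustration, here is roughly what the tag stripping might look like in a system filter, where those commands are available. This is an untested sketch and would not work in a user filter. It assumes the quoted-string escaping rules from Exim's filter documentation (backslashes doubled inside double quotes, \N to suppress expansion), and it rewrites the subject from the decoded header value, which may not round-trip cleanly for subjects containing non-ASCII text.

```
# Sketch only: system filter context, not valid in a user's .forward
if $header_subject: matches "\\N^\\[cs-historic-committee\\] (.*)\\N"
then
    # $1 holds the subject with the leading tag removed
    headers remove Subject
    headers add "Subject: $1"
endif
```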
copy mail to archive folder
I can't see a way to derive a folder name from the calendar year.
next steps
Exim's Sieve implementation and its own filter language are both ruled out as Procmail replacements, because each fails to do at least two of the things I need.
However, based on Enrico's write-up, it looks like Dovecot's Sieve implementation probably can. I was also recommended maildrop, which I might look at if Dovecot Sieve doesn't pan out.
[1] I should revisit this requirement: I could probably reconfigure Exim to run my spam classifier at the system level, obviating the need to do it in a user filter, and also raising the opportunity to do SMTP-time rejection based on the outcome.
Comments
Thanks to fanf who pointed out I can use string expansions to look things up from external files, so for the killfile, this works:
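One way this might look, as a sketch rather than fanf's exact suggestion: the readfile expansion item reads a file and replaces each newline with a given string, so a file with one address per line can be turned into the comma-separated list that foranyaddress expects. The file name $home/.killfile is hypothetical, and this assumes the router handling the filter does not set forbid_filter_readfile.

```
# Sketch only: $home/.killfile is a hypothetical file, one address or host per line.
# readfile joins the lines with commas to form the address list.
if foranyaddress ${readfile{$home/.killfile}{,}}
   ($reply_address contains $thisaddress
    or $header_references: contains $thisaddress)
then finish endif
```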
As well as a neat trick for extracting the current year from one of the pre-populated variables, e.g.
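One such variable is $tod_log, which holds the current local time starting with the date in YYYY-MM-DD form, so its first four characters are the year. A minimal sketch, assuming a hypothetical folder naming scheme of archive-YYYY under Maildir:

```
# Sketch only: the folder name "archive-<year>" is an assumption.
# "unseen" files a copy and lets filtering carry on afterwards.
unseen save Maildir/archive-${substr_0_4:$tod_log}/
```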
To derive a folder name from the calendar year, you could use the following:
I do not know which items from the require list are the minimum needed for this to work.