I’ve been using procmail to filter mail on the server forever. I like it, so it’s important to note that even though I switched, I have nothing bad to say about procmail.
So, why did I switch? Procmail can be a little terse to read (obviously, I’m used to it by now). Over the years, I have built a large set of rules. There is a ton of cruft in there. If I wanted to clean it up, I had to rewrite it. Rewriting it in procmail was definitely a possibility.
But, over the years, I was also aware of maildrop as a filtering solution. It has a cleaner (more accurately, a more straightforward) syntax. The documentation is a little sparse (missing a few key examples IMHO). There are also thousands (if not millions!) of example lines of procmail available on the net, and it can be hard to find complex real-world examples of maildrop filters.
But, I knew that if I rebuilt my filters in maildrop, I’d be forced to rethink everything, since I couldn’t get lazy and just grab hunks of procmail from my current system and plop them into the new one. So, maildrop it was going to be!
One last time, just to make sure I don’t offend lovers of procmail (of which I am one!), everything that I did in maildrop could easily have been done in procmail. I just happened to choose maildrop for this rewrite, and for now, will stick with it. Perhaps if I ever revisit this project, I’ll iterate the next time back in procmail.
The goal of my filters is to toss obvious spam (send it to /dev/null). Likely spam gets sorted into one of three IMAP folders. The only reason I split it into multiple folders is so that I can test rules before turning them into full-blow deletes. Finally, mail that falls through those is delivered to my inbox.
Over the years, I added numerous rules to filter classes of spam (stocks, watches, viagra, insurance, etc.). Without a doubt, I introduced tons of redundancy. I didn’t scan all of the previous rules to see where I might be able to add one more line because it would be too tedious vs just adding a new rule.
I was reasonably satisfied with the result, but over time, became less aggressive about deleting mail automatically, preferring to stuff it in a spam folder for visual scanning during the day.
Since I create my own rules (I don’t run a system like SpamAssassin, which I did for a while), I can start to see patterns and simplifications over time, which was the impetus for the rewrite. In other words, there are more commonalities across classes of spam, and I don’t have to spend as much time categorizing things as I was bothering to do.
I’ve now made my first cut of the maildrop-based system. It’s been in production now for seven days, and I’m very happy with it so far. The one major change I made is to default to deleting things (in other words, much more aggressive than the previous system), but, I keep a copy of all mail in an archive IMAP folder that I will prune through a cron job, and never scan visually.
I review my delete logs once a day, so if I spot an email that looks like I shouldn’t have deleted it, or someone contacts me asking why I didn’t respond, I will be able to check the archive and have the full mail there (for some reasonable period of time).
Here’s the result of the rewrite:
The original procmail system had roughly 3800 lines it in (including comments and blank lines). The new maildrop system has under 550 lines, including comments and blanks. I delete more mail automatically, and in a week, haven’t deleted a single mail that I didn’t mean to. I am getting a few more emails sneaking into my inbox, but each day, I add a few more lines and the list gets shorter the next day.
Now that I am getting a bit more spam each day into my inbox, Thunderbird junk filters are getting more to train on, and they are getting better too, so even the junk that is getting in, is mostly getting filed in the Junk folder locally, automatically.
Here are two things that took me longer than they should have to figure out with maildrop (they are related, meaning the solution is identical in both cases, but it wasn’t obvious to me):
- How to negate a test using !
- How to use weighted scoring correctly (very simple in procmail)
Here’s a line in maildrop format:
if ($TESTVAR =~ /123/)
do something useful if true…
The above will “do something useful” if the variable TESTVAR contains the pattern 123. What if I want to “do something” if TESTVAR does not contain 123? Well, until I figured it out, I was making an empty block for “do something”, and adding an else for the thing I really wanted. Ugly.
My first attempt was to change the “=~” to “!~” (seemed obvious). Nope, syntax error. I then tried “if !($TESTVAR =~ /123/)”. Nope, syntax error. I then tried “if (!$TESTVAR =~ /123/)”. No syntax error, but it doesn’t do what I wanted.
I stumbled on the solution via trial and error:
if (!($TESTVAR =~ /123/))
Ugh. The ! can only be applied to an expression, which is normally (but not always?!?) enclosed in parens. But, the if itself requires an expression, so you need to put parens around the negated expression as well. At least I know now…
The second problem was weighted matches. I was having the same problem. Once I put parens around my expressions, it started working. That’s one of the few places where the procmail syntax feels a drop cleaner:
COUNT=(/123/:b,1)
COUNT=$COUNT+(/456/:b,1)
COUNT=$COUNT+(/789/:b,1)
echo $COUNT
So, the above sets the variable COUNT to the number of times that the string 123 exists in the body of the message. That is then added to the number of times that the string 456 exists in the body, finally adding the number of times that the string 789 exists in the body. The total is then echoed to the console. Without the parens, no workie.
I don’t like the fact that I have to maintain the running count myself. In procmail, you basically set a limit and the tests stop once the limit is reached (which feels way more efficient). There might be a way to accomplish that with maildrop too, but I haven’t found it as yet…
While I fully expect to add more rules, or lines to existing rules, I can’t imagine a scenario where my file will even double from here, so it will end up at less than 1000 lines. That will be easier to maintain for a number of reasons, most notably syntax readability.
Leave a Reply