I’ve been involved professionally with technology since 1980. So, you’d think that I understand it (and how it works) reasonably well by now. On some levels, sure, but on others, I feel as helpless as the proverbial mother-in-law or grandparent in the “clueless users” examples people always give…
Conceptually, I understand how “small tweaks” can lead to large unexpected results. It’s a variation on chaos theory. Practically, it’s still annoying. What is harder (for non-techies) to understand is when things break down after no changes (that they are aware of!). Of course, it’s the parenthetical comment that is the clue.
With modern operating systems, the vast majority of users have some form of automatic updates turned on. That being the case, things are changing frequently, and possibly in very significant ways. It just so happens that the user doesn’t associate different behavior in their favorite applications with an invisible update.
The above was just generic whining to get to one or two rants that have been bugging the hell out of me lately…
The first topic is spam filtering. For many reasons (most of them rational ;-)), I am a Windows user (specifically, WinXP Pro, but that’s not important). I don’t think it’s superior, etc., but many applications that I find convenient (and in some rare cases even necessary) are always available first on Windows, and often only on Windows… C’est la vie…
So, being a committed Windows person (no, the irony of that statement doesn’t escape me ;-)), I was a tried-and-true Outlook user for many years. In fact, I started with Outlook 97, moved to 98, then 2000, and then 2003 (no, I didn’t have the pleasure of Outlook XP).
In the early years, there was no need for spam filtering. Not only was the volume of spam low, my Internet activities were reasonably limited, so I wasn’t on many spam lists anyway. Of course, being a VC now, and having my name on many public sites, along with being subscribed to many mailing lists (public as well as publicly available internal company lists), has changed that dramatically.
On some days, I get well over 1000 spam messages (through the variety of means that email can wind up in my real account). Clearly, that isn’t a sustainable number of mails to have to delete by hand (even though I am ultra fast at spotting spam and hitting the Junk key).
So, a few years ago, I installed the free SpamBayes plug-in for Outlook. (This now requires a minor side-rant) 🙁
<Side Rant>
Ever since I upgraded to WordPress 2.1, I can’t create any links with their “visual” editor tab. I wanted to link to the SpamBayes project page above, but got a blank pop-up box where the link form is supposed to be. Firebug shows errors in TinyMCE, preceded by an error in an XMLHttpRequest, so it’s likely a Firefox configuration issue that’s causing the problem. But I have no idea whatsoever what else to try (obviously, I’ve tried a lot of things…)
</Side Rant>
So, I ran SpamBayes for a long while, and also ran a commercial derivative of it, InBoxer (should have had a link to that as well…)
It did a pretty good job. Still, it wasn’t all that satisfying, because every message needed to be downloaded to my laptop before SpamBayes (SB) could analyze it. That meant that on a heavy spam day, if I was on a slow link (let’s say dial-up, gulp), I had to wait for all of the spam to come down to find the few gems that I was breathlessly waiting to read.
So, after doing that for quite a while, and building up a large SB db, I decided to get creative. I installed SB on the server as well (I control my own server), and regularly uploaded my local (meaning laptop) SB db to the server. Then I added a procmail rule that filtered each message using the locally trained db (but now up on the server), and then did one of three things with the result:
- If it was marked as “ham” (definitely not spam), it was just passed through normally.
- If it was marked as “unsure” (the range is user-definable), then it was moved to another account on the server, so that it didn’t auto-download on each email check (this solved the problem of slow links with lots of possible spam)
- If it was marked as “spam”, it was deleted right then and there on the server.
This worked very nicely for quite a while as well.
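For anyone who wants to replicate the idea, here’s a minimal Python sketch of the three-way decision that server-side rule makes. It is not the actual procmail recipe; it assumes the SpamBayes score (0.0 to 1.0) has already been computed for the message, and the 0.20 and 0.90 cutoffs are placeholder values (the “unsure” range is user-definable). The `route` function, the `Action` names, and the optional sender whitelist (the workaround from the P.S.) are all hypothetical illustrations.

```python
from enum import Enum

# Placeholder cutoffs (assumed values); the "unsure" band between them is user-definable.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


class Action(Enum):
    DELIVER = "deliver"        # ham: pass through normally
    QUARANTINE = "quarantine"  # unsure: move to a separate account on the server
    DISCARD = "discard"        # spam: delete on the server


def route(score, sender=None, whitelist=frozenset(),
          ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Map a spam score in [0.0, 1.0] to one of three server-side actions."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    # Hypothetical whitelist check (entries assumed stored lowercase):
    # known senders are always delivered, even if the classifier is certain they are spam.
    if sender is not None and sender.lower() in whitelist:
        return Action.DELIVER
    if score < ham_cutoff:
        return Action.DELIVER
    if score >= spam_cutoff:
        return Action.DISCARD
    return Action.QUARANTINE
```

Only the DISCARD branch is irreversible, which is why a miscalibrated score (like the 1.0 scores on legitimate mail described later) is so costly; sending high scores to quarantine instead trades server disk space for recoverability.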
Then I woke up and decided to break my Outlook addiction. I’m still firmly in the Windows world, but I stopped using Outlook for email over two years ago now! Even though I own a legal copy of Office 2003, I now only use Outlook for Calendar, Tasks and Notes, and only because it syncs reliably with my Treo 700p.
I switched to Thunderbird, and have never regretted doing that. I’ll save any niggling complaints about Thunderbird for some future post when I am really bored, since for the most part, I am extremely happy with TB.
Now, the first part of the problem: TB has built-in junk filters, which work OK (but not that great), but they put me back to having to download everything to have it analyzed. The second part of the problem is that I can continue to use the old (static) SB db on the server to help cut down on spam, but the real beauty of SB is the B (Bayes), which continually learns. Since spammers constantly change their strategies to stay ahead of the anti-spam companies, an outdated SB db becomes less useful over time.
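To make the “continually learns” point concrete, here’s a toy Python sketch of why a frozen db goes stale. It is a deliberately simplified classifier, not SpamBayes’ own code: per-token spam probabilities are shaded toward 0.5 for rarely seen tokens (Gary Robinson’s adjustment), then combined with a plain naive-Bayes product rather than the chi-squared combining SpamBayes uses. `TinyBayes`, its methods, and the 0.45 strength / 0.5 unknown-token values are assumptions for illustration. The key behavior: a token the db has never seen scores exactly 0.5, so mail built from new spammer vocabulary looks neutral until someone trains on it.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy Bayesian spam scorer (hypothetical; simplified from the Robinson approach)."""

    def __init__(self, strength=0.45, unknown_prob=0.5):
        self.spam_counts = Counter()  # number of spam messages containing each token
        self.ham_counts = Counter()   # number of ham messages containing each token
        self.nspam = 0
        self.nham = 0
        self.s = strength             # how strongly to pull rare tokens toward unknown_prob
        self.x = unknown_prob         # probability assigned to never-seen tokens

    def train(self, tokens, is_spam):
        unique = set(tokens)  # count each token at most once per message
        if is_spam:
            self.spam_counts.update(unique)
            self.nspam += 1
        else:
            self.ham_counts.update(unique)
            self.nham += 1

    def token_prob(self, token):
        sc, hc = self.spam_counts[token], self.ham_counts[token]
        n = sc + hc
        if n == 0:
            return self.x  # unseen token: contributes nothing either way
        spam_ratio = sc / self.nspam if self.nspam else 0.0
        ham_ratio = hc / self.nham if self.nham else 0.0
        p = spam_ratio / (spam_ratio + ham_ratio)
        # Robinson's adjustment keeps the result strictly inside (0, 1).
        return (self.s * self.x + n * p) / (self.s + n)

    def score(self, tokens):
        """Combine token probabilities in log space; returns P(spam) in (0, 1)."""
        log_spam = log_ham = 0.0
        for t in set(tokens):
            p = self.token_prob(t)
            log_spam += math.log(p)
            log_ham += math.log(1.0 - p)
        x = log_spam - log_ham
        # Numerically stable logistic function of the log-odds.
        if x >= 0:
            return 1.0 / (1.0 + math.exp(-x))
        e = math.exp(x)
        return e / (1.0 + e)
```

Usage: after `b.train(["cheap", "pills"], True)` and `b.train(["meeting", "agenda"], False)`, a message made only of new words like `["crypto", "airdrop"]` scores 0.5; once one such message is trained as spam, the same words push the score above 0.5. A db that is never retrained never gets that second step.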
Wow, I can’t believe how much background I just gave in order to get to the actual point…
Recently, emails that were previously being marked as “ham” or “unsure” were getting tagged as guaranteed “spam”, meaning SB was assigning them a spam score of 1.0! Of course, my server-side filter was dutifully tossing them to /dev/null as instructed, and I was blissfully unaware of that.
I discovered it when another phenomenon began: any emails with large attachments were going directly to /dev/null. Since most of my procmail rules are also duplicated for Lois, she noticed before I did, complaining that people were writing tons of “follow-up” emails to her, wondering why she hadn’t responded to their last email. Those follow-up emails were getting through because they didn’t have attachments. I am still not sure that this was caused by the old SB db, but at least it led me to find the other emails that were definitely being miscategorized…
In any event, I stopped filtering with the SB db, and the flood of spam started up again. About a month ago, I turned off SpamAssassin on the server side, because while it was somewhat effective, it was also one of the biggest resource hogs I had ever seen on the server, and the “reward” wasn’t worth it…
So, now, I’m spending a little too much time hand-tuning procmail rules to get the spam back down to a manageable range. So far, so good, but with lots more effort than I would have hoped to expend, given the nice steady state I had for a reasonably long time.
Anyway, this post has turned out way longer than I expected, so I will save the other “random” events for some future post, when they bubble to the top of my frustration queue.
P.S. I am still not sure I’ve “solved” the large attachment problem. My temporary solution was to specifically whitelist those senders in procmail, which works, but raises the question of whether other messages are being thrown away that I’ll never find out about, or find out about too late 🙁