Fighting Spam
March 12th, 2004
I have been actively fighting spam lately. For a long time I relied on the Apple Mail client, which has a nice spam filter, but I still get a lot of mail. And since I run my own server with many users, the spam takes up a great deal of disk space. So I have been trying various techniques to block unwanted mail.
With a recent upgrade of my server to FreeBSD 4.9 I was able to further enhance Sendmail, since the upgrade brought a more current version. I have now added multiple DNSBL hosts so that Sendmail checks these remote blacklists and blocks mail from listed hosts. I have created a status page to show the daily progress with the blacklists. Initially I saw over 4000 messages blocked daily. Lately it has been under 1000. I can only speculate that the blocking has caused various spam lists to stop attempts to email accounts on my server. I can only hope.
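For anyone curious what a blacklist check amounts to, here is a minimal sketch in Python. It is not what Sendmail does internally, only the general DNSBL convention: reverse the octets of the connecting IP address, append the blacklist's zone, and see whether that name resolves. The zone `dnsbl.example.org` is a made-up placeholder, and the function names are my own.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone answers with an address record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        # No such name, or the lookup failed. Treating both cases as
        # "not listed" is a simplification made for this sketch.
        return False


# Hypothetical zone name; a real check would use an actual blacklist's zone.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

The name-building step needs no network access, so it can be tried anywhere. Only `is_listed` performs a DNS query.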
Yet a good deal of mail was still getting through. In addition to blacklists, it is also advisable to use a statistical filter. Both Apple Mail and Mozilla Thunderbird use a statistical filter which learns over time which mail is spam and which is not. So I found SpamAssassin, a mail filter which employs many techniques to block unwanted mail, including a statistical filter which needs to learn over time. So far it has blocked a great deal of mail. I am unsure how much of that could be false positives, but my test emails get through.
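To give a feel for how a filter that "learns over time" can work, here is a toy word-counting (naive Bayes) classifier in Python. This is only an illustration of the general idea; I am not claiming it is how SpamAssassin, Apple Mail or Thunderbird implement their filters, and the class name and sample messages are invented.

```python
import math
from collections import Counter


class ToyBayesFilter:
    """Tiny naive Bayes spam scorer with add-one smoothing."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def learn(self, text: str, label: str) -> None:
        """Record the words of one message under 'spam' or 'ham'."""
        self.counts[label].update(text.lower().split())
        self.messages[label] += 1

    def spam_probability(self, text: str) -> float:
        """Estimate the probability that a message is spam."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        spam_total = sum(self.counts["spam"].values()) + vocab + 1
        ham_total = sum(self.counts["ham"].values()) + vocab + 1
        # Start from the smoothed ratio of spam to ham messages seen so far.
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for word in text.lower().split():
            p_spam = (self.counts["spam"][word] + 1) / spam_total
            p_ham = (self.counts["ham"][word] + 1) / ham_total
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


f = ToyBayesFilter()
f.learn("cheap pills buy now", "spam")
f.learn("buy cheap watches now", "spam")
f.learn("meeting notes for monday", "ham")
f.learn("lunch on monday", "ham")
print(f.spam_probability("buy cheap pills") > 0.5)   # True: words seen mostly in spam
print(f.spam_probability("monday meeting") > 0.5)    # False: words seen mostly in ham
```

With only four training messages the scores mean very little, which is the point: a filter like this needs to see plenty of both wanted and unwanted mail before it becomes trustworthy.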
One aspect of SpamAssassin that I do not like is that I cannot maintain the filter easily. I have to copy mail files from my Mac over to the server and run a script to give SpamAssassin hints about what is wanted mail and what is spam. The learning script uses the terms ham and spam, how clever. When enough messages are reviewed, it can get a pretty good picture of what should be blocked. Hopefully that will not require a lot of maintenance, but I have read that it will need to be retrained occasionally. I am also concerned about false positives. At least when Apple Mail blocks an email it moves it out of my Inbox to the Junk mail folder. I can look in there if I am expecting mail that did not seem to get through. For example, I recently emailed myself from work, but the messages got marked as spam and I had to pull them out of the Junk folder. (One was a key for a free song on iTunes so I had to read it.)
The most appealing alternative to SpamAssassin is DSpam. Its FAQ claims that an untrained SpamAssassin may block 90% of incoming spam and will learn to block up to 95% of spam on average. Then it claims that DSpam will block over 99% of spam, with fewer false positives, shortly after it is put in place. It explains that it uses a blend of different statistical filters to identify spam. It sounds appealing from a pure numbers standpoint.
But then it explains that training the filter is done by simply forwarding unwanted mail to a special email address, which fine-tunes your spam filtering. That beats uploading mail files to be scanned by a script that I have to run manually. For the average person using my server for mail, I find that to be a clear reason to use DSpam over SpamAssassin.
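To put the FAQ's percentages in perspective, a quick bit of arithmetic in Python helps. The figure of 1000 spam messages a day is an assumption chosen for round numbers, not a measurement from my server, and the function name is my own.

```python
def spam_getting_through(daily_spam: int, catch_rate_percent: int) -> int:
    """Number of spam messages per day that slip past the filter."""
    return daily_spam * (100 - catch_rate_percent) // 100


# Hypothetical volume of 1000 spam messages a day.
for label, rate in [("SpamAssassin, untrained", 90),
                    ("SpamAssassin, trained", 95),
                    ("DSpam, as claimed", 99)]:
    print(label, spam_getting_through(1000, rate))
```

If the claims hold, the gap between 95% and 99% is larger than it looks: roughly 50 messages a day reaching inboxes versus 10.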
For now I will give SpamAssassin a while to do its thing. In addition to blocking spam, it also blocks other types of mail, including viruses or email in unwanted languages. I often get email from Japan or Korea that I will never be able to read, so I have set it to accept only English along with Spanish, French and German, since those are the languages my users and I are most likely to use.
In a couple of weeks I should be able to gauge whether or not it will be worthwhile to try out DSpam.
