
Blocking Blog Spam

June 23rd, 2004

I have set up a couple of blacklist features on this Movable Type (MT) blog to keep comment spammers from getting their junk onto this website. We have all been annoyed by spam in our email inboxes for a long time now, and because blogs are so popular and so easy to post to, automated blog spamming has become common. Blogs are easy to find: just look at the sites that accept pings for feed updates. So it only made sense that these evil spammers would find a way to exploit that. But it seems there is a powerful way to stop them...

Since I am using Movable Type, I simply searched for MT plugins. One that is powerful and easy to set up is Jay Allen's MT-Blacklist. It provides a handler that reviews each comment for URLs that either exactly match an entry in a list of URLs or match a pattern typical of spam comments.

MT-Blacklist

His site explains that spammers have to post a URL, both so that people will click through to them and so that search engines count the link, which raises their link popularity and pushes their website higher in the search results. MT-Blacklist exploits that weakness: the URL is easy to extract from a comment and match against the blacklist, so the posting can be blocked.
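To make the idea concrete, here is a minimal sketch of that kind of check as a shell function. It is only an illustration, not MT-Blacklist's actual code: the function name `is_spam` and the one-pattern-per-line blacklist file are my own assumptions, and it needs a grep that supports `-o` (GNU and BSD grep both do).

```shell
#!/bin/sh
# Hypothetical sketch, NOT MT-Blacklist's real implementation.
# Usage: is_spam COMMENT_TEXT BLACKLIST_FILE
# Returns 0 (spam) if any URL in the comment matches a blacklist pattern,
# 1 otherwise. The blacklist format is assumed: one extended regex per
# line, with blank lines and lines starting with '#' ignored.
is_spam() {
    comment=$1
    blacklist=$2

    # Pull every http(s) URL out of the comment text, one per line.
    urls=$(printf '%s\n' "$comment" | grep -Eo 'https?://[^[:space:]"<>]+')

    # A comment without any URL cannot match the blacklist.
    [ -n "$urls" ] || return 1

    # Try each pattern against the extracted URLs (case-insensitive).
    while IFS= read -r pattern; do
        [ -n "$pattern" ] || continue
        case $pattern in
            \#*) continue ;;
        esac
        if printf '%s\n' "$urls" | grep -Eiq -- "$pattern"; then
            return 0
        fi
    done < "$blacklist"

    return 1
}
```

With a blacklist file containing the line `cheap-pills\.example`, calling `is_spam 'Visit http://cheap-pills.example/buy' blacklist.txt` succeeds, while a comment with no URL, or with a URL that matches no pattern, is let through.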

I set this up and it blocks things occasionally, but the database of URLs really needs to be kept up to date. When I get a comment, I get an email with a link to the MT-Blacklist interface, which lets me delete the comment and scrape it for any URLs to add to my blacklist. And as a responsible human being I then submit the additions back to Jay Allen's website to be checked and distributed to the community.

But the trick here is that MT-Blacklist does not update itself from the master list automatically. Something had to be done to keep it updated. So I searched Google and found several updaters written in Perl, PHP, Python and other languages. I tried the PHP version and had trouble with it, so I moved over to the Perl updater. It works by pulling the XML feed of updates from Jay Allen's website and adding or removing entries in my database of URLs accordingly.
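The add-or-remove step can be sketched roughly like this. Everything here is an assumption made for illustration: the real updater parses the RDF/XML feed and works through the MT-Blacklist interface, whereas this sketch pretends the changes have already been reduced to plain lines starting with `+ ` (add) or `- ` (remove) and that the blacklist is a plain text file. The name `apply_changes` is made up.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the real MT Blacklist Updater.
# Usage: apply_changes CHANGES_FILE BLACKLIST_FILE
# Assumed change format: "+ entry" adds an entry, "- entry" removes it.
apply_changes() {
    changes=$1
    blacklist=$2

    # Work on a copy so a failure part-way leaves the blacklist untouched.
    tmp=$(mktemp) || return 1
    cp "$blacklist" "$tmp" || return 1

    while IFS= read -r line; do
        entry=${line#??}    # drop the two-character "+ " or "- " prefix
        case $line in
            "+ "*)
                # Append only if the exact entry is not already listed.
                grep -Fxq -- "$entry" "$tmp" || printf '%s\n' "$entry" >> "$tmp"
                ;;
            "- "*)
                # Keep every line except the exact entry being removed.
                # grep exits 1 when nothing is left, which is fine here.
                grep -Fxv -- "$entry" "$tmp" > "$tmp.new" || true
                mv "$tmp.new" "$tmp"
                ;;
        esac
    done < "$changes"

    # Write back in place so the blacklist keeps its file permissions.
    cat "$tmp" > "$blacklist" && rm -f "$tmp"
}
```

Matching whole lines as fixed strings (`grep -Fx`) keeps an entry such as `casino` from accidentally removing a longer entry that merely contains it.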

I am using the Perl script from Toby Simmons, plainly called MT Blacklist Updater in Perl. You have to edit it to add the username and password for your blog, because it uses the MT-Blacklist interface to maintain the blacklist. I set it up and tested it but had some initial problems.

Apparently the master blacklist has a redirect in place, and it was causing this Perl script trouble. So I bypassed that with a shell script.

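#!/bin/sh
# Download the master blacklist feed to a local file so the Perl updater
# reads it from the blog site and never has to deal with the redirect.

# Stop if the web root is missing or the download fails.
cd /web/servers/blog.offwhite.net/html || exit 1
fetch http://www.jayallen.org/comment_spam/feeds/blacklist-changes.rdf || exit 1
# Make the feed readable by the web server.
chmod 644 blacklist-changes.rdf
# Apply the changes through the MT-Blacklist interface.
perl /home/brennan/bin/mt-blacklist-update.pl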

What this does is simply use the fetch utility (standard on FreeBSD; wget or curl do the same job on other Unix systems) to pull down the master blacklist and place it on the blog site hosting the MT interface. Without the funny redirect in the way, the Perl script was able to read the file. Then all I had to do was schedule it to run every couple of hours, as instructed on the associated website. And today, looking at the MT Activity Log, I see that early this morning there were several new additions to my blacklist. Since getting this in place, no spam has been getting through. As of now, I win!
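For the scheduling, a crontab entry along these lines would do it. This is a sketch, not my exact entry: the script path is a hypothetical name for wherever the shell script above is saved, and the `*/2` step syntax assumes a Vixie-style cron such as the one shipped with FreeBSD and most Linux systems.

```crontab
# minute  hour  day-of-month  month  day-of-week  command
# Run the updater at the top of every second hour.
# (Hypothetical path: use wherever the shell script is saved.)
0         */2   *             *      *            /home/brennan/bin/update-blacklist.sh
```

It goes in with `crontab -e` under the account that owns the blog files, so that the downloaded feed ends up with the right owner.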
