Are spammers hitting your blog or forum? Why not block them with rules in your htaccess file? Adding a spammer’s IP address to your htaccess file stops it from reaching your site in the first place and keeps the website more accessible for real users. With the help of a website and a small Python program I wrote, that is exactly what I did.
Let me start with some background information.
Because my article on compiling a custom kernel for Ubuntu was published on Linux Today, my traffic increased. Besides the people really interested in the article, it also attracted spammers trying to get their “message” across. I was already running WP-Spamfree, which really keeps the comment section clean, but every spammer still hits index.php, and that causes CPU utilization to go up. I installed WP Super Cache as well, but it still wasn’t enough, so I decided to add the spammers’ IPs to a deny list in my htaccess file.
The first thing I needed was a list of known spammer IPs. I found a website that keeps track of forum spammers (http://www.stopforumspam.com) and offers a list of known spammer IPs in CSV format. It also offers an API to check its online database directly.
I wrote a Python program that will do the following:
It downloads the CSV list from the above website.
It downloads the Apache log from my server. My provider offers two logs: a monthly one and a running daily one.
It downloads the htaccess file.
It then processes the Apache log, checking whether each IP hitting my site is a known spammer according to the CSV list. If it is, the program adds that IP address as a deny from line in my htaccess file.
The program keeps a log file of what it does and at the end it will show a summary of what it did.
[2008/11/23 18:50:52] [INFO ] Starting update-htaccess.py
[2008/11/23 18:50:52] [INFO ] Retrieving Banned IP list from stopforumspam.com
[2008/11/23 18:50:52] [INFO ] Banned IP list retrieved
[2008/11/23 18:50:53] [INFO ] Retrieving Apache log blog.avirtualhome.com-Nov-2008.gz
[2008/11/23 18:50:59] [INFO ] Apache logfile blog.avirtualhome.com-Nov-2008.gz retrieved
[2008/11/23 18:50:59] [INFO ] Retrieving .htaccess file
[2008/11/23 18:51:00] [INFO ] .htaccess file retrieved
[2008/11/23 18:51:00] [INFO ] Processing Apache logfile
[2008/11/23 18:51:15] [INFO ] Apache logfile processed
[2008/11/23 18:51:15] [INFO ] Starting update .htaccess file
[2008/11/23 18:51:19] [INFO ] Update .htaccess file completed
[2008/11/23 18:51:19] [INFO ] Total IP scanned: 6049 - Clean: 5914 - Spam: 135
[2008/11/23 18:51:19] [INFO ] Total Banned: 135 - New: 5 - Known: 130 - Removed: 0
[2008/11/23 18:51:19] [INFO ] update-htaccess.py finished
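To give an idea of the core check described above, here is a minimal sketch in modern Python 3. It is not the original Python 2.5 program, only an illustration under some assumptions: the function names are made up for this example, the spammer list is assumed to be plain comma-separated IPv4 addresses, and the Apache log is assumed to start each line with the client IP (as in the common and combined log formats).

```python
import re

# Client IPv4 address at the start of an access log line (assumed log layout).
IP_AT_START = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\s")


def load_banned_ips(csv_text):
    """Collect every non-empty comma-separated field as a banned IP (assumed CSV layout)."""
    banned = set()
    for line in csv_text.splitlines():
        for field in line.split(","):
            field = field.strip().strip('"')
            if field:
                banned.add(field)
    return banned


def ips_in_log(log_lines):
    """Return the distinct client IPs found at the start of the log lines."""
    seen = set()
    for line in log_lines:
        match = IP_AT_START.match(line)
        if match:
            seen.add(match.group(1))
    return seen


def existing_denies(htaccess_text):
    """Return the IPs that already have a 'deny from <ip>' line."""
    denied = set()
    for line in htaccess_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0].lower() == "deny" and parts[1].lower() == "from":
            denied.add(parts[2])
    return denied


def update_htaccess(htaccess_text, log_lines, banned):
    """Append 'deny from' lines for banned IPs seen in the log; return (new_text, summary)."""
    seen = ips_in_log(log_lines)
    spam = seen & banned
    known = existing_denies(htaccess_text)
    new = sorted(spam - known)
    if htaccess_text and not htaccess_text.endswith("\n"):
        htaccess_text += "\n"
    addition = "".join("deny from %s\n" % ip for ip in new)
    summary = {
        "scanned": len(seen),
        "clean": len(seen - spam),
        "spam": len(spam),
        "new": len(new),
        "known": len(spam & known),
    }
    return htaccess_text + addition, summary
```

A compressed log such as the monthly .gz file can be fed in with gzip.open(path, "rt"), which yields the lines as text. The sketch only builds the new htaccess content and the counts for the summary; downloading the files and writing the result back are left out.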
I still have to manually upload my htaccess file back to my site, but this is a deliberate decision. Should something go wrong with the check or the update of the htaccess file, it won’t mess up my site, which it could if the file were uploaded automatically.
The program can work with other providers, but it might need some minor tweaks. Most of the program is configurable through an INI file: things like the username, password, and directories for the FTP retrieval of the htaccess file and Apache logs can all be customized.
You can of course use the program for any website; it’s not limited to WordPress sites. I use the same program with different settings for my phpBB forum as well.
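As an illustration of what such an INI file might look like and how it could be read, here is a small sketch. The section name, the keys, and the values are all hypothetical and not taken from the actual program. It uses Python 3's configparser module, the successor of the ConfigParser module from Python 2.

```python
import configparser

# Hypothetical settings file; section and key names are invented for this example.
SAMPLE_INI = """
[ftp]
host = ftp.example.com
username = myuser
password = secret
log_dir = /logs
htaccess_dir = /public_html
"""


def load_settings(ini_text):
    """Parse the INI text and return the FTP settings as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    ftp = parser["ftp"]
    return {
        "host": ftp["host"],
        "username": ftp["username"],
        "password": ftp["password"],
        "log_dir": ftp["log_dir"],
        "htaccess_dir": ftp["htaccess_dir"],
    }
```

Keeping these values out of the code means the same script can run against a second site just by pointing it at a different INI file.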
The program has the following requirements:
OS: Linux (I use a system command to copy a file)
Python 2.5 (of course)
The Python modules: logging, urllib2, ConfigParser, gzip, ftplib, optparse
If you are interested in the program, drop me a note through my Contact Me page.