Stop forum spammers in your htaccess

Are spammers hitting your blog or forum? Why not block them with rules in htaccess to start with! Adding a spammer’s IP to your htaccess file stops it from reaching your site at all, which keeps the website more accessible for real users. With the help of a website and a small Python program I wrote, that is exactly what I did.

Let me start with some background information.
Because my article on compiling a custom kernel for Ubuntu was published on Linux Today, my traffic increased. Besides the people really interested in the article, it also attracted spammers trying to get their “message” across. I was already running WP-Spamfree, which really keeps the comment section clean, but every spammer still hits index.php and that drives CPU utilization up. I installed WP Super Cache as well, but it still wasn’t enough, so I decided to add the spammers’ IPs to a deny list in my htaccess file.

The first thing I needed was a list of known spammer IPs. I found a website that keeps track of forum spammers and offers a list of known spammer IPs in CSV format. It also offers an API to check its online database directly.

I wrote a Python program that will do the following:
It downloads the CSV list from the above website.
It downloads the Apache log from my server. My provider offers logs, a monthly one and a current running daily one.
It downloads the htaccess file.
It then processes the Apache log, checking whether each IP hitting my site is a known spammer according to the CSV list. If it is, the program adds that IP address as a “deny from” line in my htaccess file.
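
The matching and update steps above could be sketched roughly as follows. This is not the actual program, only a minimal illustration in current Python 3 (the original targets Python 2.5). It assumes the CSV has the IP address in its first column, that each Apache log line starts with an IPv4 client address, and that the function names `load_banned`, `spam_ips_in_log` and `merge_deny_rules` are made up for this example. Downloading the files is left out.

```python
import csv
import io
import re

# Assumption: the log is in Apache common/combined format, where the
# first field of every line is the client's IPv4 address.
LOG_IP = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\s")


def load_banned(csv_text):
    """Return the set of IPs in the first CSV column (assumed layout)."""
    return {row[0].strip() for row in csv.reader(io.StringIO(csv_text)) if row}


def spam_ips_in_log(log_lines, banned):
    """Return (all client IPs seen in the log, the ones on the banned list)."""
    seen = set()
    for line in log_lines:
        match = LOG_IP.match(line)
        if match:
            seen.add(match.group(1))
    return seen, seen & banned


def merge_deny_rules(htaccess_text, spam_ips):
    """Append 'deny from' lines for IPs that are not denied yet.

    Returns the new file text and the sorted list of newly added IPs.
    """
    known = set(re.findall(r"^\s*deny from (\S+)", htaccess_text, re.I | re.M))
    new = sorted(spam_ips - known)
    lines = htaccess_text.rstrip("\n").split("\n") if htaccess_text.strip() else []
    lines += ["deny from " + ip for ip in new]
    return "\n".join(lines) + "\n", new
```

With this shape, totals such as the number of scanned, clean and spam IPs fall out of `len(seen)` and `len(spam)`, and the count of new bans is the length of the list returned by `merge_deny_rules`.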

The program keeps a log file of what it does and at the end it will show a summary of what it did.

[2008/11/23 18:50:52] [INFO    ] Starting
[2008/11/23 18:50:52] [INFO    ] Retrieving Banned IP list from
[2008/11/23 18:50:52] [INFO    ] Banned IP list retrieved
[2008/11/23 18:50:53] [INFO    ] Retrieving Apache log
[2008/11/23 18:50:59] [INFO    ] Apache logfile retrieved
[2008/11/23 18:50:59] [INFO    ] Retrieving .htaccess file
[2008/11/23 18:51:00] [INFO    ] .htaccess file retrieved
[2008/11/23 18:51:00] [INFO    ] Processing Apache logfile
[2008/11/23 18:51:15] [INFO    ] Apache logfile processed
[2008/11/23 18:51:15] [INFO    ] Starting update .htaccess file
[2008/11/23 18:51:19] [INFO    ] Update .htaccess file completed
[2008/11/23 18:51:19] [INFO    ] Total IP scanned: 6049 - Clean: 5914 - Spam: 135
[2008/11/23 18:51:19] [INFO    ] Total Banned: 135 - New: 5 - Known: 130 - Removed: 0
[2008/11/23 18:51:19] [INFO    ] finished
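
Log lines in this shape can be produced with the standard `logging` module. The sketch below is one way to do it in Python 3, not necessarily how the program does it; the function name `make_logger` and the logger name `spamban` are invented for the example.

```python
import logging
import sys


def make_logger(name="spamban", stream=sys.stdout):
    """Build a logger that writes '[date time] [LEVEL   ] message' lines."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Drop handlers from an earlier call so lines are not written twice.
    for old_handler in list(logger.handlers):
        logger.removeHandler(old_handler)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        fmt="[%(asctime)s] [%(levelname)-8s] %(message)s",  # level padded to 8 chars
        datefmt="%Y/%m/%d %H:%M:%S",
    ))
    logger.addHandler(handler)
    logger.propagate = False  # keep the root logger from repeating the line
    return logger
```

Calling `make_logger().info("Starting")` then writes a line in the same format as the first entry of the log above, with the current date and time.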

I still have to manually upload my htaccess file back to my site, but this is a deliberate decision. Should something go wrong with the check or update of the htaccess file, it won’t mess up my site the way an automatic upload could.
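
A quick sanity check before uploading by hand could look something like the sketch below. It is an illustration only and not part of the program described here. The function name `check_htaccess` is hypothetical, and it assumes every newly added rule is a “deny from” line with a single IP address and that nothing else in the file should change.

```python
import ipaddress


def check_htaccess(old_text, new_text):
    """Return a list of problems; an empty list means nothing looks wrong."""

    def split(text):
        # Separate 'deny from' rules from every other line in the file.
        deny, other = [], []
        for line in text.splitlines():
            is_deny = line.strip().lower().startswith("deny from ")
            (deny if is_deny else other).append(line)
        return deny, other

    problems = []
    old_deny, old_other = split(old_text)
    new_deny, new_other = split(new_text)
    if old_other != new_other:
        problems.append("non-deny lines changed")
    for line in new_deny:
        if line in old_deny:
            continue  # rules that were already there are left alone
        target = line.strip()[len("deny from "):].strip()
        try:
            ipaddress.ip_address(target)
        except ValueError:
            problems.append("not a single valid IP: %r" % line)
    return problems
```

If the returned list is empty, the new file differs from the old one only by well-formed deny rules, which is about as much as a script can promise before a human uploads it.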

The program can work for other providers, but it may need some minor tweaks. Most of it is configurable through an INI file: things like the username, password, and directories for the FTP retrieval of the htaccess file and Apache logs can all be customized.
You can of course use the program for any website; it’s not limited to WordPress sites. I use the same program with different settings for my phpBB forum as well.
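
To give an idea of what such an INI file might look like and how it could be read, here is a small sketch. The section and key names (`ftp`, `local`, `host`, `htaccess_dir` and so on) and the function name `load_settings` are invented for illustration, and it uses Python 3’s `configparser` rather than the Python 2.5 `ConfigParser` module.

```python
import configparser

# Hypothetical settings file; none of these names come from the real program.
SAMPLE_INI = """\
[ftp]
host = ftp.example.com
username = someuser
password = secret
htaccess_dir = /public_html
log_dir = /logs

[local]
work_dir = /tmp/spamban
"""


def load_settings(ini_text):
    """Parse the INI text and return the settings as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    ftp = parser["ftp"]
    return {
        "host": ftp["host"],
        "username": ftp["username"],
        "password": ftp["password"],
        "htaccess_dir": ftp["htaccess_dir"],
        "log_dir": ftp["log_dir"],
        # Fall back to the current directory when no work_dir is given.
        "work_dir": parser.get("local", "work_dir", fallback="."),
    }
```

Keeping one file like this per site is one way to run the same script against both a blog and a forum with different settings.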

The program has the following requirements:
OS: Linux (I use a system command to copy a file)
Python 2.5 (of course)
The Python modules: logging, urllib2, ConfigParser, gzip, ftplib, optparse

If you are interested in the program drop me a note through my Contact Me page.

This article is filed under the categories Software » Web.
  • With respect to not automatically updating the .htaccess file: couldn’t you back up the old one, make your new one, and then get the Python script to run a couple of simple tests for you? It could then perhaps email the results or store them somewhere for you.

  • That was interesting to read; I had not even thought we could do this. But why don’t you put the program here itself? That might be better than having people contact you for it. Do you worry someone will hack it?

    • At the time I wrote this the program was basically one big hack; variables were hard-coded and so on.
      The program is also written around the way my hosting provider offers me the Apache logs. The .htaccess file also needs to be modified by hand.

      The little program has evolved a bit. It now uses a configuration file, but the download of the monthly logs still depends on my provider.

      I’ll write up an update and post the program with configuration file.