Semalt, Kambasoft referer spam – a new ant in your shirt!

As well as designing and making things with the laser machine (the fun stuff), a frustrating amount of time goes on maintenance tasks that look like they should be simple.  Here’s yesterday afternoon’s (and last night’s, and last Wednesday’s) example.  I’ve included some detail in the hope that it will help someone else.

When a visitor’s browser requests a page from your site, their request includes a lot of information that no-one normally sees, including a header called ‘referer’ (the misspelling is part of the HTTP standard), which tells you what page they came from.  If you use a tool such as Google Analytics, you can see a handy list of all the sites that link to you and how many visitors arrive by that route.

Recently, mysterious unfamiliar names have started appearing.  It turns out they’re not real visitors, but robots designed to fill your log files with their own web site address.  To make themselves harder to block, the most prominent one keeps changing its IP address (by rudely using other people’s hijacked computers) and its referer (e.g. from ‘semalt.com’ to ‘semalt.semalt.com’ to ‘123.semalt.com’ etc.).  They make it hard for people to interpret their logs, skew statistics and cause a lot of irritation.  (Google ‘Semalt’ if you want more detail, some theories on why they are really doing this and a flavour of how worked up people are getting.)
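
If you can get at your server's raw access log, a rough way to see which referers are turning up is to count them. This is only a sketch: it assumes Apache's 'combined' log format (where the referer is the second quoted field on each line), and the function name top_referers is made up for illustration.

```shell
# Count the referers in an access log, most frequent first.
# Assumes the "combined" log format: splitting each line on double
# quotes puts the request in field 2 and the referer in field 4.
top_referers() {
    awk -F'"' '{ print $4 }' "$1" | sort | uniq -c | sort -rn
}
```

Called with the path to a log file, it prints one line per referer with the number of hits in front, so the spammers tend to float to the top.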

There are lots of suggestions online on how to handle them.  I decided to block them using .htaccess but got badly misled by some errors in one of the solutions I found online.  Here’s what seems to have worked for me:

Step 1
Check your web server runs Apache (http://browserspy.dk/webserver.php will tell you). If your server runs Apache you can normally change your configuration by including instructions in a file called .htaccess, which is kept in the directory where your html files are stored.

Step 2
Get an FTP client program (I used FileZilla) to allow you to view, transfer and edit the files on your server.

Step 3
If one already exists, download your old .htaccess file to your own computer and save it somewhere safe so that if it goes horribly wrong you can put things back as you found them.
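
The steps here use FTP, but if you happen to have shell access to the server (an assumption, many hosts don't offer it), the same safety copy can be made on the server itself. A minimal sketch, with a function name that is made up for illustration:

```shell
# Make a dated safety copy of .htaccess in the given directory
# (default: the current directory).
# Returns non-zero without copying if there is no .htaccess yet.
backup_htaccess() {
    dir=${1:-.}
    [ -f "$dir/.htaccess" ] || return 1
    # -p keeps the original file's permissions and timestamps.
    cp -p "$dir/.htaccess" "$dir/.htaccess.bak-$(date +%Y%m%d)"
}
```

Putting things back is then a matter of copying the dated file over .htaccess again.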

Step 4
Edit your .htaccess file to include the blocking code.  As the file name starts with a dot, you might need to save the file on your computer as ‘htaccess.txt’, edit it in Notepad, upload the new version to your server then rename it back to ‘.htaccess’.

Because WordPress had already created an .htaccess file for me, I had to combine the old and new code.  This is what worked for me in the end:

RewriteEngine On

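# BEGIN Block Referrer spammers
Options +FollowSymLinks
# Each pattern is a regular expression matched anywhere in the referer.
# NC = ignore case; OR = join with the next condition (the last one has no OR).
RewriteCond %{HTTP_REFERER} kambasoft [NC,OR]
RewriteCond %{HTTP_REFERER} savetubevideo\.com [NC,OR]
RewriteCond %{HTTP_REFERER} semalt\.com [NC]
# If any condition matched, refuse the request with 403 Forbidden.
RewriteRule .* - [F]
# END Block Referrer spammers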

# BEGIN WordPress
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
# END WordPress
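
To see why three RewriteCond lines are enough to catch the changing subdomains, it helps to remember that each pattern is a regular expression matched anywhere in the referer, and that NC makes the match case-insensitive. The sketch below imitates that with grep so you can try patterns without touching the server. It is only an approximation: Apache uses PCRE while grep -E uses POSIX extended regular expressions, although the two agree for simple patterns like these. The dots are escaped here so that they match only a literal dot, and the function name referer_blocked is made up.

```shell
# Succeeds (exit status 0) if the referer matches any blocked pattern.
# grep -E: extended regex; -i: ignore case, like NC; -q: print nothing.
# The alternation a|b|c plays the role of the OR flags.
referer_blocked() {
    printf '%s\n' "$1" | grep -Eiq 'kambasoft|savetubevideo\.com|semalt\.com'
}

for ref in semalt.com http://555.semalt.com/ http://SEMALT.semalt.com/x example.org; do
    if referer_blocked "$ref"; then
        echo "$ref: blocked"
    else
        echo "$ref: allowed"
    fi
done
```

Only the last of the four referers in the loop comes out as allowed, because the pattern semalt\.com is found inside all the others.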

Step 5
Check your web site still works correctly (this is where you might need the old .htaccess file that you saved in Step 3).

Step 6
Check that your new .htaccess file hasn’t slowed your web site down.  You can judge it by eye, or use Google’s tool, which gives you a numerical score and tips on what’s causing any delay.
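
For a rough number to compare before and after the change, curl can report how long a request took. A minimal sketch, assuming curl is installed; the function name page_time is made up, and a single measurement is noisy, so it is worth running a few times.

```shell
# Print the total time, in seconds, that curl takes to fetch a URL.
# -s: no progress meter; -o /dev/null: discard the page itself;
# -w '%{time_total}\n': print only the elapsed time.
page_time() {
    curl -s -o /dev/null -w '%{time_total}\n' "$1"
}
```

Running it against your homepage a few times with the old and the new .htaccess gives a crude before-and-after comparison.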

Step 7
Use cURL to check that you’re blocking what you expect to block.  I used the Linux version but there’s a version for Windows as well.  Run this command from a terminal to request your homepage while pretending that you were referred from semalt.com:

curl www.mysite.co.uk -e semalt.com

You’ll either get the whole of www.mysite.co.uk’s source code scrolling across your screen, or an error page (403 Forbidden) which tells you that the block has worked for that referer.  I used this to confirm that referers such as semalt.semalt.com and 555.semalt.com are also blocked.  Take care not to block friendly traffic inadvertently (e.g. if your RewriteCond line uses the term ‘malt.com’ it will block ‘semalt.com’ but also ‘deliciousbarleymalt.com’).
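
Reading a screenful of HTML to tell success from failure gets tedious, so here is a sketch that asks curl for just the HTTP status code: 403 means the referer was refused, anything else (typically 200) means the page was served. The function name referer_status is made up for illustration.

```shell
# Print the HTTP status code a site returns for a given referer.
# Usage: referer_status URL REFERER
# -s: no progress meter; -o /dev/null: discard the page body;
# -w '%{http_code}': print only the status code; -e: set the referer.
referer_status() {
    curl -s -o /dev/null -w '%{http_code}' -e "$2" "$1"
}
```

With the placeholder address from the command above, you would call it like this, once for a referer that should be blocked and once for one that should not:

referer_status www.mysite.co.uk 555.semalt.com
referer_status www.mysite.co.uk deliciousbarleymalt.com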

Step 8
Keep an eye on your website and your analytics data to make sure it continues to work as expected.  I’m afraid new spammers will crop up and have to be added to the list.  Fortunately, you can have any number of RewriteCond lines, as long as every one except the last carries the OR flag.
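
As the list grows it is easy to get the flags wrong, since every RewriteCond except the last needs OR. One way round that is to keep the spammers as a plain list and generate the block from it. This is a sketch only: the function name make_block is made up, and it assumes each argument is a plain domain name or word, escaping nothing but the dots.

```shell
# Print a referer-blocking block for the patterns given as arguments.
# Every condition gets [NC,OR] except the last, which gets [NC].
make_block() {
    # Refuse to run with no patterns: a RewriteRule with no conditions
    # in front of it would block every visitor.
    [ "$#" -gt 0 ] || return 1
    echo '# BEGIN Block Referrer spammers'
    echo 'Options +FollowSymLinks'
    while [ "$#" -gt 0 ]; do
        # Escape dots so they match a literal dot, not any character.
        pattern=$(printf '%s' "$1" | sed 's/\./\\./g')
        if [ "$#" -gt 1 ]; then flags='[NC,OR]'; else flags='[NC]'; fi
        printf 'RewriteCond %%{HTTP_REFERER} %s %s\n' "$pattern" "$flags"
        shift
    done
    echo 'RewriteRule .* - [F]'
    echo '# END Block Referrer spammers'
}

make_block kambasoft savetubevideo.com semalt.com
```

Adding a new spammer then means adding one word to the last line and pasting the output over the old block in .htaccess.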

I hope someone finds this useful, but the inevitable disclaimer follows:

  • I’m not an expert in this subject; I’d rather be making wooden postcards
  • It seems to have worked for me but it might not work for you if your web site is set up differently
  • I’ve only just implemented this file, so have not yet seen it work in practice
  • It’s probably worth what you paid for it