Preventing referrer spam

I have huge amounts of referrer spam in my logfiles and want to stop it

         

RandlePMcMurphy

3:35 pm on Jan 7, 2005 (gmt 0)

10+ Year Member



Hi all,

I am new to this forum, so apologies in advance if I break any forum etiquette or if my post is a FAQ.

I did search through previous postings (including reading all 3 "Close to perfect .htaccess ban list" threads - wow! They contained a wealth of fabulous info) but I didn't quite find what I was looking for, hence this post.

Recently I have been finding that my logfiles are full of referrer spam - you know the kind of thing: phentermine, texas-holdem, debt-consolidation links, etc.

I use AWStats to analyse my logfiles, but I have password-protected that folder, so the referrers shouldn't be doing any good for anyone's page ranking.

I have a .htaccess file with plenty of RewriteCond %{HTTP_REFERER} lines in it, but being new to Unix I'm not sure:
1. Is this the best way to prevent referrer spam and
2. Is my .htaccess file well-formed (is there an online .htaccess file validator - the same way there is one for robots.txt files, for instance?)
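
For readers unfamiliar with this kind of rule, a minimal sketch of a referrer-based block in .htaccess might look like the following. The keywords are the ones mentioned above; the exact patterns are assumptions for illustration, not the contents of my actual file, and mod_rewrite is assumed to be available.

```apache
# Sketch only: the patterns below are illustrative assumptions.
# Requires mod_rewrite to be enabled and usable from .htaccess.
RewriteEngine On

# Match spam keywords anywhere in the Referer header, case-insensitively.
RewriteCond %{HTTP_REFERER} phentermine [NC,OR]
RewriteCond %{HTTP_REFERER} texas-holdem [NC,OR]
RewriteCond %{HTTP_REFERER} debt-consolidation [NC]

# Answer any matching request with 403 Forbidden.
RewriteRule .* - [F]
```

Rules like these refuse the request, but the refused request is still written to the access log with a 403 status.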

Thanks all,

RandlePMcMurphy

RandlePMcMurphy

4:11 pm on Jan 7, 2005 (gmt 0)

10+ Year Member



Just to give a few more details on this referrer spam:
- most of the spam has "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.1.4322)" as the User Agent, so I can't ban that, and
- the spam originates from constantly shifting IPs and domains, so I can hardly use those details to prevent it either.

Thanks for your interest,

RandlePMcMurphy.

dcrombie

7:07 pm on Jan 7, 2005 (gmt 0)



You can't really stop 'referer spam' as even if you send a 403 it still shows up in the logs.

The best thing to do is to make sure that your stats pages aren't publicly accessible - if you do that then most of the incentive for people to spam you goes away.

RandlePMcMurphy

7:51 pm on Jan 7, 2005 (gmt 0)

10+ Year Member



Dcrombie,

thanks for the prompt response.

Really? There's no way to stop it? That's a bit disappointing!

Especially seeing as, even though my stats pages are password-protected, the amount of referrer spam I am getting is increasing daily - to the point where it is rendering my logfiles useless for referral analysis.

Thanks anyway,

RandlePMcMurphy

billegal

2:39 am on Jan 9, 2005 (gmt 0)

10+ Year Member



I'm not so sure that the referrer spam can't be stopped. I think you can customize your logs so that certain things are not logged. For example, I had read somewhere that requests from spiders, your own IP, and other things can be excluded from logging.

Check your PM for a link.

jdMorgan

3:36 am on Jan 9, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is a question of semantics:

If "stop referer spam" means, "stop them from accessing the server," then no, you can't do that unless they always access your site from the same IP address or address block. If they do, then you can dump requests from those IPs at the server router/firewall.

If "stop referer spam" means, "stop them from appearing in the log files," then you can do that *if* you have access to httpd.conf, using Apache mod_log_config (log those requests to /dev/null).

Jim

RandlePMcMurphy

11:36 am on Jan 9, 2005 (gmt 0)

10+ Year Member



billegal and jdmorgan,

thanks for your replies on this. Unfortunately my site is on an ISP's shared server, so I have my doubts that they'll allow me to modify httpd.conf (!) - still, I have a very good relationship with them and it can't hurt to ask.

Now the sticky question - where would be a good place to go to find examples of the kinds of changes it would be advisable to make? (I'd like to have a bit of research done before I approach them about httpd.conf changes!)

I did a site search here for mod_log_config but didn't turn up anything useful for this purpose. I found the Apache reference (http://httpd.apache.org/docs/mod/mod_log_config.html) but this seems a tad sparse on details.

Thanks,

RandlePMcMurphy

RandlePMcMurphy

12:50 pm on Jan 10, 2005 (gmt 0)

10+ Year Member



Guys,

Just a quick update - I contacted my ISP about changes to the httpd.conf file and this was their response:

"I'm afraid that we cannot provide individual access to the config file, as apache uses 1 single file for hosts.
Any changes you want we can implement however, provided that they do not impact on any other users or the security of the server."

So if I could find some info on how to send log entries for requests from people selling phentermine etc. to /dev/null, I could once more start seeing where people are coming to my site from!

Thanks,

RandlePMcMurphy

billegal

9:36 pm on Jan 10, 2005 (gmt 0)

10+ Year Member



Did you have a chance to check the link I PM'd you? The technique on that page sets an environment variable and then blocks logging based on that variable. By way of example, for a variable named mylogs:

CustomLog logs/access.log combined env=!mylogs

The rest of that article in the PM should allow you to write up some samples for your virtual host container. I'm assuming you have your own log file.
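
As a rough sketch of how the two pieces fit together (this is not the linked article's exact text): SetEnvIfNoCase from mod_setenvif is one way to set the variable, and the Referer patterns below are assumptions based on the spam keywords mentioned earlier in the thread. It also assumes a "combined" log format is already defined by a LogFormat line in the server configuration.

```apache
# Sketch only: goes in httpd.conf or the site's <VirtualHost> container,
# because CustomLog is not allowed in .htaccess.
# Needs mod_setenvif and mod_log_config.

# Flag requests whose Referer contains a spam keyword.
# The patterns are illustrative assumptions.
SetEnvIfNoCase Referer "phentermine" mylogs
SetEnvIfNoCase Referer "texas-holdem" mylogs
SetEnvIfNoCase Referer "debt-consolidation" mylogs

# Log every request except those flagged with mylogs.
CustomLog logs/access.log combined env=!mylogs
```

The flagged requests are still served as normal; they are only left out of this log file.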

One big problem you'll have is that you can't immediately test and change things if there's a problem with your sample. Perhaps your ISP can work through that with you, or you can test on a development machine. I'd wager your ISP is unlikely to restart Apache many times for you if it will interfere with other sites.

As an alternative, do you have command line access to your logs? If so, then you can rewrite them before downloading or archiving (if that's what you're doing). This would require some type of sed or grep to a new file.

Also, this thread is getting close to belonging in the Tracking and Logging forum.

RandlePMcMurphy

9:01 am on Jan 11, 2005 (gmt 0)

10+ Year Member



Hi billegal,

thanks for your reply. Yes, I read the link you PM'd me and I understood a lot of it but not all - did I mention I am new to Unix, not just this forum? I'll have another look at it to see if I can make more sense of it now that I have read more around the subject.

I don't have command line access to my logs unfortunately, so sed or grep are not an option.

What I am trying to figure out is how to stop the logging of access to my site from various-subnets.xyz.tld. It wasn't clear to me from that article how to do this, but hopefully I'll get it from another reading.
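
To make the question concrete, here is a sketch of the kind of thing I imagine asking for, assuming xyz.tld shows up in the Referer header of the spam requests. The variable name mylogs, the log path, and the regular expression are my own guesses and have not been tested on my ISP's server.

```apache
# Sketch only: xyz.tld stands in for a spamming domain.
# Belongs in httpd.conf or the <VirtualHost> container.

# Flag requests whose Referer is xyz.tld or any subdomain of it.
SetEnvIfNoCase Referer "^https?://([^/]+\.)?xyz\.tld(/|:|$)" mylogs

# Leave the flagged requests out of the access log.
CustomLog logs/access.log combined env=!mylogs
```

Anchoring the pattern at the start of the host part should keep it from matching unrelated referrers that merely contain "xyz.tld" somewhere in the path.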

Sorry if this is off-topic - I thought this was the appropriate place to post the query, as it was where I found the "perfect htaccess" threads.

Thanks,

RandlePMcMurphy