I am new to this forum, so apologies in advance if I break any forum etiquette or if my post is a FAQ.
I did search through previous postings (including reading all 3 Close to perfect .htaccess ban list threads - wow! They contained a wealth of fabulous info) but I didn't quite find what I am looking for, hence this post.
Recently I am finding that my logfiles are full of referrer spam - you know the kind of thing, phentermine, texas-holdem, debt-consolidation links, etc.
I use AWStats to analyse my logfiles, but I have password-protected that folder, so the referrers shouldn't be doing any good for anyone's page ranking.
I have a .htaccess file with plenty of RewriteCond %{HTTP_REFERER} lines in it but, being new to Unix, I'm not sure:
1. Is this the best way to prevent referrer spam and
2. Is my .htaccess file well-formed (is there an online .htaccess file validator - the same way there is one for robots.txt files, for instance?)
Thanks all,
RandlePMcMurphy
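For readers unfamiliar with the pattern described above, a block of RewriteCond %{HTTP_REFERER} lines typically looks something like the following. This is only a minimal sketch, not the poster's actual file: the keywords are the ones mentioned in the post, and it assumes mod_rewrite is enabled and that the host allows rewrite rules in .htaccess.

```apache
# Sketch only: refuse requests whose Referer header contains a spam keyword.
RewriteEngine On
# [NC] makes the match case-insensitive; [OR] chains the conditions with a logical OR.
RewriteCond %{HTTP_REFERER} phentermine [NC,OR]
RewriteCond %{HTTP_REFERER} texas-holdem [NC,OR]
# The last condition in the chain must not carry [OR].
RewriteCond %{HTTP_REFERER} debt-consolidation [NC]
# "-" leaves the URL untouched; [F] answers with 403 Forbidden.
RewriteRule .* - [F]
```

Note that a request refused this way is still written to the access log (with a 403 status), so rules like these reduce the bandwidth spent on spammers but do not by themselves clean up the referrer statistics.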
Thanks for your interest,
RandlePMcMurphy.
The best thing to do is to make sure that your stats pages aren't publicly accessible - if you do that then most of the incentive for people to spam you goes away.
Thanks for the prompt response.
Really? There's no way to stop it? That's a bit disappointing!
Especially seeing as, even though my stats pages are password-protected, the amount of referrer spam I am getting is increasing daily, to the point where it is rendering my logfiles useless for referral analysis.
Thanks anyway,
RandlePMcMurphy
If "stop referer spam" means, "stop them from accessing the server," then no, you can't do that unless they always access your site from the same IP address or address block. If they do, then you can dump requests from those IPs at the server router/firewall.
If "stop referer spam" means, "stop them from appearing in the log files," then you can do that *if* you have access to httpd.conf, using Apache mod_log_config (log those requests to /dev/null, or simply don't log them at all).
Jim
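As a rough illustration of the mod_log_config approach mentioned above, conditional logging is usually combined with mod_setenvif. The sketch below is an assumption about how it could be set up, not a tested configuration: the variable name spam_ref and the log path are made up for the example, the keywords come from the first post, and the "combined" format is assumed to be already defined by a LogFormat line in httpd.conf.

```apache
# Goes in httpd.conf (server or virtual host context); CustomLog is not allowed in .htaccess.
# Flag any request whose Referer header contains a spam keyword (case-insensitive).
SetEnvIfNoCase Referer "phentermine" spam_ref
SetEnvIfNoCase Referer "texas-holdem" spam_ref
SetEnvIfNoCase Referer "debt-consolidation" spam_ref
# Write a log entry only when spam_ref is NOT set for the request.
CustomLog logs/access.log combined env=!spam_ref
```

The flagged requests still reach the server and are still answered; they are simply left out of that log file, which is what matters for referral analysis.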
Thanks for your replies on this. Unfortunately my site is on an ISP's shared server, so I have my doubts that they'll allow me to modify httpd.conf (!). Still, I have a very good relationship with them and it can't hurt to ask.
Now the sticky question: where would be a good place to find examples of the kinds of changes it would be advisable to make? (I'd like to have a bit of research done before I approach them about httpd.conf changes!)
I did a site search here for mod_log_config but didn't turn up anything useful for this purpose. I found the Apache reference (http://httpd.apache.org/docs/mod/mod_log_config.html) but this seems a tad sparse on details.
Thanks,
RandlePMcMurphy
Just a quick update: I contacted my ISP about changes to the httpd.conf file and this was their response:
"I'm afraid that we cannot provide individual access to the config file, as apache uses 1 single file for hosts.
Any changes you want we can implement however, provided that they do not impact on any other users or the security of the server."
So if I could find some info on how to dev null accesses from people selling phentermine etc. I could once more start seeing where people are coming to my site from!
Thanks,
RandlePMcMurphy
CustomLog logs/access.log combined env=!mylogs
The rest of that article in the PM should allow you to write up some samples for your virtual host container. I'm assuming you have your own log file.
One big problem you'll have is that you can't immediately test and change if there's a problem with your sample. Perhaps your ISP can work through that with you, or you can test on a development machine. I'd wager your ISP is unlikely to restart Apache many times for you if it will interfere with other sites.
As an alternative, do you have command line access to your logs? If so, then you can rewrite them before downloading or archiving (if that's what you're doing). This would require some type of sed or grep to a new file.
Also, this thread is getting close to belonging in tracking and logging.
Thanks for your reply. Yes, I read the link you PM'd me and I understood a lot of it but not all. Did I mention I am new to Unix, not just this forum? I'll have another look at it to see if I can make more sense of it now that I have read more around the subject.
I don't have command line access to my logs unfortunately, so sed or grep are not an option.
What I am trying to figure out is how to stop the logging of access to my site from various-subnets.xyz.tld - it wasn't clear to me how to do this from that article, however, hopefully I'll get it from another reading of it.
Sorry if this is off-topic. I thought this was the appropriate place to post the query, as it is where I found the "perfect htaccess" threads.
Thanks,
RandlePMcMurphy
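One possible way to skip logging for requests from hosts such as various-subnets.xyz.tld is to set the same mylogs variable used in the CustomLog line earlier in the thread, based on the client's hostname or address. This is a sketch under assumptions, not the content of the PM'd article: xyz.tld is the placeholder from the post, 192.0.2.x is a documentation-only address range standing in for a real one, and the "combined" format is assumed to be defined elsewhere in httpd.conf.

```apache
# For the ISP to place in the site's virtual host container in httpd.conf.
# Remote_Host is only available when Apache knows the client's hostname,
# which normally requires HostnameLookups to be enabled (at some performance cost).
SetEnvIfNoCase Remote_Host "\.xyz\.tld$" mylogs
# Matching on the IP address works without hostname lookups.
SetEnvIf Remote_Addr "^192\.0\.2\." mylogs
# Log every request except those for which mylogs has been set.
CustomLog logs/access.log combined env=!mylogs
```

Because the ISP has to apply and reload the configuration, it is worth trying lines like these on a local Apache install first, as suggested above.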