
Referral log spamming: How use robots.txt to keep them out?



4:33 pm on Mar 23, 2004 (gmt 0)

In my stats, about 50% of referrals come from BS sites doing referral log spamming.
As I understand it, they do this by crawling the site like an SE spider.
I thought I'd try to block them in robots.txt, but I'm not sure how to do that without blocking legitimate bots as well.
Can I "allow" the legitimate ones and "disallow" the rest, sort of?

TIA for info


5:32 pm on Mar 23, 2004 (gmt 0)

Even if you 'block' log spammers with 403s, they still get into your logs.

The only real solution is to password-protect your log pages which removes their incentive for spamming you.


6:43 pm on Mar 23, 2004 (gmt 0)

Sorry, I don't understand.
Since they're accessing the site with spiders, how would they get a 403 if I block them in robots.txt? Wouldn't they be stopped completely?


6:48 pm on Mar 23, 2004 (gmt 0)


If they are visiting for spam purposes it is unlikely that they would obey the robots.txt file.

Can you explain the purpose of this spam technique? I have not heard of it before.



8:09 pm on Mar 23, 2004 (gmt 0)


The technique is commonly called "referral log spamming". Someone is hired by a website to produce links to that site in the weblogs and web stats of other sites (like mine). This generates hits (curious webmasters like myself wonder who the referrer is and click the link) or incoming links in blogs, which in turn will e.g. increase search engine rankings.
This "someone", i.e. the spammer, uses a spider, often disguised as a search engine spider. AFAIK there's no way to pin down the IP the spider (the spammer) is coming from, and I don't see much point in blocking the domains of the spammer's clients either.

The above is what I understand. I might err here and there, but basically that's it.

You can Google "referral log spamming" and you'll get a few hits.

So, I still wonder how to configure robots.txt to allow certain spiders but block the rest.



9:35 am on Mar 24, 2004 (gmt 0)

As Dan_Vendel said above, you can't block anyone using robots.txt. It's a voluntary protocol and only legitimate spiders will obey it (and not even all of them).
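For what it's worth, the whitelist syntax you asked about does exist. A minimal sketch, assuming Googlebot is the one spider you want to let in (it only affects crawlers that choose to honour the file):

    # let Googlebot crawl everything (empty Disallow = allow all)
    User-agent: Googlebot
    Disallow:

    # everyone else: stay out
    User-agent: *
    Disallow: /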

The way to actually block certain types of visitors from your site is to use the .htaccess file and identify them by User Agent or IP address (or various other parameters). There are literally thousands of references to this technique in the Apache Web Server forum.
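As a minimal sketch of that approach — "EvilBot" and the 192.0.2.* range are placeholders, not real spammers, and this assumes mod_setenvif is available:

    # flag requests by User-Agent (placeholder name)
    SetEnvIfNoCase User-Agent "EvilBot" block_me
    # flag requests by IP address (192.0.2.* is a documentation range)
    SetEnvIf Remote_Addr "^192\.0\.2\." block_me
    # deny anything flagged above with a 403
    Order Allow,Deny
    Allow from all
    Deny from env=block_me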

The purpose of log spamming is not to lure curious webmasters into clicking on a link, but to get their link into your log stats and then indexed by Google, which gives their site another referrer and a higher PR.

The spammer's URL will get into your log stats whether you block them (403) or not. So, as I said above, the only way to stop them is to take away the incentive and make your log stats inaccessible to the general public (which can also be done using .htaccess).
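A minimal sketch of the password-protection side, assuming your stats live in a directory you can drop an .htaccess file into (the path and username are placeholders):

    # .htaccess in the stats directory
    AuthType Basic
    AuthName "Private stats"
    AuthUserFile /home/youruser/.htpasswd
    Require valid-user

Create the password file once with something like "htpasswd -c /home/youruser/.htpasswd youruser", and keep it outside your web root.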


9:56 am on Mar 24, 2004 (gmt 0)


Put them in your log file filter and never look back. An "access denied" will just cost you more server resources than simply filtering them out. They will just get new IPs anyway.


12:27 pm on Mar 24, 2004 (gmt 0)

dcrombie, you say:
"..the only way to stop them is to take away the incentive and make your log stats inaccessible to the general public (also can be done using .htaccess)."

I haven't a clue how to make them inaccessible to the public. As a matter of fact, I took it for granted that they were.

I'm on a *nix box with Apache and cPanel. Given that, can you tell me how to do it, and what unpleasant consequences it might have, if any?

Appreciate your help!



12:29 pm on Mar 24, 2004 (gmt 0)


I'd be happy to put them in the log file filter if I knew how to do it.
I'm on a *nix server with Apache and cPanel. Will that be enough to give me a hint on how to proceed?

Your help is appreciated!



12:36 pm on Mar 24, 2004 (gmt 0)


I use Webalizer, which has a config file. In it you can tell Webalizer to ignore agents, IPs, etc. so that they are never reported, i.e. they will never appear in your logfile reports.
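As a sketch, the relevant webalizer.conf directives look something like this — "spammer-site.example" and "EvilBot" are placeholders for whatever actually shows up in your logs:

    # drop matching log entries from the report entirely
    IgnoreReferer  *spammer-site.example*
    IgnoreAgent    EvilBot*

(The HideReferer/HideAgent directives are similar, but they only keep entries out of the "Top" tables rather than ignoring them completely.)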

I would imagine that most logfile analysis programs are configurable in this way.

I presume that this is what Brett was describing.

