Forum Moderators: DixonJones
<head><title>blahblah</title>scripts and stuff</head>more scripts</head><body>content w scripts</body></html>
My logfile:
205.#*$!.#*$!.75 #*$!my.example.com - [01/Mar/2005:20:21:42 -0800] "GET / HTTP/1.0" 200 7884 "http://xxxxx-xxxxx-example.com" "Mozilla/5.0 (compatible; Konqueror/3; Linux; X11)"
This link seems to be calling my root directory with no specific page requested. What is this guy up to, and what can I do about it?
[edited by: engine at 12:02 pm (utc) on Mar. 15, 2005]
[edit reason] examplified [/edit]
There is software that can produce some 1,000 false referrer log entries per hour.
Webmasters are usually very curious and visit all of these referrers.
This problem is what stops me from building a new site about many different web statistics. Last month my current web statistics site got about 15,000 fake referrer log entries from referrer spammers.
Sorting out all the bogus entries makes publishing the stats site much more work-intensive.
The only other thing to do about this is to educate other webmasters, and not to leave stats pages open to the public if you build sites for others.
First, always tag dynamic (auto-updated) statistics pages with NOINDEX/NOFOLLOW robots meta tags (<meta name="robots" content="noindex,nofollow">). This will never stop the spammer; don't even think a spammer would be so kind as to obey silly meta tags. But Google can and will lower your own site's ranking if it finds a slew of garbage links in your site's pages, and the tags help keep that from happening.
Then:
Write a set of rewrite rules for .htaccess (this needs Apache's mod_rewrite) which either:
a) bounces the hit back to them
advantage: the spammer can feel the pain when they see their own site as a referrer in their own statistics; ought to be a Kodak moment.
disadvantage: the not-so-smart spammer fails to realize the obvious, thinking instead that it is working! After all, it still records as a hit on their site, even if it is the same hit they sent.
b) absorbs the hit
advantage: spammer gets nothing for their effort.
disadvantage: spammer feels no pain and may never cease and desist (actually this is true of tactic 'a' as well).
Anyway, the script goes:
### STOP REFERRAL SPAMMERS
# Uncomment the next line if your host requires it for mod_rewrite:
# Options +FollowSymLinks
RewriteEngine On
# One condition per spam domain. Dots are escaped so they match literally.
# Every condition except the last carries OR; NC makes the match case-insensitive.
RewriteCond %{HTTP_REFERER} ^https?://(www\.)?firstspamrefsite\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://(www\.)?spamrefsite2\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://(www\.)?spamrefsite3\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://(www\.)?spamrefsite4\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://(www\.)?spamrefsite5\.org [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://(www\.)?spamrefsiteetcetcetc\.etc [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://(www\.)?lastspamrefsite\.org [NC]
### Tactic b): this absorbs the hit by answering 403 Forbidden
RewriteRule .* - [F]
# Tactic a): this rule bounces the hit back to the referring URL instead:
# RewriteRule .* %{HTTP_REFERER} [R=301,L]
# And this last one sends the hit to any URL you want:
# RewriteRule .* [fah-Q.ref.spammer...] [R,L]
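Since the condition list has to be edited by hand every time a new spam domain shows up, a small generator can keep it consistent. The following is only a sketch in Python, not part of the original rules: the function name build_referrer_block is made up for illustration, it assumes you keep the blacklist as a plain list of bare domain names, and it assumes you want tactic b) (the 403 answer) and want to match both http and https referrers.

```python
def build_referrer_block(domains):
    """Build .htaccess lines that answer 403 when the Referer matches a listed domain.

    `domains` is a list of bare domain names such as "spamrefsite2.com"
    (hypothetical input format, chosen for this sketch).
    """
    if not domains:
        # With no conditions the RewriteRule would block every request,
        # so emit nothing for an empty blacklist.
        return ""
    lines = ["RewriteEngine On"]
    for index, domain in enumerate(domains):
        # Escape dots so they match a literal "." instead of any character.
        escaped = domain.replace(".", r"\.")
        # Every condition except the last needs OR to chain to the next one.
        flags = "[NC]" if index == len(domains) - 1 else "[NC,OR]"
        lines.append(
            f"RewriteCond %{{HTTP_REFERER}} ^https?://(www\\.)?{escaped} {flags}"
        )
    # Tactic b): absorb the hit with 403 Forbidden.
    lines.append("RewriteRule .* - [F]")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_referrer_block(["firstspamrefsite.com", "lastspamrefsite.org"]))
```

Paste the printed block into .htaccess in place of the hand-written conditions. The last condition deliberately has no OR flag; leaving an OR on it is an easy mistake when editing the list manually.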
That is it. It would be nice if someone could write a script to auto-detect refspammers; blacklists get quite lengthy over time.
Hope this helps,
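As a rough starting point for such auto-detection, here is a sketch in Python. It is only a heuristic and rests on several assumptions: the log is in Apache's combined format (like the log line quoted at the top of this thread), a referrer host counts as suspicious once it appears in at least `threshold` requests, and `own_hosts` lists your own host names so they are skipped. The names suspicious_referrers, own_hosts and threshold are invented for this example. A busy legitimate referrer would be flagged too, so review the output before adding anything to the blacklist.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"'
)


def suspicious_referrers(log_lines, own_hosts, threshold=20):
    """Return referrer hosts seen in at least `threshold` requests, sorted by name."""
    counts = Counter()
    for line in log_lines:
        match = LOG_RE.match(line)
        if not match:
            continue  # not a combined-format line
        try:
            host = urlsplit(match.group("referer")).hostname
        except ValueError:
            continue  # malformed referrer URL
        if not host:
            continue  # empty referrer or "-"
        if host.startswith("www."):
            host = host[4:]
        if host in own_hosts:
            continue  # internal navigation, not spam
        counts[host] += 1
    return sorted(h for h, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    # "access.log" is a placeholder path; point it at your own log file.
    with open("access.log", encoding="utf-8", errors="replace") as log:
        for host in suspicious_referrers(log, own_hosts={"my.example.com"}):
            print(host)
```

Counting alone is a crude signal. Refinements such as only counting requests for "/" or checking whether the referring page really links to you would cut false positives, but they are left out to keep the sketch short.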
Pascal
[edited by: tedster at 7:32 pm (utc) on Mar. 14, 2005]
[edit reason] remove extended signature [/edit]