Recently, I've had a bot pinging single pages repeatedly to up a user's search rankings within my site.
Looking through the logs, I've found the same entry from them repeated over and over, each request within 10 to 15 seconds of the last:
123.123.123.123 - - [14/Jul/2009:00:30:00 -0700] "GET /example.com/profilenameomitted HTTP/1.1" 301 310 "-" "-"
I've omitted the user's profile name and IP address, but you get the idea. Their bot is apparently named "-", and it's getting quite annoying.
Anyone have a clue on how to block this bot? I've tried a few things with no success. Here's how I'm currently blocking bots:
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]
Again, any help would be GREATLY appreciated. Thanks everyone! :)
RewriteCond %{REMOTE_ADDR}>%{HTTP_USER_AGENT}>%{REQUEST_URI} ^123\.123\.123\.123>>/example.com/profilenameomitted [OR]
BTW, [L] used with [F] is redundant. You can use just [F] instead of [F,L] on your rule.
Oh, and the ">" characters in the RewriteCond string and pattern are just "visual spacers" -- They don't actually do anything here except illustrate the 'field boundaries.'
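For illustration, here is roughly how that condition could sit alongside the existing user-agent checks. This is only a sketch: it assumes the rules live in a per-directory .htaccess file with mod_rewrite enabled, and it shows just the tail of the existing list. The IP address and path are the placeholders from the log line, not real values.

```apache
# Sketch only: placeholder IP and path taken from the sanitized log entry.
RewriteEngine On

# Match one specific client + blank User-Agent + one specific URL.
# With an empty User-Agent, the test string collapses to "IP>>URI".
RewriteCond %{REMOTE_ADDR}>%{HTTP_USER_AGENT}>%{REQUEST_URI} ^123\.123\.123\.123>>/example.com/profilenameomitted [OR]

# Existing user-agent checks (tail of the list shown in the question).
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus

# [F] returns 403 Forbidden and already implies [L].
RewriteRule ^ - [F]
```

Note that the last RewriteCond in the chain must not carry [OR], which is why the new condition is added at the top rather than the bottom.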
Jim
Thanks a lot for the info. Here are the problems I foresee becoming bigger issues down the road:
1. I'm pretty sure this will be run by more than one user in the future, and rather than blocking IP(s) from portions of the site, I'd rather just block the bot in question.
2. The next logical step for an offender of this type is to start randomizing their IP on every page load rather than just changing their session name. (I've currently got code in place to stop IPs with the same session name from boosting page rank by simply refreshing the page.)
3. As my user base grows, manually updating my .htaccess file for every offender will become terribly problematic and tiresome, and I'm sure it will eventually affect load times with enough offenders, so I would really like to just block whatever bot is causing the problems.
So, any ideas on how to block the bot simply named with a dash? :)
Unfortunately, a blank User-Agent (which is what the "-" in that log field indicates) is common even with non-misbehaving users; ISPs that use caching proxies in their networks (e.g. AOL, Earthlink, and many more) will send requests with no HTTP User-Agent header as well -- and their users will be unaware of this, too.
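For reference, a rule that refuses every request arriving without a User-Agent would look something like the sketch below. It is shown only to make the trade-off concrete, and it assumes an .htaccess file with mod_rewrite enabled. Because it cannot tell the bot apart from the proxy users described above, it would lock those visitors out as well.

```apache
# Sketch only: blocks ANY request with a missing or empty User-Agent header.
# Caution: this also returns 403 to legitimate visitors whose proxy strips the header.
RewriteEngine On

# ^-?$ matches an empty value, or a User-Agent that is literally a single hyphen.
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule ^ - [F]
```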
So, it sounds like you can't block it except behaviorally. Maybe change your TOS to say that bot-runners will be booted, no warnings, no refunds, accounts forfeit and deleted. Then use your session method to detect too-frequent loads of this 'rank-increasing URL thing' as well as too-frequent page-refreshing.
Jim