Forum Moderators: DixonJones


Any advice on detecting stealthy spiders/rippers?

         

daugava

12:17 am on Apr 13, 2007 (gmt 0)




Hi,

I was wondering if you can share advice on how to distinguish regular users from spiders/rippers. It's easy when a bot identifies itself, but some of them leave the User-Agent empty, or even pretend to be a regular web browser.

I have a scheduled process running on my server, which looks for IPs with concurrent connections over a certain threshold. When found, I check their log entries.

At this point, I try to determine whether it was a human or robot.

One of the methods I am using is judging whether the user's path through the site makes sense. For example, I have links to translate my pages into several languages. I can't imagine a person trying to read the same page in multiple languages; there is no point in doing that. So I've been assuming that if someone hits my site and loads all language versions of each page, it's a bot.
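To make that concrete, here is a rough Python sketch of the check. It assumes the log entries have already been parsed into (ip, url) pairs and that language versions are selected with a `?lang=xx` query parameter; both the URL scheme and the function name are made up for illustration, so adjust them to the real site.

```python
from collections import defaultdict
from urllib.parse import urlparse, parse_qs

# Assumption: language versions are selected with a ?lang=xx query parameter.
LANGUAGES = {"en", "de", "fr", "es"}

def ips_loading_all_languages(entries, languages=LANGUAGES):
    """Return IPs that fetched every language version of at least one page.

    entries: iterable of (ip, url) tuples taken from the access log.
    """
    seen = defaultdict(set)  # (ip, page path) -> set of languages requested
    for ip, url in entries:
        parsed = urlparse(url)
        lang = parse_qs(parsed.query).get("lang", [None])[0]
        if lang in languages:
            seen[(ip, parsed.path)].add(lang)
    # Flag an IP as soon as one page was requested in every known language.
    return {ip for (ip, _path), langs in seen.items() if langs >= languages}
```

This flags an IP after a single page is seen in every language; requiring several such pages before banning would be a more cautious variant.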

Also, I check whether someone is moving through the site too fast. For example, I may see an IP opening 10 pages per second - no human could click that fast.
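The speed check can be sketched as a sliding window over the log. This is only an illustration: it assumes entries are (ip, timestamp) pairs sorted by time, with timestamps in seconds, and the function name and defaults are hypothetical.

```python
from collections import defaultdict, deque

def fast_clients(entries, max_hits=10, window=1.0):
    """Return IPs that made at least max_hits page requests within `window` seconds.

    entries: iterable of (ip, timestamp) tuples, sorted by timestamp.
    """
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    flagged = set()
    for ip, ts in entries:
        hits = recent[ip]
        hits.append(ts)
        # Drop hits that are a full window or more older than the current one.
        while ts - hits[0] >= window:
            hits.popleft()
        if len(hits) >= max_hits:
            flagged.add(ip)
    return flagged
```

Only page requests should be fed in, not images or stylesheets, since a single page view in a normal browser produces a burst of those.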

However, recently I've been having doubts - perhaps I've been banning some IPs that are used by normal users who happen to use a web browser that pre-caches the links on a page, or something like that.

Any thoughts on this? Do any major web browsers actually do that?
Any other tricks on detecting the abusers?

Andy

Receptional

1:28 pm on Apr 13, 2007 (gmt 0)



Another idea is to challenge a user who appears to be moving fast: if they visit more than a page a second, redirect them to a page that requires user input. This will still stop a spider, but allow a human through.
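A minimal sketch of the decision logic in Python, not tied to any web framework. The class and method names are hypothetical, state is kept in memory only, and the redirect itself and the challenge page are left to whatever the site runs on.

```python
import time

class ChallengeGate:
    """Decide whether a page request should be redirected to a challenge page."""

    def __init__(self, min_interval=1.0, clock=time.monotonic):
        self.min_interval = min_interval  # seconds allowed between page views
        self.clock = clock
        self.last_seen = {}   # ip -> time of the previous page request
        self.passed = set()   # IPs that have completed the challenge

    def needs_challenge(self, ip):
        now = self.clock()
        previous = self.last_seen.get(ip)
        self.last_seen[ip] = now
        if ip in self.passed:
            return False
        # Challenge when two page requests arrive closer than min_interval.
        return previous is not None and now - previous < self.min_interval

    def mark_passed(self, ip):
        """Call this once the visitor has submitted the required input."""
        self.passed.add(ip)
```

The request handler would call `needs_challenge(ip)` for each page view and redirect when it returns True; once the form is submitted, `mark_passed(ip)` lets that visitor continue without further interruptions.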