I am currently developing a script for automatic spider detection and blocking. It compares user agents against known malicious spiders, tracks client behaviour (e.g. number of requests per second / minute), uses hidden links to trap spiders, and so on.
Once a malicious spider is detected, it automatically blocks its IP.
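Roughly, the behaviour-tracking and trap part works like this (a simplified Python sketch; the path, thresholds, and names here are just placeholders, my real script is a bit more involved):

import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60              # sliding window for the request counter
MAX_REQUESTS = 120               # more than this per window looks automated
TRAP_PATH = "/hidden-link.html"  # hidden link only a spider would follow

request_log = defaultdict(deque)  # ip -> timestamps of recent requests
blocked_ips = set()

def record_hit(ip, path):
    """Return True if this client should be blocked after this request."""
    if ip in blocked_ips:
        return True

    # Trap: the hidden link is invisible to humans, so any hit on it is a bot.
    if path == TRAP_PATH:
        blocked_ips.add(ip)
        return True

    # Behaviour: count requests inside the sliding window.
    now = time.time()
    hits = request_log[ip]
    hits.append(now)
    while hits and now - hits[0] > WINDOW_SECONDS:
        hits.popleft()
    if len(hits) > MAX_REQUESTS:
        blocked_ips.add(ip)
        return True

    return False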
I think what I have so far works pretty well. What I haven't figured out yet is how to detect spiders that spoof their user agent (i.e. pretend to be MSIE or Netscape). I have thought about using HTTP_ACCEPT and checking for image support, but I don't know how reliable that would be. Also, I found that if I refresh a page in MSIE 6.0, the Accept header changes to */* instead of listing all the supported media types.
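For reference, the Accept-header check I had in mind is basically this (sketch only; I'm not sure the assumptions behind it hold, which is exactly my question):

def looks_like_spoofed_browser(user_agent, accept_header):
    """Heuristic: a client claiming to be a browser whose Accept header
    never mentions images might be a spider spoofing its user agent.
    Unreliable, because MSIE 6 sends Accept: */* on a refresh."""
    claims_browser = "MSIE" in user_agent or "Mozilla" in user_agent
    if not claims_browser:
        return False
    if accept_header.strip() == "*/*":
        return False   # can't tell: MSIE does this on refresh
    return "image/" not in accept_header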
Btw, how reliable is the robots meta tag? Is it supported / respected by most of the major SEs? I'd like this script to work on shared hosts, so robots.txt is not an option.
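To clarify what I mean: the trap page would carry the robots meta tag instead of relying on robots.txt, something like this (hypothetical sketch of how I'd generate it):

def trap_page_html():
    # The robots meta tag asks well-behaved crawlers not to index the trap
    # page or follow its links; only misbehaving spiders should end up here.
    return (
        "<html><head>"
        '<meta name="robots" content="noindex,nofollow">'
        "</head><body></body></html>"
    )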
Any input and ideas greatly appreciated. Thanks