Forum Moderators: open


how to differentiate between a bot and a human?


indiandomain

8:48 am on Apr 29, 2003 (gmt 0)

10+ Year Member



I have a serious problem with bad bots visiting my site.

Is there any way to technically identify a bot and differentiate it from a human?

wilderness

1:34 pm on Apr 29, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



any way to technically identify a bot and differentiate

Technically, no. However. . .
If you're experienced in reading your logs (knowing how visitors, both good and bad, have travelled in the past) and knowledgeable about the content of your web pages and how they relate to one another, you can analyze how the current visitor is moving through your site and make a logical assumption. Lists of "known" bots, provided in this forum and elsewhere, also strengthen that assumption.

Bot traps also do a good job of separating these visitors.
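A common form of bot trap is a path that robots.txt forbids and that no human can see a link to, so anything requesting it is almost certainly a misbehaving bot. A minimal sketch, with invented paths, IPs, and a made-up Apache-style log:

```shell
# Bot trap sketch (hypothetical paths and log lines):
# 1. robots.txt tells well-behaved bots to stay out of /trap/
# 2. a link to /trap/ is hidden from human visitors on the page
# 3. any IP that requests /trap/ anyway is flagged as a bad bot

cat > robots.txt <<'EOF'
User-agent: *
Disallow: /trap/
EOF

# Pretend access log (Apache common log format, abbreviated)
cat > access.log <<'EOF'
10.0.0.1 - - [29/Apr/2003:08:48:00 +0000] "GET /index.html HTTP/1.0" 200 1234
10.0.0.2 - - [29/Apr/2003:08:49:00 +0000] "GET /trap/ HTTP/1.0" 200 56
EOF

# IPs that walked into the trap ($7 is the request path)
trapped=$(awk '$7 ~ /^\/trap\// {print $1}' access.log | sort -u)
echo "$trapped"
```

From there the flagged IPs can be fed into a deny list by hand or by script.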

carfac

4:28 pm on Apr 29, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I look at the timing of the PAGE downloads... and whether the request is just for the page (HTML) or whether all the accompanying images/JS/CSS are downloaded too.

You can also tell by watching the path... does it run CGI scripts that need user input, or only request pages?

Finally, grep -c... see HOW many pages it looked at!

All of these (when taken together) should help you!
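The grep -c idea can be sketched against a log snippet (the IPs, paths, and timings below are invented): a visitor that racks up many page hits per second is usually not a person reading.

```shell
# Made-up access log: 10.0.0.2 tears through three pages in two
# seconds, a request count/timing pattern that gives a bot away.
cat > access.log <<'EOF'
10.0.0.1 - - [29/Apr/2003:16:28:00 +0000] "GET /index.html HTTP/1.0" 200 1234
10.0.0.2 - - [29/Apr/2003:16:28:01 +0000] "GET /page1.html HTTP/1.0" 200 999
10.0.0.2 - - [29/Apr/2003:16:28:01 +0000] "GET /page2.html HTTP/1.0" 200 999
10.0.0.2 - - [29/Apr/2003:16:28:02 +0000] "GET /page3.html HTTP/1.0" 200 999
EOF

# grep -c: count how many requests one visitor made
hits=$(grep -c '^10\.0\.0\.2 ' access.log)
echo "$hits"
```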

dave

Benala

10:54 pm on Apr 29, 2003 (gmt 0)

10+ Year Member



A guy at my host said all non-bots have "mozilla" in the user agent string. Does anyone know if this is true? If so, it should make it possible to use .htaccess to separate bots and humans.
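For what it's worth, such a rule would look roughly like this with mod_setenvif (a sketch only; the premise is shaky, since plenty of bots also put "Mozilla" in their user agent string, so at best this is a rough filter):

```apache
# Sketch: flag requests whose User-Agent contains "mozilla"
# (case-insensitive) and deny everything else. Requires mod_setenvif.
# NOTE: unreliable in practice -- many bots send "Mozilla" too.
BrowserMatchNoCase mozilla is_browser
Order Deny,Allow
Deny from all
Allow from env=is_browser
```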

jeremy goodrich

11:32 pm on Apr 29, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you have any experience with Perl, it only takes a minute to build a bot that will fake the user agent string.

In addition, slurp (Inktomi's bot...) will identify as mozilla as well, and FAST will sometimes spider with a generic (mozilla) user agent.
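You don't even need Perl to see why the user agent can't be trusted: the header is entirely under the client's control. A quick illustration (the host is a placeholder) of the raw request a fake "browser" would send:

```shell
# The User-Agent header is whatever the client says it is, so any
# script can claim to be a browser. This builds the raw HTTP request
# a fake "Internet Explorer" would send (www.example.com is a
# placeholder host):
ua="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
req=$(printf 'GET / HTTP/1.0\r\nHost: www.example.com\r\nUser-Agent: %s\r\n' "$ua")
echo "$req"
```

Piping a request like that into a TCP connection is all a "fake browser" amounts to.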

The only sure-fire way I know is to inspect the inbound headers when somebody makes a request to your site... many agents will send extra headers that 'real' browsers won't have.

Also, most bots will not use session variables & cookies... if you do all of that - go through the headers, use session variables & cookies, and check for image calls as well - then you should be able to sort most of them.
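The image-call check above can be sketched as a log pipeline (invented IPs and paths): list visitors that requested HTML pages but never fetched a single image or stylesheet, which is typical bot behaviour.

```shell
# Made-up access log: one visitor loads the page plus its image, the
# other grabs HTML only (typical bot behaviour).
cat > access.log <<'EOF'
10.0.0.1 - - [29/Apr/2003:11:32:00 +0000] "GET /index.html HTTP/1.0" 200 1234
10.0.0.1 - - [29/Apr/2003:11:32:01 +0000] "GET /logo.gif HTTP/1.0" 200 4321
10.0.0.2 - - [29/Apr/2003:11:32:02 +0000] "GET /index.html HTTP/1.0" 200 1234
10.0.0.2 - - [29/Apr/2003:11:32:03 +0000] "GET /page2.html HTTP/1.0" 200 2345
EOF

# IPs that requested .html ($7 is the request path)
awk '$7 ~ /\.html/ {print $1}' access.log | sort -u > pages.txt
# IPs that requested an image/CSS/JS asset
awk '$7 ~ /\.(gif|jpe?g|png|css|js)/ {print $1}' access.log | sort -u > assets.txt

# Page-fetchers that never fetched an asset: likely bots
suspects=$(comm -23 pages.txt assets.txt)
echo "$suspects"
```

None of these signals is conclusive on its own (a human behind a text browser or with images off looks the same), which is why the post suggests combining them.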