How to distinguish humans from spiders

         

kost81

10:36 am on Mar 28, 2006 (gmt 0)

10+ Year Member



How can I distinguish web users (browsers) from spiders, bots, and robots using the user-agent?

Romeo

11:53 am on Mar 28, 2006 (gmt 0)

10+ Year Member



... by gut feeling.

If a client fetches lots of (or even all) pages at a high rate and/or traverses the whole site in a short time, it may be a bot -- or a human user using a "download entire site" program.

If a client does not fetch external style sheets or embedded images, it may be a bot -- or a human user who is using a text based browser or has switched "images off".

If you see a client fetching your robots.txt, it may be a bot -- or a curious human user.

If you see a client not fetching your robots.txt, it may be a human user -- or a malicious bot not respecting the robots.txt standard.

It all depends, and you can never be sure.
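To make the hints above concrete, here is a minimal Python sketch that collects them as signals for one client. The `ClientStats` structure, the `bot_signals` function, and the thresholds (at least 10 pages, more than 1 page per second) are all assumptions made up for illustration; you would have to build the per-client summary from your own access logs. As said above, every signal has an innocent explanation, so treat the result as a hint and not a verdict.

```python
from dataclasses import dataclass


@dataclass
class ClientStats:
    """Per-client summary built from access logs (hypothetical structure)."""
    pages: int                # HTML pages fetched
    seconds: float            # time between first and last request
    assets: int               # external style sheets and images fetched
    fetched_robots_txt: bool  # whether robots.txt was requested


def bot_signals(stats: ClientStats, max_pages_per_second: float = 1.0) -> list[str]:
    """Return the hints that suggest a bot; an empty list proves nothing."""
    signals = []
    # Avoid dividing by zero for clients seen only within a single second.
    rate = stats.pages / max(stats.seconds, 1.0)
    # The 10-page minimum is an arbitrary guard against flagging tiny visits.
    if stats.pages >= 10 and rate > max_pages_per_second:
        signals.append("high fetch rate")
    if stats.pages > 0 and stats.assets == 0:
        signals.append("no stylesheets or images")
    if stats.fetched_robots_txt:
        signals.append("requested robots.txt")
    return signals
```

The function returns a list of reasons instead of a yes/no answer, so you can decide yourself how many hints you want to see before you act on them.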

Welcome to WW and
kind regards,
R.

kost81

12:30 pm on Mar 28, 2006 (gmt 0)

10+ Year Member



Let me make my question more specific.
I have the user-agent of every visitor (browser or spider) to my site. I'd like to know which of my visitors are human users (web browsers) and which are robots, bots, or spiders.
For example, user-agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) means web browser
User-agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows ME) Opera 5.11 [en] means robot
Is there a clear algorithm for finding out which user-agents represent web browsers?

lammert

1:41 pm on Mar 28, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Kost81, first of all, welcome to WebmasterWorld!

kost81 wrote: "User-agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows ME) Opera 5.11 [en] means robot"

No, this is the English-language version of the Opera browser. Looking at the User-agent is no longer a very good method of distinguishing robots from humans, because many robots spoof their user-agent string. Timing (the number of pages fetched per second), not loading .js and image files, hits on robots.txt, etc. are all indications of robots.
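For what a user-agent check can and cannot tell you, here is a minimal Python sketch. The token list and the `declares_itself_bot` name are assumptions chosen for illustration, not a complete or maintained list. It only catches robots that announce themselves honestly; a spoofing robot sends a browser string and passes as a browser.

```python
import re

# Illustrative tokens only (an assumption); a real list would be longer
# and would need regular upkeep.
BOT_TOKENS = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def declares_itself_bot(user_agent: str) -> bool:
    """True if the user-agent string openly identifies itself as a robot.

    False does NOT mean the visitor is human: spoofed strings look
    exactly like real browsers.
    """
    return bool(BOT_TOKENS.search(user_agent))
```

Both strings quoted in this thread come back as False, including the Opera one, while a self-identifying string such as "Googlebot/2.1 (+http://www.google.com/bot.html)" comes back as True.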

The spider identification forum [webmasterworld.com] here on WebmasterWorld can give you some idea of the problems involved in separating the real visitors from the electronic ones.