How to Differentiate Users from Bots?

Offering different access

         

grahamberends

10:49 pm on Jan 21, 2006 (gmt 0)

10+ Year Member



Hi There!

Please help with this puzzle:

I wish to create four classes of access:
- bot
- visitor, ie non-member
- member
- administrator

But visitors and bots don't log in, so my system can't differentiate between them. Therefore, I am currently giving all visitors (including bots) full viewing access.

That's a problem for me.

Instead, I wish to differentiate between visitors (e.g. MSIE users) and bots when they request the first page, so that the system can then manage what each of them is allowed to access.

How do I do it? How can I tell the difference between them?

(My web site <snip> is driven by PHP and Apache on Linux)

Look forward to your replies
Thanks
GrahamB

[edited by: volatilegx at 3:25 am (utc) on Jan. 22, 2006]
[edit reason] no URLs please [/edit]

keyplyr

9:59 am on Jan 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



One way is to limit bot access in your robots.txt file. You could deny access to deep pages, image files, anything you don't want bots to get. For more info, see the Robots.txt forum [webmasterworld.com].

Bad bots (those that will not obey your robots.txt directives), IP ranges, downloading tools and other user agents can be effectively controlled using mod_rewrite and mod_access in your .htaccess file. For more info, see the Apache Web Server forum [webmasterworld.com].
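
As a rough illustration of how robots.txt rules apply to a well-behaved bot, here is a minimal sketch using Python's standard urllib.robotparser module. The site in question runs PHP, so treat this as a language-neutral demonstration only. The rules and the paths /members/ and /images/ are made-up examples, not anything from the original site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: keep all bots out of two made-up directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
Disallow: /images/
"""


def allowed(user_agent, path, robots_txt=ROBOTS_TXT):
    """Return True if a bot that obeys robots.txt may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print(allowed("Googlebot", "/index.html"))
    print(allowed("Googlebot", "/members/list.html"))
```

Note that this only models bots that choose to obey the file. Bots that ignore robots.txt have to be handled on the server side, as described above.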

Matt Probert

11:47 am on Jan 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How do I do it? How can I tell the difference between them?

Reliably, you cannot.

However, many robots do identify themselves, so you could check the supplied user agent and redirect accordingly. Keep in mind that the user agent can easily be set to any value one likes.
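
A minimal sketch of the user agent check, written in Python for illustration (the same idea carries over to PHP). The token list, the function name access_class and the session_role parameter are all assumptions made for this example. A spoofed user agent will defeat it, as noted above.

```python
# Hypothetical substrings that self-identifying crawlers tend to include.
BOT_TOKENS = ("googlebot", "slurp", "msnbot", "bot", "crawler", "spider")


def access_class(user_agent, session_role=None):
    """Map a request to one of: bot, visitor, member, administrator.

    session_role is assumed to come from the site's own login system and is
    None for anyone who has not logged in.
    """
    if session_role in ("member", "administrator"):
        return session_role
    # Assumption: a missing or empty user agent is treated as a bot.
    if not user_agent:
        return "bot"
    ua = user_agent.lower()
    if any(token in ua for token in BOT_TOKENS):
        return "bot"
    return "visitor"


if __name__ == "__main__":
    print(access_class("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
    print(access_class("Mozilla/5.0 (compatible; Googlebot/2.1)"))
```

Logged-in roles are checked first so that a member whose browser happens to match a bot token is not downgraded.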

Matt

grahamberends

3:43 pm on Jan 22, 2006 (gmt 0)

10+ Year Member



Thanks Keyplyr and Matt

You say:
- limit bot access in your robots.txt file
- deny access to deep pages, image files, anything you don't want bots to get
- see the Robots.txt forum
- Bad bots . . can be effectively controlled using mod_rewrite and mod_access in your .htaccess file.
- see the Apache Web Server forum.

These are solid key words and links. I'll research them.

Thanks
GrahamB

DamonHD

1:38 pm on Jan 26, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi,

The main (and very quick) way that I distinguish bots from humans is to check for a Referer (yes, one "r" in the middle) header. I do this to save processing time on the 90% of hits on my sites that come from bots.

No spider that I care about or that is responsible for much traffic for my sites sets it, and although some users turn off Referer for security, AND it won't be present on a type-in or bookmark hit, it IS present for me on almost all real human visits.

I simply make sure that the page served without a Referer is still useful and semantically identical (so it is not black-hat "cloaking"), just a little faster to load, e.g. by omitting a background image and showing a cheaper-to-compute set of related pages. That is good for a first page (e.g. a type-in) anyway.
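
A minimal sketch of that Referer check, in Python for illustration only. The function names and the "full"/"light" labels are invented for this example, and the request headers are assumed to arrive as a plain dictionary.

```python
def has_referer(headers):
    """Return True if the request carries a non-empty Referer header.

    Header names are compared case-insensitively, since clients and servers
    differ in how they capitalise them.
    """
    for name, value in headers.items():
        if name.lower() == "referer" and value.strip():
            return True
    return False


def page_variant(headers):
    """Pick the cheaper page when there is no Referer (likely a bot or type-in)."""
    return "full" if has_referer(headers) else "light"


if __name__ == "__main__":
    print(page_variant({"Referer": "http://example.com/search?q=widgets"}))
    print(page_variant({"User-Agent": "ExampleBot/1.0"}))
```

Both variants should carry the same content. Only the cost of producing the page differs, which is what keeps this from being cloaking.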

Rgds

Damon

Dijkgraaf

9:03 pm on Jan 26, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Also do a search for "bot trap".

Usually this is a page that is reached only through a hidden link on the first page. Bots will see the link but your users won't, so anything that visits the hidden page is most likely a bot.
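
A minimal sketch of the bot trap idea, in Python for illustration. The trap path /trap/ and the BotTrap class are hypothetical, and the flagged addresses are held in memory only. A real site would persist them, for example in a file or database. It is common to also disallow the trap path in robots.txt, so that only bots ignoring that file get flagged.

```python
class BotTrap:
    """Flag any client that requests the hidden trap page."""

    def __init__(self, trap_path="/trap/"):
        # trap_path is a made-up URL that is linked only via a hidden link.
        self.trap_path = trap_path
        self.flagged_ips = set()

    def record_hit(self, client_ip, path):
        """Record a request; flag the client if it fetched the trap page."""
        if path == self.trap_path:
            self.flagged_ips.add(client_ip)

    def is_probable_bot(self, client_ip):
        """Return True if this client has ever requested the trap page."""
        return client_ip in self.flagged_ips


if __name__ == "__main__":
    trap = BotTrap()
    trap.record_hit("192.0.2.10", "/index.html")
    trap.record_hit("192.0.2.20", "/trap/")
    print(trap.is_probable_bot("192.0.2.10"))
    print(trap.is_probable_bot("192.0.2.20"))
```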

wilderness

11:50 pm on Jan 26, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ban with Perl
[webmasterworld.com...]

Spider Trap Msg#2
[webmasterworld.com...]

Updated PHP Bot script
[webmasterworld.com...]

Blocking Badly Behaved bots #3
[webmasterworld.com...]