The way to do it is to flag any user agent strings that:
a) don't completely match the usual user agent, or
b) don't match by IP, and/or don't look up by whois at arin.net, where you can see who the IP block is allocated to.
Usually I find that if it doesn't have an arin.net whois entry, it isn't real. All of the engines, or Exodus, end up being registered for the real spiders that should come crawling.
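If you want to script this, here's a minimal sketch (Python, and the UA strings are just placeholders you'd fill from a maintained list like the one mentioned just below) of rule (a). Rule (b) is really a manual whois at arin.net, so as a rough automated stand-in the sketch does a reverse and forward DNS confirmation on the IP; the idea is the same, make sure the address actually belongs to the engine the UA claims.

import socket

# Placeholder set of exact UA strings you trust; fill it in from a
# maintained spider list.
KNOWN_SPIDER_UAS = {
    "Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
    "Scooter/3.2",
}

def looks_like_real_spider(user_agent, ip):
    """Rule (a): the UA must match a known spider string exactly.
    Rule (b) stand-in: the IP must reverse-resolve, and the hostname
    must resolve back to the same IP (no PTR record ~ no whois entry)."""
    if user_agent not in KNOWN_SPIDER_UAS:
        return False
    try:
        host = socket.gethostbyaddr(ip)[0]    # reverse DNS
        forward = socket.gethostbyname(host)  # forward-confirm it
    except (socket.herror, socket.gaierror):
        return False                          # no entry, treat as fake
    return forward == ip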
For a great list to get you started, check out Brett's list over at [searchengineworld.com...] It is one of my favorite resources.
Cheers,
Han Solo
Startup, I guess it depends on how good the person is who is trying to crack your cloak.
But I do not think this will be enough to do what you want. I'd take a close look at ENV variables such as HTTP_REFERER and the like for secondary screening.
check referer
  if yes -> surfer
  if no:
    check ip
    check ua
    if ip and ua match -> spider
    if ip match only -> ua alert (possible spider)
    if ua match only -> ip alert (new ip or a snooper)
    if none match -> surfer (type-in and such), but could also
                     be a spider with both a new ip and ua
Of course, in practice there is a whole bunch of exceptions that need to be considered, like known spiders using a plain Mozilla UA, translators, etc.
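Put together in (hypothetical) code, with the values read from the CGI environment and the spider tables as placeholders you'd fill from your own database, the tree above looks roughly like this:

import os

# Placeholder lookups; in practice these come from your spider database
# of IP blocks and exact UA strings per engine.
KNOWN_SPIDER_IPS = {"216.239.46.20"}
KNOWN_SPIDER_UAS = {"Googlebot/2.1 (+http://www.googlebot.com/bot.html)"}

def classify(environ=os.environ):
    referer = environ.get("HTTP_REFERER", "")
    ip = environ.get("REMOTE_ADDR", "")
    ua = environ.get("HTTP_USER_AGENT", "")

    # Spiders normally send no referer, so any referer means a surfer.
    if referer:
        return "surfer"

    ip_match = ip in KNOWN_SPIDER_IPS
    ua_match = ua in KNOWN_SPIDER_UAS

    if ip_match and ua_match:
        return "spider"
    if ip_match:
        return "ua alert (possible spider)"
    if ua_match:
        return "ip alert (new ip or a snooper)"
    # Type-in surfer, or a spider with both a new ip and ua.
    return "surfer"

The exceptions (known spiders on a plain Mozilla UA, translators and so on) would be whitelist checks you run before this.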
have fun ;)