And Now Google's Doing It. JS Stats Show GoogleBot


TheMadScientist - 10:37 pm on May 14, 2011 (gmt 0)


The first thing to establish is whether it is a genuine GoogleBot.

That's what PHP, and even parsing non-PHP extensions as PHP, is for, isn't it? ;)

# Guard against a missing user-agent header
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if(stripos($ua,'GoogleBot')!==FALSE
|| stripos($ua,'Slurp')!==FALSE
|| stripos($ua,'BingBot')!==FALSE
) {
    # REMOTE_ADDR is the client IP; REMOTE_HOST is a hostname and is usually not set
    $botip = $_SERVER['REMOTE_ADDR'];
    # Reverse lookup, then forward lookup to confirm the hostname resolves back to the same IP
    $bothost = gethostbyaddr($botip);
    $verifiedbotip = gethostbyname($bothost);
    if($botip == $verifiedbotip && (substr($bothost, -14) == '.googlebot.com'
    # Leading dot added so a host like 'fakecrawl.yahoo.net' cannot match
    || substr($bothost,-16) == '.crawl.yahoo.net'
    # Not sure if Y! still crawls from inktomi search, but not a big deal to check for it
    || substr($bothost,-18) == '.inktomisearch.com'
    # AFAIK Bing still crawls from msn.com. May need to be updated at some point
    || substr($bothost,-15) == '.search.msn.com')
    ) {
        # What to do if it's a real bot
    }
    else {
        # What to do if it's an imposter
    }
}

Modified from jcoronella's post here: [webmasterworld.com...]

NOTE: The JS file / stats are two of the few I haven't been running a full verification on, but I think it might be time to start.
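
If the same check ends up being reused for the JS file and the stats, it might be easier to wrap it in a function. This is only a rough sketch: the function names `host_in_crawler_domain` and `is_verified_bot` are made up for illustration, the domain list is just the suffixes from the snippet above (with a leading dot on the Yahoo one, which is my assumption), and the lookup functions are passed in as parameters so the logic can be tried without hitting DNS.

```php
<?php
// Hypothetical helpers; names and structure are illustrative only.

/**
 * Return true if $host ends with one of the crawler domain suffixes.
 * Each suffix starts with a dot so 'fakecrawl.yahoo.net' cannot match.
 */
function host_in_crawler_domain($host) {
    $suffixes = array(
        '.googlebot.com',
        '.crawl.yahoo.net',
        '.inktomisearch.com',
        '.search.msn.com',
    );
    // Normalise case and drop any trailing dot from the PTR record
    $host = strtolower(rtrim($host, '.'));
    foreach ($suffixes as $suffix) {
        if (substr($host, -strlen($suffix)) === $suffix) {
            return true;
        }
    }
    return false;
}

/**
 * Reverse lookup, domain check, then forward lookup back to the same IP.
 * $reverse and $forward default to PHP's gethostbyaddr() and gethostbyname()
 * and can be replaced with stubs for testing.
 */
function is_verified_bot($ip, $reverse = 'gethostbyaddr', $forward = 'gethostbyname') {
    $host = call_user_func($reverse, $ip);
    // gethostbyaddr() returns the IP unchanged on failure, false on bad input
    if (!is_string($host) || $host === $ip) {
        return false;
    }
    if (!host_in_crawler_domain($host)) {
        return false;
    }
    // The hostname must resolve back to the IP that made the request
    return call_user_func($forward, $host) === $ip;
}
```

Usage would be something like `if (is_verified_bot($_SERVER['REMOTE_ADDR'])) { ... }` inside the same user-agent test as before, so the DNS lookups only run for requests that claim to be a bot.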

[edited by: TheMadScientist at 10:45 pm (utc) on May 14, 2011]


Thread source: http://www.webmasterworld.com/search_engine_spiders/4312058.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com