Add Sitemap To robots.txt For Autodiscovery


argiope - 2:57 am on Apr 13, 2007 (gmt 0)


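I blogged about this yesterday too.

Site scrapers can really take advantage of it too...

However, I've written a small PHP script that lets you check who is requesting your sitemap. It detects whether the requester is a known search engine or not: it does a reverse DNS lookup on the requesting IP, checks the hostname against a whitelist of crawler domains, and then resolves that hostname forward to confirm it maps back to the same IP.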

<snip>

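<?php
// Returns true only if $ip reverse-resolves to a whitelisted crawler
// domain AND that hostname resolves forward to the same IP.
function botIsAllowed($ip){
    // Reverse DNS lookup. gethostbyaddr() returns false on malformed
    // input and the unmodified IP when the lookup fails.
    $host = gethostbyaddr($ip);
    if ($host === false || $host === $ip){
        return false;
    }
    $host = strtolower($host);
    $botDomains = array('.inktomisearch.com',
        '.googlebot.com',
        '.ask.com',
    );

    // Check whether the hostname ends with a whitelisted domain.
    foreach($botDomains as $bot){
        if (substr($host, -strlen($bot)) === $bot){
            // Forward-confirm: the hostname must resolve back to the
            // requesting IP (a host may have several A records).
            $ips = gethostbynamel($host);
            return ($ips !== false && in_array($ip, $ips, true));
        }
    }
    return false;
}

if (!botIsAllowed($_SERVER['REMOTE_ADDR'])){
    echo "Banned!";
    exit;
}
?>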
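
If you want to try the whitelist check without doing any DNS lookups, here is a minimal sketch of just the suffix-matching step. The helper name hostMatchesWhitelist and the sample hostnames are made up for illustration and are not part of the script above. It shows why the match has to be on the end of the hostname, including the leading dot: a scraper controls its own reverse DNS and can put "googlebot.com" anywhere else in the name.

```php
<?php
// Hypothetical helper (not from the original post): returns true when
// $host ends with one of the whitelisted domain suffixes.
function hostMatchesWhitelist($host, $suffixes){
    // Normalise case and drop a trailing root dot, if any.
    $host = strtolower(rtrim($host, '.'));
    foreach ($suffixes as $suffix){
        $len = strlen($suffix);
        // The host must be longer than the suffix and end with it.
        if (strlen($host) > $len && substr($host, -$len) === $suffix){
            return true;
        }
    }
    return false;
}

$suffixes = array('.inktomisearch.com', '.googlebot.com', '.ask.com');

// Illustrative hostnames only.
var_dump(hostMatchesWhitelist('crawl-66-249-66-1.googlebot.com', $suffixes)); // true
var_dump(hostMatchesWhitelist('googlebot.com.scraper.example', $suffixes));   // false
var_dump(hostMatchesWhitelist('fakegooglebot.com', $suffixes));               // false
?>
```

A suffix match on its own proves nothing, since anyone can set a PTR record for their own IP. That is why the script above also resolves the hostname forward and compares the result with the requesting IP.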

[edited by: engine at 7:52 am (utc) on April 13, 2007]
[edit reason] No urls, thanks. See TOS [webmasterworld.com] [/edit]


Thread source: http://www.webmasterworld.com/robots_txt/3310622.htm