In the era of increasing numbers of bad bots is robots.txt irrelevant?


incrediBILL - 4:40 pm on Feb 20, 2006 (gmt 0)


"until some new standard is adopted"

Which is somewhat desperately needed, as there are all sorts of bots looking for all sorts of content, and keeping pace with them just isn't viable.

You could keep a lot of bots off your site if you could just tell them what type of data you have, so Oodle wouldn't crawl looking for classifieds if you didn't have any, Kosmix wouldn't come looking for health information, that Polish crawler I'd never heard of before and can't pronounce wouldn't be scanning for Polish-language pages, etc.
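To make the idea concrete, here is a rough Python sketch of what such a declaration might look like from the crawler's side. Everything in it is hypothetical: no such declaration exists in robots.txt or any adopted standard, and the names SITE_PROFILE and worth_crawling are made up for illustration.

```python
# Hypothetical site declaration: NOT part of robots.txt or any adopted standard.
SITE_PROFILE = {
    "content": {"forums", "articles"},   # kinds of data the site says it has
    "languages": {"en"},                 # languages the pages are written in
}

def worth_crawling(wanted_content, wanted_languages, profile=SITE_PROFILE):
    """Return True only if the site declares something this crawler is after.

    An empty 'wanted' set means the crawler has no preference on that axis.
    """
    content_ok = not wanted_content or bool(set(wanted_content) & profile["content"])
    language_ok = not wanted_languages or bool(set(wanted_languages) & profile["languages"])
    return content_ok and language_ok
```

A classifieds crawler would call worth_crawling({"classifieds"}, set()) and move on without fetching a single page. Of course this only helps with bots polite enough to ask.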

Additionally, there needs to be a mechanism in place to verify the bot is who it says it is, so scrapers can't just mimic Google and crawl the site.
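One approach that works with plain DNS is a forward-confirmed reverse lookup: resolve the visitor's IP to a hostname, check that the hostname belongs to the crawler's domain, then resolve that hostname back and make sure it returns the same IP. A minimal Python sketch follows. The function name is_verified_bot is made up, and the suffix list is an assumption based on the hostnames Google documents for Googlebot. The reverse and forward arguments exist only so the check can be exercised without touching the network.

```python
import socket

# Assumption: hostname suffixes Google documents for genuine Googlebot hosts.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_verified_bot(ip, suffixes=GOOGLE_SUFFIXES,
                    reverse=socket.gethostbyaddr,
                    forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS: IP -> hostname -> IPs must include IP."""
    try:
        # Reverse lookup: first element of the result is the primary hostname.
        host = reverse(ip)[0].rstrip(".").lower()
    except OSError:
        return False  # no PTR record or lookup failure
    if not host.endswith(tuple(suffixes)):
        return False  # hostname is not in the crawler's domain
    try:
        # Forward lookup: third element is the list of IPv4 addresses.
        addresses = forward(host)[2]
    except OSError:
        return False
    # Anyone can fake a PTR record for their own IP; the forward lookup
    # is what proves the domain owner actually points at this address.
    return ip in addresses
```

Note that socket.gethostbyname_ex only handles IPv4, and DNS lookups are slow, so in practice you would cache the result per IP rather than check on every request.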

Don't have a bright idea at the moment to solve the weaknesses of robots.txt other than just whitelisting the allowed bots, with everything else blocked by default and enforced using .htaccess.
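The whitelist logic itself is simple enough to sketch in Python, even though the real enforcement would live in .htaccess rules. This is only an illustration under assumptions: the bot tokens, crawler hints, and browser prefixes below are example values, not a recommended list, and allow_request is a made-up name.

```python
# Illustrative values only; a real list would be maintained by the site owner.
ALLOWED_BOT_TOKENS = ("googlebot", "slurp", "msnbot")   # whitelisted crawlers
CRAWLER_HINTS = ("bot", "crawl", "spider", "slurp")     # self-declared crawlers
BROWSER_PREFIXES = ("mozilla/", "opera/")               # typical browser agents

def allow_request(user_agent):
    """Default-deny: whitelisted bots and browser-like agents pass, the rest don't."""
    ua = (user_agent or "").lower()
    if any(token in ua for token in ALLOWED_BOT_TOKENS):
        return True   # explicitly whitelisted crawler
    if any(hint in ua for hint in CRAWLER_HINTS):
        return False  # admits to being a crawler, but is not on the list
    if ua.startswith(BROWSER_PREFIXES):
        return True   # looks like an ordinary browser
    return False      # empty or unknown agent: blocked by default
```

The user-agent string is trivial to fake, so on its own this only stops bots that identify themselves honestly; the whitelisted names still need to be verified some other way.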

Not even sure, in this current climate of entitlement to crawl, that creating a better standard would solve anything.


Thread source: http://www.webmasterworld.com/robots_txt/871.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com