Why should I have a robots.txt file?


jwolthuis - 4:36 pm on Dec 31, 2006 (gmt 0)


Besides the "normal" search engines, there are specialty search engines you may want to allow or block. At a minimum, it's good to be aware of them.

Internet Archive Wayback Machine: Takes periodic snapshots of your site and keeps them available to browse or search for years after the pages may have been taken down. To block it, put these lines in your robots.txt file:

User-agent: ia_archiver
Disallow: /
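
If you want to sanity-check a rule like this before uploading it, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed and the example.com URL are made up for illustration, and the parser only shows how Python reads the file; an actual crawler may interpret it differently.

```python
from urllib.robotparser import RobotFileParser

# The rule from the post, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text.

    Hypothetical helper for illustration; it parses the text locally and
    does not contact any server.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://www.example.com/page.html"  # placeholder URL
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, url))
```

Since the file has no "User-agent: *" record, agents that are not named in it stay unrestricted.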

Google Images, Yahoo Image Search, PicSearch: These crawlers look for images on your site, make a best guess as to their content, and make the images easy for anyone to view or download. Depending on whether you think this is good or bad, you may want to block them. To do so, add these lines to your robots.txt file:

User-agent: Googlebot-Image
Disallow: /

User-agent: Yahoo-MMCrawler
Disallow: /

User-agent: psbot
Disallow: /
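
If you keep a list of crawlers to block, the records above can also be generated instead of typed by hand. This is only a rough sketch: the function names build_block_rules and blocked_agents and the example.com URL are invented for illustration, and the check relies on how Python's urllib.robotparser matches user-agent names, which is not guaranteed to match what each crawler does.

```python
from urllib.robotparser import RobotFileParser

# User-agent names taken from the post above.
IMAGE_CRAWLERS = ["Googlebot-Image", "Yahoo-MMCrawler", "psbot"]


def build_block_rules(user_agents):
    """Return robots.txt text with one 'Disallow: /' record per user agent."""
    records = [f"User-agent: {agent}\nDisallow: /\n" for agent in user_agents]
    # Records are separated by a blank line, as in the hand-written version.
    return "\n".join(records)


def blocked_agents(robots_txt, user_agents, url):
    """Return the agents from user_agents that may NOT fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in user_agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    rules = build_block_rules(IMAGE_CRAWLERS)
    print(rules)
    url = "http://www.example.com/images/photo.jpg"  # placeholder URL
    print(blocked_agents(rules, IMAGE_CRAWLERS + ["Googlebot"], url))
```

Note that the regular Googlebot is not affected by the Googlebot-Image record in this check, so ordinary web search crawling is left alone.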


Thread source: http://www.webmasterworld.com/robots_txt/3203372.htm