Robots.txt question


idiotgirl - 7:19 pm on Aug 17, 2002 (gmt 0)


Several spiders do not heed robots.txt: they either never request it at all or ignore it completely. Most of these, however, are email harvesters, downloading agents, spam bots, and leechware, not legitimate bots/spiders that will help you in the real world. So adding a confirmed abuser to robots.txt is generally a waste.
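
To see why that is, it helps to remember that the check happens on the bot's side. Here is a minimal sketch using Python's standard urllib.robotparser; the agent names "BadBot" and "GoodBot", the rules, and the example.com URLs are all made up for illustration. A polite spider runs something like this before fetching a page; a rude one simply never does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: "BadBot" and the paths are made up for illustration.
RULES = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, rules: str = RULES) -> bool:
    """Return what a *compliant* bot would conclude from the rules.

    Nothing here is enforced by the server; a bot that skips this
    check can still request any URL it likes.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("BadBot", "http://example.com/index.html"))
    print(is_allowed("GoodBot", "http://example.com/index.html"))
    print(is_allowed("GoodBot", "http://example.com/private/x.html"))
```

The "Disallow: /" line only matters if BadBot actually reads the file and chooses to obey it.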

As a rule, most bots with variations of "rip", "siphon", "harvest", "download", etc. in their names are going to disregard robots.txt and do as they please. It's probably more constructive to concentrate on the spiders you want visiting your site, and to use robots.txt to tell them which pages and directories to crawl, than to try to ban rude bots through it.
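
As a rough illustration of that rule of thumb, here is a small sketch that flags user-agent strings containing those words. The function name and the sample agent strings are my own assumptions, and a plain substring match is crude (a harmless name could happen to contain "rip"), so treat a hit as a hint that a robots.txt entry is probably wasted on that bot, not as proof.

```python
# Name fragments taken from the rule of thumb above.
SUSPECT_TOKENS = ("rip", "siphon", "harvest", "download")


def looks_like_rude_bot(user_agent: str) -> bool:
    """Return True if the user-agent string contains a suspect fragment.

    Case-insensitive substring match only; this is a heuristic and
    will produce false positives and false negatives.
    """
    ua = user_agent.lower()
    return any(token in ua for token in SUSPECT_TOKENS)


if __name__ == "__main__":
    # Sample strings are illustrative, not taken from real logs.
    for agent in ("EmailSiphon", "WebRipper/1.0", "Googlebot/2.1"):
        print(agent, looks_like_rude_bot(agent))
```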

If you look through the posts at WebmasterWorld, you'll see lots of people reporting whether a particular bot disregarded robots.txt, and you can form your own conclusions about who/why/what to include or exclude in your robots.txt file. Also, many of the current legitimate spiders are listed at searchengineworld.
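
If you want to compare those reports with your own traffic, one simple check is which agents requested pages without ever pulling robots.txt. The sketch below assumes you have already reduced your access log to (user agent, path) pairs; the function name and the sample data are hypothetical. Ordinary browsers never fetch robots.txt either, so this is only meaningful for agents you already believe are bots, and it cannot catch a bot that fetches the file and then ignores it.

```python
from collections import defaultdict


def agents_that_skipped_robots(requests):
    """Return a sorted list of agents that never requested /robots.txt.

    `requests` is an iterable of (user_agent, path) pairs, an assumed
    pre-parsed form of an access log.
    """
    paths_by_agent = defaultdict(set)
    for agent, path in requests:
        paths_by_agent[agent].add(path)
    return sorted(
        agent
        for agent, paths in paths_by_agent.items()
        if "/robots.txt" not in paths
    )


if __name__ == "__main__":
    # Made-up sample data for illustration.
    sample = [
        ("Googlebot/2.1", "/robots.txt"),
        ("Googlebot/2.1", "/index.html"),
        ("EmailSiphon", "/contact.html"),
        ("EmailSiphon", "/index.html"),
    ]
    print(agents_that_skipped_robots(sample))
```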


Thread source: http://www.webmasterworld.com/robots_txt/94.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com