Coordinating robots.txt and .htaccess


jdMorgan - 5:07 pm on Oct 30, 2002 (gmt 0)


BusyNut,

I agree with GaryK - robots.txt is used to control the behaviour of "good robots" that will read and respect the Disallow statements.
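To illustrate the "good robot" side: a well-behaved crawler fetches robots.txt and checks each URL against it before requesting the page. Here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt contents, the agent name "GoodBot" and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""

def allowed(user_agent: str, url: str) -> bool:
    """Return True if a crawler that honours robots.txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    print(allowed("GoodBot", "http://example.com/index.html"))     # True
    print(allowed("GoodBot", "http://example.com/private/a.html"))  # False
```

Note that nothing here is enforced by the server: the check only happens because the crawler chooses to run it.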

.htaccess - or a similar mechanism on servers other than Apache - is used to block access by robots which do not respect robots.txt, as well as other user-agents which you wish to exclude, such as site downloaders and e-mail address harvesters.
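.htaccess rules are Apache configuration, not Python. Still, the logic of a server-side user-agent block can be sketched as WSGI middleware that answers matching requests with 403 Forbidden before the application runs. This is an analogy, not how Apache implements it. The blocklist patterns and the names block_user_agents and hello_app are hypothetical.

```python
import re
from typing import Callable, Iterable

# Hypothetical example patterns: a site downloader and an e-mail harvester.
BLOCKED_AGENTS = re.compile(r"HTTrack|EmailSiphon", re.IGNORECASE)

def block_user_agents(app: Callable) -> Callable:
    """Wrap a WSGI app so that blocked user-agents get 403 Forbidden."""
    def wrapper(environ: dict, start_response: Callable) -> Iterable[bytes]:
        agent = environ.get("HTTP_USER_AGENT", "")
        if BLOCKED_AGENTS.search(agent):
            # Refused on the server side, whether or not the client read robots.txt.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return app(environ, start_response)
    return wrapper

def hello_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Stand-in application for the protected site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]

app = block_user_agents(hello_app)

if __name__ == "__main__":
    # Exercise the middleware with a fake start_response instead of a real server.
    def show_status(status, headers):
        print(status)
    app({"HTTP_USER_AGENT": "EmailSiphon"}, show_status)  # prints 403 Forbidden
    app({"HTTP_USER_AGENT": "Mozilla/4.0"}, show_status)  # prints 200 OK
```

To try it against a browser, the standard library's wsgiref.simple_server.make_server("", 8000, app).serve_forever() would serve it locally.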

Sometimes it is useful to add a Disallow record for a suspicious user-agent to robots.txt as a test. In some cases, malicious 'bots will read robots.txt in order to appear innocuous, but then ignore what they've read. If you see a user-agent fetch robots.txt and then request URLs it has been told not to, you are most likely dealing with a malicious user-agent.
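The robots.txt test can be checked mechanically by scanning the access log for agents that fetched /robots.txt and afterwards requested a path it disallows for them. A minimal sketch, assuming the log has already been parsed into (user_agent, path) tuples in request order; the log format, the "SuspectBot" record and the name find_violators are hypothetical.

```python
from typing import Iterable, Set, Tuple
from urllib.robotparser import RobotFileParser

# Hypothetical test robots.txt: SuspectBot is excluded from the whole site.
ROBOTS_TXT = """\
User-agent: SuspectBot
Disallow: /

User-agent: *
Disallow: /private/
"""

def find_violators(log: Iterable[Tuple[str, str]], robots_txt: str = ROBOTS_TXT) -> Set[str]:
    """Return user-agents that read /robots.txt and then fetched a disallowed path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    read_robots: Set[str] = set()
    violators: Set[str] = set()
    for agent, path in log:
        if path == "/robots.txt":
            read_robots.add(agent)
        elif agent in read_robots and not parser.can_fetch(agent, path):
            # It saw the rules and ignored them.
            violators.add(agent)
    return violators

if __name__ == "__main__":
    sample_log = [
        ("GoodBot/2.0", "/robots.txt"),
        ("GoodBot/2.0", "/index.html"),
        ("SuspectBot/1.0", "/robots.txt"),
        ("SuspectBot/1.0", "/index.html"),
    ]
    print(find_violators(sample_log))  # {'SuspectBot/1.0'}
```

Agents that never request robots.txt at all are not flagged by this particular test; they are candidates for a .htaccess block on other grounds.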

In a very few cases, even good 'bots misbehave because of a coding bug; when that happens, report the problem to the owner of the 'bot.

GaryK's idea of "promoting" a user-agent from a robots.txt exclusion to a .htaccess block is a good way to think about it.
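One way to picture the promotion step: take the agents caught ignoring robots.txt and generate .htaccess deny rules for them. A sketch that emits Apache 1.3/2.2-style mod_setenvif and Order/Allow/Deny directives; the function name promote_to_htaccess and the environment variable name bad_bot are hypothetical, Apache 2.4 would use Require directives instead, and the output should be reviewed by hand before it goes on a live server.

```python
import re
from typing import Iterable

def promote_to_htaccess(agents: Iterable[str]) -> str:
    """Return .htaccess lines denying every user-agent name in `agents`.

    Each agent string is reduced to its product token ("SuspectBot/1.0" ->
    "SuspectBot") and matched case-insensitively at the start of User-Agent.
    """
    tokens = sorted({agent.split("/")[0].strip() for agent in agents})
    tokens = [t for t in tokens if t]
    if not tokens:
        return ""
    # Tag matching requests with the bad_bot environment variable.
    lines = [f'SetEnvIfNoCase User-Agent "^{re.escape(t)}" bad_bot' for t in tokens]
    # Apache 1.3/2.2 access control: allow everyone except requests tagged bad_bot.
    lines += ["Order Allow,Deny", "Allow from all", "Deny from env=bad_bot"]
    return "\n".join(lines)

if __name__ == "__main__":
    print(promote_to_htaccess(["SuspectBot/1.0", "SuspectBot/2.1", "EmailSiphon"]))
```

Reducing each agent to its product token means one rule covers every version of the same 'bot, at the cost of also catching any innocent agent whose name happens to start the same way.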

Jim


Thread source: http://www.webmasterworld.com/apache/262.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com