biggles - 2:40 am on Dec 2, 2002 (gmt 0)
I've looked at the WebmasterWorld robots.txt [webmasterworld.com] for inspiration/guidance and I'm puzzled by some of the exclusions, such as WebmasterWorld Extractor. This file also appears to differ from robots4.txt [searchengineworld.com], the comprehensive robots.txt file on the tutorial page that only allows known "nice guy" spiders. That one features some different agents, like BlackWidow. Any suggestions, please, for a list of "must exclude" agents for a robots.txt file? Thanks
I'm thinking about extending my robots.txt file to exclude mail-harvesting agents because of the amount of email spam I've been getting. I'll also take the opportunity to exclude content harvesters and other bandwidth-stealing agents.
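To make the question concrete, here is a minimal sketch of the kind of exclusion being discussed, checked with Python's standard urllib.robotparser. The robots.txt content is a made-up illustration, not a copy of either file linked above: it blocks the two agents named in this post and leaves every other agent unrestricted. The helper name is_allowed and the test path /index.html are assumptions for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block two agents named in this thread,
# and leave all other agents unrestricted (an empty Disallow allows everything).
ROBOTS_TXT = """\
User-agent: BlackWidow
Disallow: /

User-agent: WebmasterWorld Extractor
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(agent: str, url: str = "/index.html") -> bool:
    """Return True if the sample rules above permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("BlackWidow", "WebmasterWorld Extractor", "SomeOtherBot"):
        print(agent, "allowed" if is_allowed(agent) else "blocked")
```

Note that robots.txt is only advisory: it keeps out agents that choose to honour it, so a harvester that ignores the file would have to be blocked some other way, for example at the server level.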