robots.txt DOES work, IF:
1. You have all of your statements formatted correctly. Yesterday I had a spider plow through an area I *thought* was blocked, but since I had the line blocking that area written incorrectly, it didn't work.
After emailing the spider's owner (antarcti.ca), determining the problem and fixing it, the terrifically nice folks at antarcti.ca's tech dept. sent their spider through again, and my robots.txt worked like a charm.
2. The robot in question follows robots.txt conventions. All of the major search engines and important/good spiders DO follow robots.txt instructions...
Any robot I find that doesn't request a robots.txt file, or ignores *properly formatted* directions therein, is banned from my site via htaccess, and loud complaints are sent to its owner.
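For anyone who wants to test point 1 before a spider does it for them, here is a rough sketch using Python's standard urllib.robotparser. The post above doesn't say what the formatting mistake actually was, so the missing leading slash in BROKEN is only a hypothetical example, and is_blocked, ExampleBot and the example.com URL are made-up names for illustration. Python's parser is also just one reading of the conventions, so a given spider may behave differently.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, user_agent, url):
    """Return True if robots_txt tells user_agent not to fetch url.

    Hypothetical helper for illustration; it relies on how Python's
    standard-library parser interprets the rules.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical mistake: the path has no leading slash, so it never
# matches a URL path such as /private/page.html.
BROKEN = """\
User-agent: *
Disallow: private/
"""

# Same rule with the path written correctly.
FIXED = """\
User-agent: *
Disallow: /private/
"""

URL = "http://www.example.com/private/page.html"  # assumed test URL

if __name__ == "__main__":
    for label, text in (("broken", BROKEN), ("fixed", FIXED)):
        print(label, "file blocks the URL:", is_blocked(text, "ExampleBot", URL))
```

Run it against your own file and a few URLs you expect to be blocked. If a URL you meant to block comes back as not blocked, the line is probably written wrong, which is exactly the situation described in point 1.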
Robots.txt Validator [searchengineworld.com]
Robots Exclusion Meta Tag [searchengineworld.com] Using robots metatags.
Robots.txt : The Big Crawl [searchengineworld.com] We recently spidered 2 million robots.txt files and found a surprising number of problems.
Robots Exclusion Standard rfc4 [info.webcrawler.com].
Root of Robots Exclusion Standard [info.webcrawler.com] directory with some interesting files.
Search Indexing Robots and Robots.txt [searchtools.com] article at searchtools.com.