Forum Moderators: goodroi
--Shawn
Most robots follow the robots exclusion standard, which can be found at
[info.webcrawler.com...]
The idea of limiting a robot's crawling is that you might not want all pages indexed (any indexed page can be an entry point to your site). You wouldn't want visitors to start their visit at the feedback page, or perhaps you don't want robots to index the PDF docs you keep in a certain directory. Or there might be a section that's password protected, e.g.
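A minimal robots.txt along those lines might look like this (the /feedback.html and /pdfs/ paths here are just placeholders for illustration, not anyone's actual site layout):

```
# Applies to all well-behaved robots
User-agent: *
# Keep the feedback page from being an entry point (hypothetical path)
Disallow: /feedback.html
# Keep the directory of PDF docs from being crawled (hypothetical path)
Disallow: /pdfs/
```

Note that robots.txt is advisory: compliant crawlers honor it, but it's not access control, so a password-protected section still needs real authentication behind it.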
Just ideas
We used a robots.txt file to keep the privacy policy pages from being the first listing in the search results when the site was spidered. It was embarrassing to type in the keyword for the site and have the first listing be "This is our dry legal privacy policy". When people saw that page pop up, they wouldn't even click a nav link on the page to go somewhere else; they would just hit back and avoid the site altogether.
*shrug* That's why WE used it.
-G