jdMorgan - 6:13 pm on Nov 27, 2002 (gmt 0)
No, robots.txt is interpreted using prefix-matching. The match starts at the left and proceeds to the right, and anything you leave off at the right end means "don't care". Wild-cards are not recognized, per se. As a result, the path would have to start with, and contain, everything you included in the Disallow.

You can get only a "sort of" wild-card effect if the URLs you wish to disallow all start with the same prefix, e.g. Disallow: /private/ will keep the spiders out of subdirectory /private. If you cannot arrange your directory structure to use this approach, then you'll have to Disallow them individually.

It does not matter that your search-engine-friendly URLs do not actually exist as files. If you tell the robots that it's OK to request those URLs, they will do so when they find a link, and the requests will then be subject to your RewriteRules. So they'll "land" on the page you expect them to, just like a human visitor. Clear?

Almost all of the search engine spiders now recognize the <meta robots> tag, so as DaveN states, that may be the easiest solution to your problem. The downside is that the pages will still be fetched, even though they will not be listed in the search engine index. That just means some wasted server bandwidth.

HTH,
Trisha, will
Disallow: /../to/
or
Disallow: */to/
work?
Disallow: /section1/to/
means "don't spider anything that starts with /section1/to/", i.e. the contents of subdirectory /section1/to/. Would they also not spider anything with a URL containing "section1"?
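If it helps to see the prefix-matching rule spelled out, here is a minimal Python sketch of it. This is only an illustration of the plain left-to-right prefix match described in this thread, not the code of any actual spider; the function name is_disallowed and the sample URL-paths are made up for the example.

```python
def is_disallowed(url_path, disallow_rules):
    """Return True if url_path starts with any non-empty Disallow value.

    Hypothetical helper: models plain prefix matching only, so
    characters such as '*' or '..' are compared literally.
    """
    return any(rule and url_path.startswith(rule) for rule in disallow_rules)


# Disallow: /section1/to/ blocks only paths that begin with that prefix.
rules = ["/section1/to/"]
print(is_disallowed("/section1/to/page.html", rules))     # True
print(is_disallowed("/section1/other/page.html", rules))  # False
print(is_disallowed("/section2/to/page.html", rules))     # False

# The attempted wild-card forms are treated as literal prefixes,
# so they do not match a real path such as /section1/to/page.html.
wild = ["/../to/", "*/to/"]
print(is_disallowed("/section1/to/page.html", wild))      # False
```

So under this model a URL merely containing "section1" is not blocked; it has to begin with the whole Disallow value.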
Jim