syntax to allow all:
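For reference, the usual allow-all form is a wildcard `User-agent` line followed by an empty `Disallow:`. Below is a minimal sketch that checks that form with Python's standard `urllib.robotparser`. The helper name `is_allowed`, the bot name, and the example.com URL are made up for illustration, and this only shows how one standards-following parser reads the file, not how any particular search engine behaves.

```python
from urllib.robotparser import RobotFileParser

# Allow-all robots.txt: every user agent, nothing disallowed.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

def is_allowed(robots_txt, user_agent, url):
    """Return True if this robots.txt text lets user_agent fetch url.

    Hypothetical helper: parses the text in memory instead of
    downloading /robots.txt from a live server.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Illustrative check with a made-up bot name and URL.
print(is_allowed(ALLOW_ALL, "ExampleBot", "http://www.example.com/any/page.html"))
```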
jamesf4218, sorry, I don't know about .htaccess. I am an NT developer.
I want to stop all search engine spiders from visiting ONE directory.
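A rough sketch of what that looks like in robots.txt terms, again checked with Python's standard `urllib.robotparser`. The directory name `/private/`, the helper `may_fetch`, the bot name, and the URLs are all assumptions for the example; substitute your own directory. Keep in mind this only affects spiders that actually request and obey robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Keep all compliant spiders out of one directory (hypothetical name).
BLOCK_ONE_DIR = """\
User-agent: *
Disallow: /private/
"""

def may_fetch(robots_txt, user_agent, url):
    """Return True if a robots.txt-obeying spider may request url.

    Hypothetical helper that parses the rules from a string.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Files under /private/ are off limits; everything else is still allowed.
blocked = may_fetch(BLOCK_ONE_DIR, "ExampleBot", "http://www.example.com/private/report.html")
allowed = may_fetch(BLOCK_ONE_DIR, "ExampleBot", "http://www.example.com/index.html")
print(blocked, allowed)
```

The rule is a simple path-prefix match, so the trailing slash matters: `/private/` covers everything inside the directory.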
Is [using robots.txt] an acceptable alternative to an .htaccess [block]?
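Yes and no... A valid robots.txt will stop all spiders which request and obey robots.txt from requesting files from your Disallowed directories. However, there are two problems: some 'bots never request or obey robots.txt at all, and some search engines will still list the URL of a Disallowed file even though they never fetch it.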
In the former case, you may need to block these bad-bots using .htaccess, browscap.ini, a firewall, or whatever is available to you.
In the latter case, even though the SE spider does not request the Disallowed file, it may still find the URL in links on your site, links on other sites, or even in a server log or collection of user bookmarks unintentionally left on-line somewhere. The search engine may then list the URL without a title or description, but sometimes with the link text found on the page that links to it.
I have previously argued that this flies in the face of the intent of the robots exclusion standard, if not its literal wording. However, it depends on whether you define the word "index" to mean "fetch a page and parse it" or "include it in our index". Some SEs won't mention a file that's disallowed with robots.txt, but some will. So I've learned, "that's life, get over it, find a work-around, and move on..."
About the only thing I know of to stop these search engines from listing the URL of a private page (without cloaking) is to link to your "private" pages only through another "linking page" that meets the following criteria:
Then you have the problem of 'bots which don't read and respect robots.txt. These need to be blocked. I use a combination of .htaccess and an automatic bad-bot banning script [webmasterworld.com] that was posted here on WebmasterWorld by Key_Master. I have tweaked it to handle multiple simultaneous requests (by adding file-locking to it) and I'm "evaluating" it now to make sure the tweaks didn't break it. On low-to-moderate-traffic sites, it should work just fine as originally posted.
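To show what the file-locking tweak amounts to, here is a minimal sketch in Python. It is not Key_Master's script and not my modified version of it, just an illustration of the idea: take an exclusive lock before touching the ban list so two simultaneous requests can't interleave their writes. The function name `ban_ip`, the ban-list file, and the Apache 2.2-style `Deny from` line format are assumptions for the example, and `fcntl.flock` is Unix-only.

```python
import fcntl

def ban_ip(ban_file, ip):
    """Append a 'Deny from <ip>' line to ban_file exactly once.

    Hypothetical helper. Returns True if the line was added and
    False if the address was already listed.
    """
    line = f"Deny from {ip}\n"
    # "a+" creates the file if needed, allows reading it back,
    # and always writes at the end of the file.
    with open(ban_file, "a+") as f:
        # Exclusive lock: a second request blocks here until we unlock.
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            f.seek(0)
            if line in f.readlines():
                return False  # already banned, nothing to write
            f.write(line)
            f.flush()  # make sure the line is written before unlocking
            return True
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

A trap script would call something like `ban_ip("banlist.txt", remote_addr)` whenever a 'bot requests a URL that robots.txt told it to stay away from. How the resulting lines get into a working .htaccess (surrounding directives, file permissions) depends on your server setup.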
However, a (probably dumb) question:
What about those _vti_* directories and files? I don't use them, but I need them to stay on the server for other users who may follow.
Am I right in thinking that they're private anyway, or should I include them in the Disallow?