Forum Moderators: mack
The only way a SE can browse your site is via links to your files. If there's no link to the file anywhere, it's considered non-existent by the engines. The only other way to find your files would be to guess every possible filename in the directory. That would cost every engine a TON of resources, which would be totally impractical. I think it might also breach some security laws.
-Matt
AFAIK it is not certain whether visiting a page with the Google toolbar installed tells Googlebot to crawl it or not.
If that page has external links to sites whose log files are public, its URL can turn up there as a referrer and get spidered from there.
If you do want it crawled, point some internal and external links at it.
If it's a page just for your friends, it can still eventually get crawled by what I refer to as "engine magic."
What is the purpose of the orphan page? If it is "for your eyes only," put a sign-in function on your site to gate access to the page.
WFN:)
So I was assuming that if I had a page and put it in the root along with the other files, it would still get indexed by the SE bots/spiders, since the bots would first hit the root directory and go from there to every other page. If I didn't want my file indexed, I could still add a line or two of some sort of command to that file to stop the bots crawling it, couldn't I?
<meta name="robots" content="noindex,nofollow">
You can also add a Disallow rule to your robots.txt file.
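As a sketch of what a Disallow rule does, here's a check using Python's standard-library robots.txt parser, the same logic a well-behaved bot applies before fetching. The path /private-page.html and the host example.com are made-up placeholders, not from this thread.

```python
# Demonstrate how a polite crawler interprets a robots.txt Disallow rule,
# using Python's stdlib parser. Paths and host are hypothetical examples.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /private-page.html
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# The disallowed path is off-limits; everything else is fetchable.
print(rp.can_fetch("AnyBot", "http://example.com/private-page.html"))  # False
print(rp.can_fetch("AnyBot", "http://example.com/index.html"))         # True
```

Bear in mind this is purely advisory: the parser only tells a bot what it *should* do, and a rogue spider skips the check entirely.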
In the end, though, with rogue spiders/bots and curious people (like myself) around, I would recommend the same as above: remove IUSR permissions (if on a Windows/IIS server) or modify your .htaccess (if on Apache). That way read access itself is restricted.
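For the Apache route, a minimal sketch of password-protecting a directory with basic auth; the AuthUserFile path is a placeholder, and the password file itself is created with Apache's htpasswd tool:

```
# .htaccess in the directory to protect (Apache)
AuthType Basic
AuthName "Private area"
# Placeholder path to a password file made with htpasswd;
# keep it outside the web root
AuthUserFile /path/to/.htpasswd
Require valid-user
```

Unlike robots.txt, this stops spiders and nosy visitors alike, because the server refuses to serve the page without credentials.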
Brian
When I built shtml pages, I inserted a <!--#include virtual="note.html" --> line on every page.
This is what is called a server-side include (SSI). It basically means the contents of note.html are merged into every page. Because this happens at the server level, to any spider the content of note.html looks like just another part of the original file, so it follows all the links in it.
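A minimal sketch of how that plays out, with hypothetical filenames:

```html
<!-- note.html: the shared fragment, containing a link -->
<p>Latest: <a href="/notes.html">my notes</a></p>

<!-- page.shtml: the server swaps the directive below for note.html's
     markup before the response goes out, so a spider simply sees the
     link as part of page.shtml and follows it -->
<!--#include virtual="note.html" -->
```

The spider never sees the directive itself, only the merged HTML.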
As for the original question, WFN and Mohamed_E were spot on: password protection and robots.txt exclusion will significantly reduce the risk of the page being spidered, but the only sure way is never to publish it online in the first place.