Page is not externally linkable
jdMorgan - 3:20 pm on Jun 3, 2010 (gmt 0)
As long as your internal domain-to-folder rewrite is correctly implemented, search engines will have no idea that these site-folders exist... After all, even with a "normal single-Website" hosting set-up, they have no idea what your DocumentRoot path on the server is, and they do not care.
User-agents on the Web (browsers, search engine robots, etc.) work with URLs. They do not "know" about pages, files, server-side scripts, or anything else. Just URLs.
So, your top-level "folder" on this server should be completely inaccessible to them by HTTP URL, because all requests get rewritten to one or another "site folder" below that level. In other words, even if you had a robots.txt file in your top-level folder, no search engine or browser should be able to fetch it, because your code will rewrite the request to a "requested-site-based" subfolder.
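For illustration only, here is a minimal sketch of what such an internal domain-to-folder rewrite might look like. It assumes the code lives in an .htaccess file in the top-level folder, that mod_rewrite is enabled, and that the site folder is named "sitename.com-subfolder" as in the example URLs in this post. The hostname and folder name are placeholders, and your own working code may well differ.

```apache
# Sketch only: hostname and folder name are placeholders.
RewriteEngine On

# Skip requests that have already been rewritten into the site folder,
# so the rule does not loop when .htaccess processing restarts.
RewriteCond %{REQUEST_URI} !^/sitename\.com-subfolder/
# Match requests for this site by hostname, with or without "www."
# (assumes the Host header carries no port number).
RewriteCond %{HTTP_HOST} ^(www\.)?sitename\.com$ [NC]
# Internal rewrite: no redirect is sent, so the client never sees the folder name.
RewriteRule ^(.*)$ /sitename.com-subfolder/$1 [L]
```

Because this is an internal rewrite with no R flag, the URL the client sees stays http://sitename.com/<whatever>. With a rule like this in place, a request for http://sitename.com/robots.txt is served from the site folder, not from the top-level folder.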
Each of those "sites" should contain its own robots.txt, sitemap.xml, search engine "validation key," compact privacy policy, and content-label files.
Anyway, the key here is to keep in mind that a URL is not a filepath, and a filepath is not a URL -- The two are not equivalent in any way, are not necessarily related in any way, and are only "associated" by the URL-to-filepath translation phase of server operation (in which mod_rewrite can play a part).
So if your rewrite code is correct, search engines don't know anything about your folders and files. They only know about the URLs that you (and others) "publish" in links on your pages and through 30x redirect responses.
The only measure I would recommend, if there is the slightest chance of a linking error or malicious attention from competitors, is to 301-redirect direct client requests for
http://maindomain.com/sitename.com-subfolder/<whatever>
back to
http://sitename.com/<whatever>
The code for that has been posted here many times, and you should be able to find it by searching here for "redirect direct client request RewriteCond THE_REQUEST" using the WebmasterWorld site search or a Google "site:www.webmasterworld.com" search.
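As a rough sketch of the general pattern (not a copy of any particular posted thread), the redirect might look something like the following. It assumes an .htaccess file in the top-level folder and uses the placeholder names from the example URLs above.

```apache
# Sketch only: folder name and hostname are placeholders.
RewriteEngine On

# THE_REQUEST holds the original client request line, e.g.
#   GET /sitename.com-subfolder/page.html HTTP/1.1
# It is not changed by internal rewrites, so this condition matches only
# when the client itself asked for the subfolder URL.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /sitename\.com-subfolder/
# Send the client back to the canonical URL on the site's own domain.
RewriteRule ^sitename\.com-subfolder/(.*)$ http://sitename.com/$1 [R=301,L]
```

A rule like this would normally go ahead of any internal domain-to-folder rewrite rules, so that the external redirect is handled first. Testing against THE_REQUEST rather than the current URL-path is what keeps the redirect from firing on requests that were only rewritten internally.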
Jim