I have this question. Let's say I have a domain called: www.domainx.com
Let's say that:
- I have an .htaccess rewrite (a hidden, internal redirect) that displays the website stored under the folder /websitex whenever an HTTP request arrives for www.domainx.com. That is to say, if you go to www.domainx.com, the URL will not change, but the content displayed in the browser is really the one stored under the folder called "websitex". In other words, the browser displays the same content as if you had typed www.domainx.com/websitex
- I do not want search engines to index the contents under the root folder of www.domainx.com. I just want them to index www.domainx.com and the contents under www.domainx.com/websitex
Is there any way to achieve this through the use of a robots file?
I would try something like this but I am not sure:
My concern is that I am not sure whether this would prevent www.domainx.com from being listed by search engines or not. Actually I would like www.domainx.com to be listed by search engines.
As long as your internal domain-to-folder rewrite is correctly implemented, search engines will have no idea that these site folders exist. After all, even with a "normal single-Website" hosting set-up, they have no idea what your DocumentRoot path on the server is, and they do not care.
User-agents on the Web (browsers, search engine robots, etc.) work with URLs. They do not "know" about pages, files, server-side scripts, or anything else. Just URLs.
So, your top-level "folder" on this server should be completely inaccessible to them by HTTP URL, because all requests get rewritten to one or another "site folder" below that level. In other words, even if you had a robots.txt file in your top-level folder, no search engine or browser should be able to fetch it, because your code will rewrite the request to a subfolder chosen according to the requested site.
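For illustration, an internal domain-to-folder rewrite of the kind described above might look roughly like this in the top-level .htaccess file. This is only a sketch: the hostname and the folder name "websitex" are taken from the question, and the poster's actual rules are not shown, so everything else here is an assumption.

```apache
RewriteEngine On

# Only act on requests for the hostname used in the question (assumed).
RewriteCond %{HTTP_HOST} ^(www\.)?domainx\.com$ [NC]
# Skip paths already inside the site folder, so the rule does not loop
# when mod_rewrite re-processes the rewritten request.
RewriteCond %{REQUEST_URI} !^/websitex/
# Internal rewrite: the visible URL stays the same, only the file path changes.
RewriteRule ^(.*)$ /websitex/$1 [L]
```

With rules like these, a request for www.domainx.com/robots.txt is served from /websitex/robots.txt, so a robots.txt stored in the top-level folder is never the one a robot receives.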
Anyway, the key here is to keep in mind that a URL is not a filepath, and a filepath is not a URL. The two are not equivalent in any way, are not necessarily related in any way, and are only "associated" by the URL-to-filepath translation phase of server operation (in which mod_rewrite can play a part).
So if your rewrite code is correct, search engines don't know anything about your folders and files; they only know about the URLs that you (and others) "publish" in links on your pages and through 30x redirect responses.
The only measure I would recommend if there is the slightest chance of a linking error or malicious attention from competitors is to 301 redirect direct client requests for
The code for that has been posted here many times, and you should be able to find it by searching here for "redirect direct client request RewriteCond THE_REQUEST" using the WebmasterWorld site search or a Google "site:www.webmasterworld.com" search.
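As a rough sketch of that technique (not the exact code from those earlier threads), a redirect of direct client requests using THE_REQUEST could look something like the following. It assumes that the URLs to be redirected are the ones beginning with /websitex/, as in the question, and that the site is served over plain http; both are assumptions.

```apache
RewriteEngine On

# THE_REQUEST holds the original request line sent by the client,
# e.g. "GET /websitex/page.html HTTP/1.1". Internal rewrites do not change it,
# so this condition matches only when the client itself asked for /websitex/...
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /websitex/ [NC]
# Send the client to the same path without the folder prefix (assumed target URL).
RewriteRule ^websitex/(.*)$ http://www.domainx.com/$1 [R=301,L]
```

A rule like this would need to come before any internal rewrite rule in the same .htaccess file, so that the external redirect is evaluated first.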