
Sitemaps, Meta Data, and robots.txt Forum

    
robots.txt
thosecars82




msg:4146265
 1:29 pm on Jun 3, 2010 (gmt 0)

Hello
I have this question. Let's say I have a domain called: www.domainx.com
Let's say that:
- I have an .htaccess rewrite (an internal rewrite, not a visible redirect) that serves the website stored in the folder /websitex whenever an HTTP request arrives for www.domainx.com. That is to say, if you go to www.domainx.com the URL does not change, but the content displayed in the browser is actually the content stored in the folder called "websitex". In other words, the browser displays the same content as if you had typed www.domainx.com/websitex.

- I do not want search engines to index the contents under the root folder of www.domainx.com. I just want them to index www.domainx.com and the contents under www.domainx.com/websitex.

Is there any way to achieve this with a robots.txt file?
I would try something like this, but I am not sure it is right:
User-Agent: *
Disallow: /
Allow: /websitex

My concern is that this might prevent www.domainx.com itself from being listed by search engines, and I do want www.domainx.com to be listed.
Any idea?
Thanks

 

jdMorgan




msg:4146328
 3:20 pm on Jun 3, 2010 (gmt 0)

As long as your internal domain-to-folder rewrite is correctly implemented, search engines will have no idea that these site-folders exist... After all, even with a "normal single-Website" hosting set-up, they have no idea what your DocumentRoot path on the server is, and they do not care.

User-agents on the Web (browsers, search engine robots, etc.) work with URLs. They do not "know" about pages, files, server-side scripts, or anything else. Just URLs.

So, your top-level "folder" on this server should be completely inaccessible to them by HTTP URL, because all requests get rewritten to one or another "site folder" below that level. In other words, even if you had a robots.txt file in your top-level folder, no search engine or browser should be able to fetch it, because your code will rewrite the request to a subfolder chosen according to the requested site.
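A minimal sketch of such an internal rewrite, for illustration only. It assumes the hostname www.domainx.com and the folder /websitex from the question, and that the rules live in an .htaccess file in the top-level folder; the actual code in use may differ.

```apache
RewriteEngine On

# Act only on requests for this hostname (with or without "www.")
RewriteCond %{HTTP_HOST} ^(www\.)?domainx\.com$ [NC]
# Skip requests that have already been mapped into the site folder,
# so the rule does not loop when .htaccess is processed again
RewriteCond %{REQUEST_URI} !^/websitex/
# Internally map the requested URL-path onto the /websitex folder.
# No R flag is used, so this is not a redirect and the URL in the browser is unchanged.
RewriteRule ^(.*)$ /websitex/$1 [L]
```

With rules along these lines, a request for http://www.domainx.com/robots.txt is served from /websitex/robots.txt, so a robots.txt file placed in the top-level folder is never reached through that hostname.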

Each of those "sites" should contain its own robots.txt, sitemap.xml, search engine "validation key," compact privacy policy, and content-label files.

Anyway, the key here is to keep in mind that a URL is not a filepath, and a filepath is not a URL. The two are not equivalent in any way, are not necessarily related in any way, and are only "associated" by the URL-to-filepath translation phase of server operation (in which mod_rewrite can play a part).

So if your rewrite code is correct, search engines don't know anything about your folders and files. They only know about the URLs that you (and others) "publish" in links on your pages and through 30x redirect responses.

The only measure I would recommend, if there is the slightest chance of a linking error or malicious attention from competitors, is to 301-redirect direct client requests for
http://maindomain.com/sitename.com-subfolder/<whatever>
back to
http://sitename.com/<whatever>

The code for that has been posted here many times, and you should be able to find it by searching for "redirect direct client request RewriteCond THE_REQUEST" using the WebmasterWorld site search or a Google "site:www.webmasterworld.com" search.
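For reference, a rough sketch of that kind of redirect. It is not the exact code from those earlier threads. It assumes the names from the question (www.domainx.com and /websitex) and an .htaccess file in the top-level folder, placed ahead of any internal domain-to-folder rewrite rule.

```apache
RewriteEngine On

# THE_REQUEST holds the original request line sent by the client, for example
# "GET /websitex/page.html HTTP/1.1". Internal rewrites do not change it,
# so this condition matches only direct client requests for the folder URL.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /websitex/ [NC]
# Send the client to the same path without the folder prefix, using a 301
RewriteRule ^websitex/(.*)$ http://www.domainx.com/$1 [R=301,L]
```

Testing THE_REQUEST rather than the current URL-path matters here: after an internal rewrite the path also begins with websitex/, and a rule that checked only the path would redirect every request in a loop.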

Jim


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved