

Apache Web Server Forum

    
Controlling a Subdomain with robots.txt
robots.txt for multiple subdomains
ivanvias




msg:4253503
 8:13 pm on Jan 14, 2011 (gmt 0)

Hi,

I have some rules that create virtual subdomains, and they are working fine.

I have a main domain with a robots.txt file that I would also like to apply to every subdomain.


Can anyone assist?

Would this work:

.htaccess

# Uncomment the following line if rewrites are not already enabled
# RewriteEngine on

# Use a special robots.txt file for the subdomain
RewriteCond %{HTTP_HOST} ^*.example.com$
RewriteRule robots\.txt robots.txt [L]

?

 

g1smd




msg:4253520
 9:03 pm on Jan 14, 2011 (gmt 0)

In order to work for any particular hostname, the robots.txt file must appear at subdomain.example.com/robots.txt when accessed from the web.

The code above rewrites a request to itself in an infinite loop. Actually, it would never work at all, as the ^*. syntax is invalid.

In a rewrite, the pattern should match the path part of the incoming URL request, and the target should be the physical server path and filename where that content resides.
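
To illustrate the distinction between pattern and target, here is a minimal sketch, not a tested configuration. It assumes the rules live in the .htaccess file at the document root and that the shared file is stored there under the hypothetical name robots-shared.txt (that name does not come from the thread):

```apache
# Sketch only: "robots-shared.txt" is a hypothetical filename used for illustration.
RewriteEngine On

# Match any subdomain label followed by .example.com (dots escaped, no stray "*").
RewriteCond %{HTTP_HOST} ^[^.]+\.example\.com

# Pattern: the requested URL-path as seen from a document-root .htaccess (no leading slash).
# Target: a different file, so the rule cannot rewrite the request to itself.
RewriteRule ^robots\.txt$ /robots-shared.txt [L]
```

Because the target no longer matches the pattern, the rule does not apply again when the rewritten request is re-processed.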

wilderness




msg:4253522
 9:22 pm on Jan 14, 2011 (gmt 0)

g1smd,
I've explained previously that regexes are not my forte, and the longer I go without using them, the less I understand them.

Would something like this work?
With a robots.txt file previously installed in each subdomain-directory,

RewriteCond %{HTTP_HOST} (.+[^/])/(.+[^/]).example.com
RewriteCond %{HTTP_HOST} !(.+[^/])/(.+[^/]).example.com
RewriteRule robots\.txt [(.+[^...] [L]

Thanks in advance.

Don

g1smd




msg:4253570
 11:51 pm on Jan 14, 2011 (gmt 0)

The question is still a little unclear, but I am guessing you want an incoming request for foo.example.com/robots.txt to be served by the file located at /foo/robots.txt inside the server - where "foo" matches both the sub-domain name and its respective folder - one folder for each sub-domain.

This is different to the original question where I believe you said there would be just one single robots.txt file for all of the sub-domains. Please clarify that, as it makes a vast difference. In phrasing the question, note that URLs used "out there on the web", and filepaths used "inside the server" are not at all the same thing. They are "related" by the actions of the server and its configuration.
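
The per-folder arrangement guessed at above (foo.example.com/robots.txt served from /foo/robots.txt) might look roughly like the following. This is only a sketch: it assumes the rules sit in the document-root .htaccess, that each folder lies directly under the document root, and that each folder is named exactly like its subdomain label.

```apache
# Sketch only: assumes one folder per subdomain directly under the document root.
RewriteEngine On

# Leave the main hostname alone.
RewriteCond %{HTTP_HOST} !^www\.example\.com
# Capture the subdomain label ("foo" in foo.example.com) as %1.
RewriteCond %{HTTP_HOST} ^([^.]+)\.example\.com
# URL-path robots.txt on the subdomain is served from /foo/robots.txt on the server.
RewriteRule ^robots\.txt$ /%1/robots.txt [L]
```

The rewritten path (foo/robots.txt) no longer matches the anchored pattern ^robots\.txt$, so the rule does not fire a second time.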

ivanvias




msg:4253599
 2:28 am on Jan 15, 2011 (gmt 0)

Yes, there is a single robots.txt already in place that I want to use for all the subdomains.

tangor




msg:4253600
 2:36 am on Jan 15, 2011 (gmt 0)

robots.txt is not a panacea. A robots.txt that addresses subdomains will have an impact on the top domain, and robots.txt files inside subdomains have less-than-optimal results. Perhaps .htaccess is the better place to address any issues regarding SE crawls of the website in general, and subdomains in particular?

jdMorgan




msg:4253932
 4:29 pm on Jan 16, 2011 (gmt 0)

If you have code that rewrites subdomain requests to subdirectories to implement "multiple subdomains on one server," and you wish to use one single/common robots.txt file for all domains and subdomains, then the answer would be to exclude requests for robots.txt from being rewritten to the subdomain subdirectories.

In other words, change the subdomain rewrite code (which was not posted) from something like

RewriteCond $1 !^subdomain-directories/
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteCond %{HTTP_HOST} ^([^.]+)\.example\.com
RewriteRule ^(.*)$ /subdomain-directories/%1/$1

to something like

RewriteCond $1 !^(robots\.txt$|subdomain-directories/)
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteCond %{HTTP_HOST} ^([^.]+)\.example\.com
RewriteRule ^(.*)$ /subdomain-directories/%1/$1

to exclude any robots.txt requests from being rewritten to the subdomain-specific subdirectories.

Note that you could indeed use an "exclusion rule" above this subdomain-to-subdirectory rewrite if that is what your code was intended to implement. In that case, the proper syntax would have been:

RewriteRule ^robots\.txt$ - [L]

to specify "Do nothing, just quit here if robots.txt is requested."

Jim
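
For reference, the exclusion-rule variant from the post above could be laid out as follows, with the exclusion placed ahead of the subdomain-to-subdirectory rewrite. This is a sketch that reuses the example directives from that post. It assumes the rules are in the document-root .htaccess and that the shared robots.txt sits at the document root. The actual subdomain rewrite code was never posted, so the second half is illustrative only.

```apache
RewriteEngine On

# Exclusion rule: do nothing and stop here when robots.txt is requested,
# so the one shared file at the document root answers for every hostname.
RewriteRule ^robots\.txt$ - [L]

# Subdomain-to-subdirectory rewrite (example form; the real code was not posted).
RewriteCond $1 !^subdomain-directories/
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteCond %{HTTP_HOST} ^([^.]+)\.example\.com
RewriteRule ^(.*)$ /subdomain-directories/%1/$1
```

With this ordering the first RewriteCond of the subdomain rule can stay unchanged, since robots.txt requests never reach it.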

