Two servers, same content; redirect robots.txt

Please help with .htaccess file syntax

         

klark

1:06 am on Nov 15, 2002 (gmt 0)

10+ Year Member



Hi there-

My organization has several web sites that all share the same document root. For example:

www.domain.com -- main site, this is the one we want users to use and search engines to index

www.qa.domain.com -- QA site - same exact content as the one above

I'd like to allow indexing of www.domain.com and disallow indexing of www.qa.domain.com. Since they both share the same document space, these two URLs serve the exact same file:

www.domain.com/robots.txt
www.qa.domain.com/robots.txt

I was thinking that we could use an Apache rewrite in the .htaccess file at the root of the server, using the HTTP_HOST environment variable to dictate which file is served.

That is -

-- if the user is requesting: www.domain.com/robots.txt, they get one file

-- if the user is requesting: anythingelse.domain.com/robots.txt, they get another file.

Please note that we do not have just two servers/domain names. So, we need to check for www.domain.com only. Everythingelse.domain.com would get the alternate robots.txt.

I am just learning RewriteRule syntax.

Can someone help?

Thanks - much appreciated.

bcc1234

3:21 am on Nov 15, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



NameVirtualHost 1.1.1.1:80

<VirtualHost 1.1.1.1:80>
ServerName www.qa.domain.com
RedirectPermanent / [domain.com...]
CustomLog your log here
</VirtualHost>

<VirtualHost 1.1.1.1:80>
ServerName www.domain.com
DocumentRoot your docbase
CustomLog log here, the same file as above
...other crap here...
</VirtualHost>

That will redirect anything and everything to the main domain.
Avoid using .htaccess whenever possible. Use the conf files.

andreasfriedrich

3:47 am on Nov 15, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Welcome to WebmasterWorld [webmasterworld.com] klark.

As far as I understand, both domains serve the same pages and you want to keep it that way. To avoid a duplicate-content penalty, you want to disallow spidering of everything but the main domain.

bcc1234's configuration would force a redirect to the main domain, so browsers would update the address bar to show your main domain. You might want to consider this solution.

If you don't want to use the permanent redirect, this might work.

# Note: the space before "!" is required.
RewriteCond %{REQUEST_URI} ^/robots\.txt$
RewriteCond %{HTTP_HOST} !^(www\.)?domain\.tld$
RewriteRule .* /other_robots.txt [L]
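
For anyone dropping this into a root .htaccess file, here is a slightly fuller sketch of the same idea. It assumes mod_rewrite is loaded and that .htaccess overrides are allowed for the document root. The name other_robots.txt and the domain.tld placeholder are carried over from the rules above, and the file contents shown in the comments are only an illustration of a "disallow everything" robots file.

```apache
# Sketch only: assumes mod_rewrite is available and this file sits in the
# shared document root.
RewriteEngine On

# Apply the rule only when the requested host is NOT domain.tld or
# www.domain.tld. The space between the test string and "!" is required;
# without it RewriteCond sees a single argument and Apache answers with
# an internal server error.
RewriteCond %{HTTP_HOST} !^(www\.)?domain\.tld$ [NC]

# In .htaccess context the leading slash is stripped before matching,
# so the pattern is written without it.
RewriteRule ^robots\.txt$ /other_robots.txt [L]

# other_robots.txt (illustrative contents) would block all crawlers:
#   User-agent: *
#   Disallow: /
```

Matching on the RewriteRule pattern instead of a separate REQUEST_URI condition is just a stylistic choice; both forms should behave the same here. If a host is reached on a non-standard port, HTTP_HOST may carry a ":port" suffix, in which case the host pattern would need to allow for it.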

Andreas

klark

4:18 am on Nov 15, 2002 (gmt 0)

10+ Year Member



Thanks for two very good solutions!

I think your solution should work, Andreas, as it matches what I was hoping to do.

I plugged in your code and got an internal server error, so something's up. But, I'll keep investigating.

Thanks again-
Bill

klark

4:31 am on Nov 15, 2002 (gmt 0)

10+ Year Member



Oh wait! It works perfectly. I fell for the missing-space-next-to-the-! problem.

Thanks!

andreasfriedrich

4:44 am on Nov 15, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I fell for the missing-space-next-to-the-! problem.

Hey Bill, you must have been lurking for quite some time to know of that ;)

Glad you got it working though.

Andreas