
Sitemaps, Meta Data, and robots.txt Forum

Multiple Websites block through Robots?
Mansi Padhya

 4:54 pm on Jan 31, 2013 (gmt 0)

My four domains all point to one root directory. I want to block three of them (which carry duplicate content) through robots.txt. I want to use this option instead of putting a noindex tag on the pages.

Can anybody help me with how to specify in robots.txt which domains to ignore?



 5:18 pm on Jan 31, 2013 (gmt 0)

The robots.txt file for a site must appear at the URL example.com/robots.txt

All directives in that file apply only to the currently requested hostname.

URLs are used out on the web. Paths and files are used only inside the server. They are related merely by the server configuration.

You will need four robots.txt files: robots-mainsite.txt, robots-thissite.txt, robots-thatsite.txt and robots-othersite.txt in your root folder.

These are the internal filenames used only inside the server. You then rewrite requests for example.com/robots.txt based on the requested hostname in order to fetch the right file.
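As a sketch of what those four files might contain (assuming the three duplicate hostnames should be blocked entirely and the main site crawled normally — the filenames are the ones above, the contents here are illustrative):

```
# robots-thissite.txt, robots-thatsite.txt, robots-othersite.txt
User-agent: *
Disallow: /

# robots-mainsite.txt
User-agent: *
Disallow:
```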

RewriteCond %{HTTP_HOST} ^(www\.)?([^.]+)\.com$
RewriteRule ^robots\.txt$ /robots-%2.txt [L]

If you have a non-www to www canonicalisation rule, you must also add
RewriteCond %{REQUEST_URI} !^/robots
to it; otherwise a request for example.com/robots.txt will first be rewritten internally, and then the new internal path will be exposed as a new URL (www.example.com/robots-example.txt) back out on the web by the non-www/www redirect.
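For example, a typical non-www to www canonicalisation rule with that exclusion added might look like this (the hostname is illustrative, and your existing redirect may differ in detail):

```
# Redirect non-www to www, but skip the internal robots-* files
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteCond %{REQUEST_URI} !^/robots
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```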

Be sure you know the difference between a redirect and a rewrite. Both are coded using RewriteRules.

Rewrites do not "make URLs for files". Instead, the process is "exactly backwards": a rewrite examines the requested URL and then fetches the right file based on that request.

The alternative method is to rewrite requests for robots.txt to instead fetch a single robots.php file internally. Inside the robots.php file you then have a bit of logic that examines the requested URL and then sends the right reply based on which hostname was requested.
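A minimal sketch of that second method, assuming the rewrite sends robots.txt requests to /robots.php and that mainsite.com is the canonical hostname (both names are placeholders for your own setup):

```php
<?php
// robots.php - hypothetical per-hostname robots output.
// Assumes a rewrite such as: RewriteRule ^robots\.txt$ /robots.php [L]
header('Content-Type: text/plain');

// Normalise the requested hostname (strip any leading "www.").
$host = strtolower($_SERVER['HTTP_HOST'] ?? '');
$host = preg_replace('/^www\./', '', $host);

if ($host === 'mainsite.com') {
    // Canonical site: allow crawling.
    echo "User-agent: *\nDisallow:\n";
} else {
    // Duplicate-content hostnames: block everything.
    echo "User-agent: *\nDisallow: /\n";
}
```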

It's your choice which method to use. The first method needs mod_rewrite. The second method needs both mod_rewrite and PHP.

[edited by: goodroi at 8:57 pm (utc) on Jan 31, 2013]

Mansi Padhya

 5:47 pm on Jan 31, 2013 (gmt 0)

Thank you for your quick reply & detailed explanation. It seems a little complicated to apply this method.


 6:59 pm on Jan 31, 2013 (gmt 0)

>> It seems a little complicated to apply this method.

Actually, IMO you are making things complicated in the long run by running four websites from one root folder.


 7:07 pm on Jan 31, 2013 (gmt 0)


I beg to differ - a robots file for each site and a single two-line RewriteRule to select the right one is probably one of the simplest bits of code going.

If you already have mod_rewrite enabled, you should be able to have it all up and running in a few minutes.

Mansi Padhya

 7:14 pm on Jan 31, 2013 (gmt 0)

Yes, we have mod_rewrite enabled already. But now I have to find the solution. So each domain will have a separate robots file? The content should be:
User-agent: *
Disallow: /

on those three domains, and the one main domain allows crawling?

Then I apply this code in the .htaccess file?

RewriteCond %{HTTP_HOST} ^(www\.)?([^.]+)\.com$
RewriteRule ^robots\.txt$ /robots-%2.txt [L]

Or will it work if I put a noindex tag on the three domains?

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved