Sitemaps, Meta Data, and robots.txt Forum

    
1 robots.txt file across multiple domains
Unable to implement individual robots.txt files - can I just use 1
deadsetchamp
msg:3858718
10:50 pm on Feb 26, 2009 (gmt 0)

I have a retail site targeting different countries. Unfortunately it is basically the same content, just with different prices. We have different country-specific domains, but they are all on one server and we are unable to implement different robots.txt files. I just want to block out all the sites except the US one so we don't get penalised for duplicate content.

So is it possible to have a robots.txt file use the following?

User-agent: *
Disallow: <our UK domain>/
Disallow: <our AU domain>/
Disallow: <our CA domain>/

Or does robots.txt ignore any domain information and just look at what comes after the /? It's very important that we don't ruin our US rankings.

 

jdMorgan
msg:3858731
11:03 pm on Feb 26, 2009 (gmt 0)

> does robots.txt ignore any domain information and just look at what comes after the /.

Yes, only the server-local URL-paths can be specified.
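
To illustrate, a robots.txt that blocks crawling of a whole site is nothing but paths, with no hostnames anywhere (this would be the version served only on the non-US domains):

User-agent: *
Disallow: /

The file applies to whichever host actually served it.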

If you have the technology to "change the prices" between domains, you likely also have the technology to serve a different robots.txt per domain... I suspect the right questions are not being asked.

Jim

deadsetchamp
msg:3858735
11:06 pm on Feb 26, 2009 (gmt 0)

Thanks Jim,

They said that they can't do individual files but might be able to do the meta spider restriction method.
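
(If I understand it right, that would mean emitting something like this in the head of every page on the non-US domains, conditionally per domain, rather than serving a different robots.txt:

<meta name="robots" content="noindex">

)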

Cheers for the quick reply.

jdMorgan
msg:3858749
11:23 pm on Feb 26, 2009 (gmt 0)

> they can't do individual files

I'll bet you can easily find someone who *can* do individual files -- good help is cheap in an economic downturn, something that "no-can-do" people should bear in mind... ;)

There are many ways to do it:

Use mod_rewrite or ISAPI Rewrite to internally rewrite robots.txt URL requests to different files, based on the Host header sent with the client HTTP request.

Or, again using a rewrite engine, pass all robots.txt requests to a Perl or PHP script that generates different robots.txt content, again based on the Host header.

Or build this function into the script you use to generate your custom 404 error page contents, and let robots.txt requests activate that script as well, with that script producing the robots.txt content (and a proper 200-OK server status response).
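
For instance, the first option is only a handful of lines in the shared docroot's .htaccess on Apache. The hostnames and filenames below are just placeholders, and it assumes you create robots-uk.txt, robots-au.txt and robots-ca.txt alongside the real robots.txt:

RewriteEngine On

# Serve a country-specific file when the request arrives on a non-US hostname
RewriteCond %{HTTP_HOST} \.co\.uk$ [NC]
RewriteRule ^robots\.txt$ /robots-uk.txt [L]

RewriteCond %{HTTP_HOST} \.com\.au$ [NC]
RewriteRule ^robots\.txt$ /robots-au.txt [L]

RewriteCond %{HTTP_HOST} \.ca$ [NC]
RewriteRule ^robots\.txt$ /robots-ca.txt [L]

Any host that doesn't match (your .com) falls through to the real robots.txt. The script-based variants work the same way, keyed off the same Host header.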

Jim

deadsetchamp
msg:3858800
1:15 am on Feb 27, 2009 (gmt 0)

Thanks for this!

I know what you mean about getting 'can-do' people. I will pass this on to them and it might kick-start their imagination.

g1smd
msg:3871251
9:29 am on Mar 16, 2009 (gmt 0)

What you need can be done in just a couple of lines of code, as jd has outlined above.

WebmasterWorld uses a similar system to serve a different robots.txt file to different bots.
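
In your case, for example, if the US site is the only host ending in .com, two lines (with a placeholder filename, and RewriteEngine already on as in the sketch above) would cover every other domain:

RewriteCond %{HTTP_HOST} !\.com$ [NC]
RewriteRule ^robots\.txt$ /robots-blockall.txt [L]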
