Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Robots.txt for mirror sites
Anyone have a good one?
ScottM




msg:1528426
 8:42 pm on Aug 6, 2002 (gmt 0)

I have some extra domains that are not being used at the moment.

I'd like to put some mirrors up for the time being. But I do NOT want them banned from any IMPORTANT SEs.

Does anyone have a nice robots.txt they can share with me via sticky-mail?

Also: are there any other pitfalls of mirror sites?

How about the 'pros' (versus cons)?

Thanks.

 

jdMorgan




msg:1528427
 8:59 pm on Aug 6, 2002 (gmt 0)

ScottM,

On a mirror site I have for use if the primary server fails, I use

User-agent: *
Disallow: /

This asks all robots to stay out of the mirror, so the duplicated content does not get crawled. I do not promote the mirror site
in any way, except for a single link on the main site on a page that is also disallowed. The mirror
is for back-up purposes only.
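
A quick way to sanity-check a file like this before uploading it is to run it through a robots.txt parser. The sketch below uses Python's standard `urllib.robotparser`; the mirror hostname and the bot names are placeholders picked for illustration, not anything taken from a real setup.

```python
from urllib.robotparser import RobotFileParser

# The mirror's robots.txt, as quoted above.
MIRROR_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical mirror URL and bot names; substitute your own.
MIRROR_URL = "http://mirror.example.com/widgets/blue.htm"
for agent in ("Googlebot", "Slurp", "SomeOtherBot"):
    print(agent, "blocked:", is_blocked(MIRROR_ROBOTS_TXT, agent, MIRROR_URL))
```

This only shows how a well-behaved parser reads the file. It says nothing about robots that ignore robots.txt altogether.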

No banned-due-to-duplicate-content problems for me!

Jim

ScottM




msg:1528428
 9:22 pm on Aug 6, 2002 (gmt 0)

I should rephrase my original premise:

I do not want the ORIGINAL sites banned from any SE. The dups/mirrors could be banned: if they're not indexed, they can't be banned, and if they do get indexed and then banned, it's not a problem. They're kinda throwaway domains anyway.

I do NOT want my original website/websites banned.

pageoneresults




msg:1528429
 9:33 pm on Aug 6, 2002 (gmt 0)

> I do NOT want my original website/websites banned.

Best advice would be to not put up the mirror sites. If you are not extremely careful in setting up the robots.txt and other server side issues, you are at high risk.

If there is no real purpose for the mirrors, then don't do it. What for? Type-in traffic? If so, set up a 301 redirect for that domain. Anything other than that and you may be walking on thin ice.
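
For anyone unsure what the 301 amounts to on the wire, here is a minimal sketch using only Python's standard library. In practice this is normally a line or two of web server configuration rather than a script. The target `http://www.mysite.com` is borrowed from the example domain used in this thread, and `redirect_target`, `PermanentRedirect`, and `serve` are names made up for the illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption: the primary site that the spare domain should point at.
PRIMARY = "http://www.mysite.com"


def redirect_target(path: str, primary: str = PRIMARY) -> str:
    """Map a request path on the spare domain to the same path on the primary site."""
    if not path.startswith("/"):
        path = "/" + path
    return primary.rstrip("/") + path


class PermanentRedirect(BaseHTTPRequestHandler):
    """Answer every GET or HEAD with a 301 pointing at the primary site."""

    def _redirect(self):
        self.send_response(301)  # 301 = Moved Permanently
        self.send_header("Location", redirect_target(self.path))
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_GET = _redirect
    do_HEAD = _redirect


def serve(port: int = 8080) -> None:
    """Run the redirector until interrupted (the port is an arbitrary choice)."""
    HTTPServer(("", port), PermanentRedirect).serve_forever()
```

Calling `serve()` on the spare domain's host would send every request to the matching path on the primary site, so the spare domain serves no duplicate pages of its own.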

As soon as you link from one of those to another, you've opened up another can of worms. Googlebot is very good at following robots.txt files when they are set up properly. Google is also very good about finding links in places you may have forgotten about. Or, links in places that someone may have planted! ;)

jdMorgan




msg:1528430
 10:17 pm on Aug 6, 2002 (gmt 0)

ScottM,

Yes, put the disallow only on the mirror sites, not the main site.
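
One thing to watch, assuming the mirrors are served from the same files as the main site: robots.txt gets copied along with everything else, so the disallow could end up on the main domain too. A rough sketch of choosing the robots.txt body by requested hostname follows. The host names come from the examples in this thread, `robots_txt_for` is a made-up helper, and how the `Host` header reaches it depends on your server.

```python
# Assumption: hostnames that belong to the main site and should stay crawlable.
MAIN_HOSTS = {"www.mysite.com", "mysite.com"}

ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty Disallow = nothing is off limits
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # keep robots out of the whole host


def robots_txt_for(host_header: str) -> str:
    """Return the robots.txt body for the host named in an HTTP Host header."""
    # Drop any ":port" suffix and normalise case before comparing.
    host = host_header.split(":", 1)[0].strip().lower()
    return ALLOW_ALL if host in MAIN_HOSTS else BLOCK_ALL


print(robots_txt_for("www.mysite.com"))
print(robots_txt_for("www.blue-widget.com"))
```

Any host not listed as a main host gets the blocking file, so a newly added mirror domain is kept out of the index by default.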

If Google (for example) indexes a page on your mirror site, and (for some reason*) assigns it a
higher PR than the page on your main site, it might drop the main site page and list the mirror page
instead - probably not what you want. Some here say that if it finds too much duplicated content or
too much cross-linking then it will give you a PR0. Some say that won't happen, some say it will.

I'm not sure, but be careful.

Jim

* This could happen if a competitor intentionally linked his very-high-PR page to your mirror page,
as pageoneresults implied.

ScottM




msg:1528431
 10:30 pm on Aug 6, 2002 (gmt 0)

Ok...maybe some more explanation would help...

The 'extra' domains would have lots of dup content from some subpages on my main domain.

For example:

www.mysite.com/widgets/blue.htm is original

www.blue-widget.com is an extra domain.

Should I just mix up the content? There is no particular order...it's just a regional directory.

I'm sorry if I'm being elusive...I have my reasons.

jdMorgan




msg:1528432
 10:38 pm on Aug 6, 2002 (gmt 0)

ScottM,

Duplicate content is a risk, as outlined above.

We try hard to be objective and non-judgemental here on WebmasterWorld - we count e-mail spammers
and anti-spam crusaders amongst our members, as one example. You will have to figure out how far to
open the kimono in order to get an accurate answer. This is my personal opinion, and worth what it's
costing you, but the consensus seems to be that duplicate content is risky and buys you little. YMMV

Jim


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved