

Google - robots.txt question

Can I get the best of both worlds?!


GrinninGordon

11:52 pm on May 3, 2003 (gmt 0)



A large % of my users are in Australia, and many Aussie search engines only list .au sites, so a while ago I opened an .au account and put a mirror site there. It is basically the same site, and it points to my real site for the "order" forms. Bully for me, this also helps my real site's PR.

I need to update the html of this .au site, and plan to use the same basic html as my real (main) site. Trouble is, I don't want it to do too well and set off a spam algo! Rather than even contemplate this, I wondered if there was something I could put in the robots.txt file (or something else, somewhere else) that would let me keep the PR without having Google index the mirrored site's pages, and without affecting my standing with the Aussie search engines that created the need in the first place.

I suspect I am OK, as many sites have regional copies (including Google). And I think that Google simply downgrades / ignores what it sees as the copy (which is not that hard to work out). But, nervous as ever, I wondered if there was a more assured way.

fathom

12:06 am on May 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Disallow Googlebot in your robots.txt.
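For illustration only, here is a minimal sketch of what "disallow Googlebot" could look like on the mirror, checked with Python's standard `urllib.robotparser`. The rules block Googlebot from the whole site while leaving every other crawler (including the Aussie engines) unrestricted. The mirror URL and the non-Google bot names are hypothetical, and the sketch says nothing about whether PR still flows from a blocked site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the .au mirror: block only Googlebot,
# leave every other crawler unrestricted (an empty Disallow allows everything).
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    url = "http://www.example.com.au/index.html"  # hypothetical mirror URL
    for agent in ("Googlebot", "Slurp", "SomeAussieBot"):
        # can_fetch() matches on the bot's name token, e.g. "Googlebot"
        print(agent, rules.can_fetch(agent, url))
```

Because the `User-agent: Googlebot` group is matched before the `*` group, only Google is shut out; everyone else falls through to the catch-all rule.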

The better course is making at least 10 - 20% of each page different... this isn't as difficult a task as it sounds.

A quick adjustment in the nav menus works well, and then over time altering the sentence structure of new pages is all that is needed.
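If you want a rough self-check against the "10 - 20% different" rule of thumb, one option is a word-level comparison with Python's `difflib`. This is a sketch under stated assumptions: the percentage comes from `SequenceMatcher.ratio()` on whitespace-split words, the sample page texts are made up, and it is not how Google or any other engine actually measures duplicate content.

```python
import difflib


def percent_different(page_a: str, page_b: str) -> float:
    """Rough word-level difference between two pages, as a percentage (0-100).

    SequenceMatcher.ratio() returns a similarity in [0, 1]; we report
    (1 - similarity) * 100. A self-check heuristic only, not a search
    engine's duplicate-content score.
    """
    words_a = page_a.split()
    words_b = page_b.split()
    # autojunk=False so common words are not silently discarded on long pages
    similarity = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False).ratio()
    return round((1.0 - similarity) * 100, 1)


if __name__ == "__main__":
    # Hypothetical snippets from the main site and the .au mirror
    main_page = "Order widgets online from our shop"
    mirror_page = "Order widgets online from our Australian shop"
    print(percent_different(main_page, mirror_page))  # prints 7.7
```

In practice you would feed it the visible text of each page (not the raw html), since identical markup would otherwise inflate the similarity.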

GrinninGordon

12:12 am on May 4, 2003 (gmt 0)



Thanks fathom

But a deny would presumably cost me the PR I get from the mirror. And simply changing the html a tad will probably get me reported for spam if the mirror site then does well (although, based on what I have seen, I do not think Google could object to it).

This is one of the few times I have seriously considered using a cloaking script to send Google directly to the real site. I am not going to, as I am paranoid about cloaks. I had hoped the .htaccess file could be configured to send certain bots elsewhere, or that there was a meta tag I could use that would stop Google from indexing the pages (but allow them to keep my luvly PR bonus going).
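The page-level tag asked about here is the robots meta tag: `<meta name="robots" content="noindex">` applies to all crawlers, and `name="googlebot"` targets Google only. Below is a small sketch, using a hypothetical mirror page, that reads such tags with Python's standard `html.parser`, so you can confirm what a page actually tells crawlers. It only shows the tag's shape; whether link value (PR) keeps flowing from a noindexed page is not something the tag itself guarantees.

```python
from html.parser import HTMLParser

# Hypothetical <head> of a mirror page: ask Google's crawler not to index
# the page, while still following its links ("follow" is the default anyway).
PAGE = """<html><head>
<meta name="googlebot" content="noindex, follow">
<title>Mirror page</title>
</head><body>...</body></html>"""


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # meta name -> set of lowercase directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # html.parser lowercases attribute names; values may be None
        attr = {name: (value or "") for name, value in attrs}
        meta_name = attr.get("name", "").lower()
        if meta_name in ("robots", "googlebot"):
            values = {d.strip().lower() for d in attr.get("content", "").split(",") if d.strip()}
            self.directives.setdefault(meta_name, set()).update(values)


def robots_directives(html: str) -> dict:
    """Return the robots/googlebot meta directives found in an html document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    print(robots_directives(PAGE))
```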

mcavic

1:20 am on May 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I had hoped the .htaccess file could be configured to send certain bots elsewhere

It can, but that would be considered cloaking. And I don't see how it would help you anyway.

I think you have to either block the mirror from Google entirely, or risk duplicate content.

Personally, I don't think Google would penalize you, if it's obviously a regional mirror. They might just drop pages that are exactly the same.

fathom

1:48 am on May 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Agree with mcavic.

It is unlikely that Google will ever penalize you for duplicate content. Avoid cross-linking identical content to identical content and you will be fine; there's no need to add robots.txt rules, a robots noindex tag, or .htaccess rules.

However, look to the future... don't get into the habit of always adding new content to both sites... develop some uniqueness between them.

e.g. - add a new page to one site and link to it from the other, then add another new page and reverse the link.

While you're under Google's radar things are great, but at some point your sites' success will put you in Google's crosshairs, so avoid total duplication from here on out.