Ranking drop for duplicate content in regional subdomains

     
10:55 am on Aug 13, 2008 (gmt 0)

New User

10+ Year Member

joined:Oct 17, 2004
posts:16
votes: 0


I recently made a site using sub-domains, e.g. australia.example.com, uk.example.com, so it targets regional markets. After a month, all the pages Google had indexed were found at the very end of the search results. They used to appear on the first page. I guess it is a Google penalty. How can I get out of the Google penalty for sure? With robots.txt, meta tags, or code in the .htaccess file?

[edited by: Receptional_Andy at 10:57 am (utc) on Aug. 13, 2008]
[edit reason] Please use example.com - it can never be owned [/edit]

11:25 am on Aug 13, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


A couple things

1. New sites often start out with great rankings, then lose those positions after a short period only to build back slowly. It's like a test period to see if the site will really catch fire, I think, and most of the time that doesn't happen. We used to call this the Google Sandbox [webmasterworld.com].

2. As you may have just discovered, you should definitely watch it with regional pages that are too close to duplicates. Sure, you can exclude those subdomains through robots.txt or robots meta tags. Then, if they aren't even indexed, why do you want them online? If they have a real business purpose, then I'd suggest making them more tailored to each region so they are no longer duplicate.

12:14 pm on Aug 13, 2008 (gmt 0)

New User

10+ Year Member

joined:Oct 17, 2004
posts:16
votes: 0


Hello tedster, thanks for your comments. I would like to try robots.txt first, but I could not find the code on the internet. How can I disallow only those sub-domains from indexing with robots.txt? Could you be more detailed about "making them more tailored"? I would be glad to keep those sub-domains because they show only ads related to that region.
1:10 pm on Aug 13, 2008 (gmt 0)

New User

10+ Year Member

joined:Oct 17, 2004
posts:16
votes: 0


I still cannot work out how to disallow Googlebot from indexing the subdomains, because all the subdomains are generated from .htaccess and there are no sub-folders for them.
1:45 pm on Aug 13, 2008 (gmt 0)

Preferred Member

10+ Year Member

joined:June 13, 2004
posts:650
votes: 0


Bots usually request robots.txt on a subdomain, so generate that file too.
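
For instance, a minimal sketch of what the robots.txt served from one of those regional subdomains (reusing australia.example.com from the original post) could contain to keep that host out of the index:

User-agent: *
Disallow: /
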
7:11 pm on Aug 13, 2008 (gmt 0)

New User

10+ Year Member

joined:Oct 17, 2004
posts:16
votes: 0


The root directory is the same one where www.example.com is located. Can I have a robots.txt only for newyork.example.com?
9:43 pm on Aug 13, 2008 (gmt 0)

Preferred Member

10+ Year Member

joined:June 13, 2004
posts:650
votes: 0


Probably, if you check the request and serve the corresponding file.
2:04 am on Aug 14, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


The root directory is the same one where www.example.com is located

That can't really be true, from an HTTP request point of view. What you need is a file at this address: newyork.example.com/robots.txt

5:17 am on Aug 14, 2008 (gmt 0)

New User

10+ Year Member

joined:Oct 17, 2004
posts:16
votes: 0


newyork.example.com is generated from a rewrite rule in .htaccess. What can I write in robots.txt to allow indexing only of www.example.com? There can be only one robots.txt in the root directory for all the cities.
5:53 am on Aug 14, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


So then use a meta robots noindex tag on all the pages in the subdomains.
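
A minimal sketch of that tag, placed in the <head> of every page served on the regional subdomains (and left off the www pages):

<meta name="robots" content="noindex">
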
6:51 am on Aug 14, 2008 (gmt 0)

New User

10+ Year Member

joined:Oct 17, 2004
posts:16
votes: 0


Thanks for your answer. Do you know if the meta noindex tag really works for duplicated content? The meta noindex tag is added now. So then I just have to wait and see?
8:55 am on Aug 14, 2008 (gmt 0)

Preferred Member

10+ Year Member

joined:June 13, 2004
posts:650
votes: 0


Or make additional .htaccess rules:

# RewriteEngine On is assumed to be set already, since the subdomains
# themselves are generated by rewrite rules in this .htaccess
RewriteCond %{HTTP_HOST} ^([^.]+)\.yoursite\.com$ [NC]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^robots\.txt$ /norobots.txt [L]

This should rewrite all requests for robots.txt on any subdomain except www to the norobots.txt file.

norobots.txt:

User-agent: *
Disallow: /
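
One rough way to verify the rewrite once the subdomains are resolving (assuming curl, or simply a browser, is available) is to fetch the file from a regional host and from www and confirm that only the regional host returns the blocking rules:

curl http://newyork.example.com/robots.txt
curl http://www.example.com/robots.txt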

7:08 am on Aug 15, 2008 (gmt 0)

New User

10+ Year Member

joined:Oct 17, 2004
posts:16
votes: 0


Hi activeco, great idea. I will try this and will post here whether it worked or not. Thanks!
7:09 am on Aug 22, 2008 (gmt 0)

New User

10+ Year Member

joined:Oct 17, 2004
posts:16
votes: 0


Very strange here: in Google Webmaster Tools there are so many URLs restricted by robots.txt. And many URLs have this ' ' in the HTML pages, like http://www.example.com/'pages.html'. They are all prohibited by robots.

[edited by: tedster at 7:21 am (utc) on Aug. 22, 2008]
[edit reason] switch to example.com - it can never be owned [/edit]

11:42 am on Aug 22, 2008 (gmt 0)

Preferred Member

10+ Year Member

joined:June 13, 2004
posts:650
votes: 0


Probably some other rewrite rules, possibly combined with bad linking.
 
