Forum Moderators: Robert Charlton & goodroi
For example, my site is www.example.com. We set up our testing grounds at http://development.example.com.
Now, when I run a site: command, I see some pages from the regular www subdomain AND some from the development subdomain.
I'm assuming this is an issue, but I can't redirect our development pages to our www pages as they are separate landing pages.
Any guidance here?
[edited by: tedster at 9:30 pm (utc) on May 30, 2006]
[edit reason] use example.com [/edit]
It is still in effect and is approaching 6 months. Filing reinclusion requests, emails to G, etc. have done absolutely no good. The duplicate content penalty appears to be automatic (for only 6 months hopefully).
So immediately, disallow the dev site in your robots.txt file and also put up noindex, nofollow metas. And pray that it is not too late. Unfortunately, it very well may be, based on my personal experience.
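For reference, a sketch of that robots.txt advice, assuming the dev site answers as its own hostname and therefore serves its own robots.txt at the root (paths are illustrative):

```
# Served at http://development.example.com/robots.txt
# Blocks all compliant crawlers from the entire dev host
User-agent: *
Disallow: /
```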
Send me a sticky mail once things sort out.
Unfortunately the above solution will not work. Avoid using the robots.txt file; instead, drop a Robots META Tag on the pages you don't want indexed, right after the opening <head>...
<head>
<meta name="robots" content="none">

or, the long version...

<head>
<meta name="robots" content="noindex, nofollow">

You can also put the /dev/ into a password protected environment. ;)
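A minimal sketch of that password-protection option, assuming an Apache server with an .htaccess file placed in the /dev/ directory (the realm name and AuthUserFile path here are placeholders; the .htpasswd file itself would be created with the htpasswd utility):

```
# .htaccess in the /dev/ directory -- anyone without a valid login,
# including crawlers, receives a 401 instead of the page content
AuthType Basic
AuthName "Development Area"
AuthUserFile /home/example/.htpasswd
Require valid-user
```

Since search engines can't authenticate, nothing behind this gets crawled or indexed at all, which sidesteps the duplicate-content question entirely.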
Why would that solution not work?
In short, when your robots.txt file includes a Disallow: instruction for a particular page and/or directory, Googlebot is blocked from fetching the page, but Google can still index the URI on its own (a URL-only listing) based on links pointing to it.
Removing that instruction and using the Robots META Tag instead allows Googlebot to crawl the page and read an instruction that is more specific, in this case noindex, nofollow, which keeps the page (and the URI-only listing) out of their index.