Best practice: Block with robots.txt or noindex when test platform has leaked
Kanute78
12:00 pm on Nov 11, 2014 (gmt 0)
We have a funny little issue where the dev test platform was left open for both crawling and indexing, which is obviously causing us various issues.
What is the best practice for acting on this (a 301 redirect is not possible, as the developers need to have the test platform available)?
Is it to block with robots.txt, or to use noindex in the header (and should we submit a site removal request in addition to this)?
Thanks for any good pointers regarding this.
goodroi
2:23 pm on Nov 11, 2014 (gmt 0)
Is the dev test platform a permanent test that the public should never visit? Or is it your new domain that was under development and went live too early?
What problems are you having? Is the test site outranking the existing site? Are consumers getting to the test site which isn't fully functional?
netmeg
2:40 pm on Nov 11, 2014 (gmt 0)
The quickest way to get it out of the index (this happens to me with developers A LOT) is to implement NOINDEX sitewide on the dev site, add the dev site to your GWT, and then remove the whole thing. The removal only lasts 90 days, but if there's a NOINDEX on it, then when Google comes back they won't re-index it. And if you don't want them to crawl it, that's when you block it in robots.txt too. But you need to do these things in a particular sequence:
1. NOINDEX
2. Add to GWT
3. Remove URLs
4. Block in robots.txt
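Step 1 can be done with a meta robots tag in every page's head, or more robustly with an X-Robots-Tag response header so non-HTML files are covered too. A minimal sketch for Apache, assuming mod_headers is enabled and using `dev.example.com` as a placeholder for the real dev hostname:

```apache
# Sketch only: sitewide noindex for the dev vhost via an HTTP header.
# Assumes Apache with mod_headers enabled; hostname and paths are placeholders.
<VirtualHost *:80>
    ServerName dev.example.com
    DocumentRoot /var/www/dev

    # Applies to every response, including PDFs and images,
    # which a <meta name="robots"> tag in the HTML cannot cover.
    Header set X-Robots-Tag "noindex, nofollow"
</VirtualHost>
```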
As general best practice, I usually insist that the developers either password protect or only allow our IP numbers when they set up a dev site, since a NOINDEX won't keep your competitors from playing around in it if they happen to discover it.
(Unfortunately the insisting part doesn't always work)
That should do it.
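The "password protect or only allow our IP numbers" setup above can be sketched in Apache 2.4 syntax; the directory path, the office IP range, and the htpasswd file below are all placeholders, not anything from the actual site:

```apache
# Sketch: lock the dev docroot to the office network OR to a login.
# The 203.0.113.0/24 range and all paths are assumptions - substitute your own.
<Directory "/var/www/dev">
    AuthType Basic
    AuthName "Dev site"
    AuthUserFile /etc/apache2/.htpasswd-dev
    <RequireAny>
        Require ip 203.0.113.0/24
        Require valid-user
    </RequireAny>
</Directory>
```

Unlike NOINDEX, this keeps both crawlers and curious competitors out entirely.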
Kanute78
2:43 pm on Nov 11, 2014 (gmt 0)
It is a permanent dev test platform, which is not intended for the public.
The timing of this issue was very unlucky, as we got caught up in the algo updates of 19-25 October, with the proper domain dropping massively on the 25th of October. (There are no analytics set up for the dev site, so we don't have any overview of whether it has taken over the traffic or not.)
I guess both backlinks and content could be behind the drop we experienced (thinking about the Penguin update, and, if algoroo reported it correctly, a thin content/content update on the 25th), as we suddenly had 15k followed exact-match anchor links and 100% duplicate content, and, just to top it off, a nice increase in 404s and sitemap issues.
Kanute78
2:49 pm on Nov 11, 2014 (gmt 0)
Thank you Netmeg, that seems like a very good way to do it.
The test domain should have been password protected in the first place, but this was never done, so that was the core of the problem.
As for the developers, we do have a checklist on their wall, but that doesn't always work either.
netmeg
4:04 pm on Nov 11, 2014 (gmt 0)
I don't know that this would be the cause of your drop, but it's still a good idea to get it out of the index either way.
tangor
5:35 pm on Nov 11, 2014 (gmt 0)
(There are no analytics set up for the dev site, so we don't have any overview of whether it has taken over the traffic or not)
You do have logging enabled, correct? That will give you answers in that regard.
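If the raw access logs are in combined format, a quick grep will show whether Googlebot (or anyone else) has been hitting the dev site. The sketch below embeds two sample log lines so it runs standalone; in practice you would pipe in the real log, whose path depends on your server setup:

```shell
# Sketch: count Googlebot requests in a combined-format access log.
# Sample lines are embedded so this runs standalone; in practice do e.g.
#   grep -c 'Googlebot' /var/log/apache2/dev_access.log   (path is an assumption)
printf '%s\n' \
  '66.249.66.1 - - [11/Nov/2014:09:15:02 +0000] "GET /products HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"' \
  '10.0.0.5 - - [11/Nov/2014:09:16:10 +0000] "GET /admin HTTP/1.1" 200 812 "-" "Mozilla/5.0"' \
| grep -c 'Googlebot'
# prints 1 for the sample above (one Googlebot line, one human visitor)
```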
Meanwhile, developers who don't protect a test site deserve more than a wrist slap... I'd hit them in the pocketbook.
netmeg offers good advice for getting out of the index quicker, but I'd do it faster with deny.
Sad fact is, due to bad practice the dev machine was found, and, as has been said too many times: once on the web, forever on the web! Google's cache will have the site, even AFTER you do everything suggested.
Is your dev platform internal, or does it need to be reachable from outside to operate? If the latter, change that immediately.
engine
5:36 pm on Nov 11, 2014 (gmt 0)
This problem occurs all too easily, and I learned my lesson a long while back. You just have to keep the dev server behind closed doors, and even then, be careful. It's often been suggested that Google can find "stuff" through various means, and this speculation has included links in gmail, toolbar feedback, and just the off-chance of googlebot finding a single link.
Once it's been found, Google has a voracious appetite for new links. Getting it to forget is a slog, and netmeg's suggestions are good ones. You have to hope that Google is not employing the dev server data it has found to re-rank another site. It shouldn't, at least not easily, if it's on an entirely different domain and IP.
You may want to consider setting up an entirely new dev server and starting afresh.
In my instance, it took almost eight months to clean up everything, including removing all references to the dev site.
In the meantime, be patient with the dev server pages coming out of Google's index.
netmeg
5:41 pm on Nov 11, 2014 (gmt 0)
netmeg offers good advice for getting out of the index quicker, but I'd do it faster with deny.
Not entirely sure what this means exactly, but if you mean with robots.txt, that obviously won't take it out of the index (nor prevent it from being indexed)
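For reference, the sitewide crawl block being discussed is just this robots.txt at the dev site's root; as noted, it stops crawling but does not by itself deindex anything (Google can still index a blocked URL from external links alone), which is why it goes last in the sequence, after the NOINDEX has been picked up:

```txt
User-agent: *
Disallow: /
```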
Neil, if you request removal for the entire site, it's gone. I haven't even found cache remaining after I've requested removal. Obviously though if someone gets a chance to link to it (yea that's happened too, sigh) it's a different story.