Home / Forums Index / WebmasterWorld / Webmaster General

Webmaster General Forum

Are your test environments indexed in Google?
They shouldn't be: test.example.com, staging.example.com etc.

 1:52 pm on Jul 24, 2013 (gmt 0)

I just discovered that a site I'm helping with has thousands of pages from their staging server indexed in Google.

That creates massive amounts of duplicate content that can't be blamed on scrapers!

Check to make sure this isn't happening to you.

Pages can be kept out of search engine indexes by adding a robots meta tag with the noindex directive to the <head> ... </head> section:

<meta name="robots" content="noindex">

Don't block spiders in robots.txt if you're doing this: Google et al. will only see the noindex directive if they are able to crawl the page.
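Both conditions in this post (the page carries noindex, and robots.txt does not stop the crawler from fetching it) can be checked together. Below is a minimal Python sketch using only the standard library's html.parser and urllib.robotparser. The function name noindex_is_visible, and the choice to pass robots.txt and the page HTML in as strings rather than fetching them, are assumptions for illustration; the check only approximates how a real crawler behaves.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def noindex_is_visible(robots_txt, url, html, agent="Googlebot"):
    """True only if `agent` may fetch `url` AND the page carries noindex."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(agent, url):
        return False  # blocked by robots.txt: the crawler never sees the tag
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives
```

A staging page that is noindexed but also disallowed in robots.txt returns False here, which is exactly the situation the post warns against.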

Also, remember to remove the noindex directive when you publish the content to the main site.
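One way to avoid forgetting that step is to stop copying the tag around and instead emit it from the page template depending on which server renders the page. A minimal Python sketch, assuming a hypothetical SITE_ENV environment variable that is set to "production" only on the live server:

```python
import html
import os

NOINDEX_TAG = '<meta name="robots" content="noindex">'


def robots_meta(env=None):
    """Robots meta tag for the current environment ("" on the live site)."""
    if env is None:
        # Hypothetical variable; an unset value is treated as non-production.
        env = os.environ.get("SITE_ENV", "staging")
    return "" if env == "production" else NOINDEX_TAG


def render_head(title, env=None):
    """Build a <head> section, adding noindex everywhere except production."""
    lines = ["<head>", f"  <title>{html.escape(title)}</title>"]
    tag = robots_meta(env)
    if tag:
        lines.append("  " + tag)
    lines.append("</head>")
    return "\n".join(lines)
```

Defaulting to noindex fails safe for the staging server, but it means the live server must set SITE_ENV=production; otherwise the main site itself gets noindexed, so check the live pages after each deploy.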



 4:33 pm on Jul 24, 2013 (gmt 0)

When possible, we try to block access to dev sites with an IP whitelist. What Google can't read, Google can't index.
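An IP whitelist is usually configured in the web server or firewall, but the logic is small enough to sketch as Python WSGI middleware using the standard ipaddress module. The network ranges below are placeholders taken from the documentation address blocks, and reading the client address from REMOTE_ADDR assumes no reverse proxy sits in front of the app (behind a proxy, REMOTE_ADDR is the proxy's address).

```python
import ipaddress

# Placeholder allow-list using documentation address ranges; replace with
# your office / VPN addresses.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def ip_allowed(addr, networks=ALLOWED_NETWORKS):
    """True if `addr` falls inside any allowed network."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False  # missing or malformed address: deny
    # Membership is simply False when the IPv4/IPv6 versions differ.
    return any(ip in net for net in networks)


def ip_whitelist(app, networks=ALLOWED_NETWORKS):
    """WSGI middleware: 403 Forbidden for any client not on the allow-list."""
    def middleware(environ, start_response):
        if ip_allowed(environ.get("REMOTE_ADDR", ""), networks):
            return app(environ, start_response)
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Forbidden"]
    return middleware
```

Googlebot, like any other visitor outside the list, only ever receives the 403 response.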


 4:40 pm on Jul 24, 2013 (gmt 0)

Google discovering test and development sites has been a big problem in the past.

Googlebot has a voracious appetite, and simply using robots.txt will not work.

It has gotten to the stage where I've become paranoid: I avoid using Gmail, and I ask clients to make sure they don't have any toolbars installed on their machines.


 5:30 pm on Jul 24, 2013 (gmt 0)

I prefer to use HTTP Basic Authentication to protect development/staging content from indexing.
This returns a 401 Unauthorized status code to Googlebot.
As LifeinAsia suggested, returning a 403 Forbidden status code works just as well.
The meta robots noindex solution is effective for keeping the dev URLs out of the index, but it uses more server resources, since Googlebot still has to crawl each page to see the directive.
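Basic Authentication is normally switched on in the web server configuration, but a minimal Python WSGI sketch shows what Googlebot actually receives: a 401 Unauthorized response with a WWW-Authenticate challenge. The username, password and realm are hypothetical placeholders. Basic Auth credentials are only base64-encoded, not encrypted, so the staging site should also be served over HTTPS.

```python
import base64
import hmac

# Hypothetical credentials; load real ones from configuration, not source code.
USERNAME = "dev"
PASSWORD = "change-me"


def credentials_ok(header, username=USERNAME, password=PASSWORD):
    """Validate the value of an 'Authorization: Basic <base64>' header."""
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return False
    user, sep, pwd = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking how much of a secret matched via timing.
    user_ok = hmac.compare_digest(user.encode(), username.encode())
    pass_ok = hmac.compare_digest(pwd.encode(), password.encode())
    return user_ok and pass_ok


def basic_auth(app, realm="Staging"):
    """WSGI middleware: 401 plus a Basic challenge unless credentials match."""
    def middleware(environ, start_response):
        if credentials_ok(environ.get("HTTP_AUTHORIZATION", "")):
            return app(environ, start_response)
        start_response("401 Unauthorized", [
            ("Content-Type", "text/plain"),
            ("WWW-Authenticate", f'Basic realm="{realm}"'),
        ])
        return [b"Authentication required"]
    return middleware
```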


 6:25 pm on Jul 24, 2013 (gmt 0)

"to protect development/staging content from indexing"

This material is already indexed, alas.

There's no way to know what effect it's having on the main site's SEO, but it can't possibly be a good thing.


 6:28 pm on Jul 24, 2013 (gmt 0)

If you haven't already done so, definitely submit a site removal request for the staging site through Webmaster Tools ASAP.


 9:34 pm on Jul 24, 2013 (gmt 0)

"What Google can't read, Google can't index."

You would think so, wouldn't you? But g### thinks differently: uncrawled pages can still be indexed, even if the index only reveals that the page exists, not what it says.


 12:39 am on Jul 25, 2013 (gmt 0)

Move your test server to a different subdomain and set up HTTP Basic Authentication on it.

On the indexed test subdomain, set up a site-wide, page-by-page redirect to the corresponding pages on the main site. Leave the redirect in place for at least 3 months after the last request from anywhere is received.
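A page-by-page redirect keeps each old path and query string and only swaps the host. Here is a minimal Python WSGI sketch for the old test host; MAIN_HOST is a hypothetical name, and the permanent 301 status is an assumption, since the post only says "redirect".

```python
from urllib.parse import quote

MAIN_HOST = "www.example.com"  # hypothetical main-site host name


def target_url(path, query="", main_host=MAIN_HOST):
    """Same path and query string, moved onto the main site."""
    url = f"https://{main_host}{path or '/'}"
    return f"{url}?{query}" if query else url


def redirect_app(environ, start_response):
    """WSGI app for the old test host: 301 each URL to its main-site twin."""
    # PEP 3333 decodes PATH_INFO as latin-1; re-encode to get the raw bytes,
    # then percent-encode them for the Location header.
    raw_path = environ.get("PATH_INFO", "").encode("latin-1")
    location = target_url(quote(raw_path), environ.get("QUERY_STRING", ""))
    start_response("301 Moved Permanently", [("Location", location)])
    return [b""]
```

The real test server then lives on the new, password-protected subdomain, and this app answers only on the old host.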

The "noindex" meta tag alone is not enough to get you out of trouble. A few years back, a company fulfilled several orders before they realised the price paid by the customer was way too low. It turned out the customer had noticed that the prices on the test subdomain were quite old, and that the site allowed you to place an order there!

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved