
Forum Moderators: phranque

Are your test environments indexed in Google?

They shouldn't be: test.example.com, staging.example.com etc.

     
1:52 pm on Jul 24, 2013 (gmt 0)

buckworks (Moderator)
joined: Dec 9, 2001 | posts: 5653 | votes: 58

I just discovered that a site I'm helping with has thousands of pages from their staging server indexed in Google.

That creates massive amounts of duplicate content that can't be blamed on scrapers!

Check to make sure this isn't happening to you.

Pages can be kept out of the search engine indexes by adding the noindex directive to the <head> ... </head> section:

<meta name="robots" content="noindex">

Don't block spiders in robots.txt if you're doing this. Google et al. will only see the noindex if they're able to spider the page.

Also, remember to remove the noindex directive when you publish the content to the main site.
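
One way to make that cleanup step foolproof is to send the directive as an HTTP response header from the staging server's configuration instead of putting it in the pages themselves. A minimal Apache sketch, assuming mod_headers is enabled and staging.example.com is the staging hostname:

# Staging vhost only: every response carries a noindex header,
# so there is nothing to strip out when the content goes live.
<VirtualHost *:80>
    ServerName staging.example.com
    DocumentRoot /var/www/staging
    Header set X-Robots-Tag "noindex"
</VirtualHost>

Google honors X-Robots-Tag just as it does the meta tag, and it also covers non-HTML files such as PDFs.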
4:33 pm on Jul 24, 2013 (gmt 0)

lifeinasia (Moderator from US)
joined: Dec 10, 2005 | posts: 5627 | votes: 47

When possible, we try to block access to DEV sites with a white list of IPs. What Google can't read, Google can't index.
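
For reference, a minimal sketch of such a whitelist in Apache (2.2-era syntax; the address range is a placeholder for your own office or VPN addresses):

# In the DEV site's vhost or .htaccess:
# everyone outside the listed range gets a 403 Forbidden.
Order Deny,Allow
Deny from all
Allow from 203.0.113.0/24

On Apache 2.4 the equivalent is a single "Require ip 203.0.113.0/24" directive.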
4:40 pm on Jul 24, 2013 (gmt 0)

engine (Administrator from GB)
joined: May 9, 2000 | posts: 23249 | votes: 358

Google's discovery of test and development sites has been a big problem in the past.

Googlebot has a voracious appetite, and simply using robots.txt will not work.

It has gotten to the stage where I've become paranoid: I avoid using Gmail, and I ask clients to make sure they don't have any toolbars installed on their machines.
5:30 pm on Jul 24, 2013 (gmt 0)

phranque (Administrator)
joined: Aug 10, 2004 | posts: 10563 | votes: 15

I prefer to use HTTP Basic Authentication to protect development/staging content from indexing.
This gives Googlebot a 401 status code.
As LifeinAsia suggested, a 403 Forbidden status code works just as well.
The meta robots noindex solution is effective for keeping the dev URLs out of the index, but it uses more server resources, since every page must still be served to the crawler.
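
For anyone setting this up, a minimal Apache sketch of that Basic Auth protection (file path, realm, and username are assumptions, not a definitive recipe):

# Create the password file once, outside the web root:
#   htpasswd -c /etc/apache2/.htpasswd-staging devuser
# Then in the staging vhost or .htaccess:
AuthType Basic
AuthName "Staging - authorized users only"
AuthUserFile /etc/apache2/.htpasswd-staging
Require valid-user

Any request without valid credentials gets the 401, so crawlers never see the content at all.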
6:25 pm on Jul 24, 2013 (gmt 0)

buckworks (Moderator)
joined: Dec 9, 2001 | posts: 5653 | votes: 58

"to protect development/staging content from indexing"

This material is already indexed, alas.

There's no way to know what effect it's having on the main site's SEO, but it can't possibly be a good thing.
6:28 pm on Jul 24, 2013 (gmt 0)

lifeinasia (Moderator from US)
joined: Dec 10, 2005 | posts: 5627 | votes: 47

If you haven't already done so, definitely submit a site removal request for the staging site through Webmaster Tools ASAP.
9:34 pm on Jul 24, 2013 (gmt 0)

lucy24 (Senior Member from US)
joined: Apr 9, 2011 | posts: 13218 | votes: 348

"What Google can't read, Google can't index."

You would think so, wouldn't you? But g### thinks differently. Uncrawled pages can still be indexed, even if the index only reveals that the page exists, not what it says.
12:39 am on Jul 25, 2013 (gmt 0)

g1smd (Senior Member)
joined: Jul 3, 2002 | posts: 18903 | votes: 0

Move your test server to a different subdomain and set up HTTP Basic Authentication on it.

On the indexed test subdomain, set up a site-wide, page-by-page redirect to the main site. Leave the redirect in place for at least 3 months after the last request from anywhere is received.
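
In Apache terms, that site-wide, page-by-page redirect might look like this (hostnames are placeholders; assumes mod_rewrite is available):

# Answering for the old test hostname: redirect every URL,
# path intact, to the same URL on the main site.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^test\.example\.com$ [NC]
RewriteRule ^/?(.*)$ http://www.example.com/$1 [R=301,L]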

The "noindex" meta tag is not enough to get you out of trouble. A few years back a company fulfilled several orders before they realised the price paid by the customer was way too low. Turns out the customer had noticed that the prices on the test subdomain were quite old and the site allowed you to place an order!
 
