Forum Moderators: Robert Charlton & goodroi

Google, your nosey crawler indexed my development server

realmaverick

1:26 am on Mar 30, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Today, I noticed Google had indexed my development server. It's a subdomain of my main domain. It's NEVER referenced anywhere. Not sure how Google found it.

But it's an EXACT copy of the live site, used for development. And guess what, it's indexed TONS and TONS of the pages.

I'm just gonna change the name of the folder so it's gone and then password protect from their prying eyes.

Maybe I was silly not password protecting it in the first place. We live and learn!

realmaverick

1:30 am on Mar 30, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Actually, I don't know what to do right now. 95,100 pages indexed from the dev server. Hell, it has no freaking PageRank!

Should I rename the folder and kill it or 301 the pages to the real content?

g1smd

1:39 am on Mar 30, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I would rename the development subdomain and secure that with .htpasswd or similar.

I would then recreate the old subdomain at the old address and 301 the whole lot to the real site to retain the traffic.
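
To sanity-check the "301 the whole lot" mapping before wiring it into the server config, a minimal Python sketch might look like this. The host names dev.example.com and www.example.com and the function name live_redirect_target are made up for illustration, since the thread doesn't name the real hosts. It only works out where each old URL should point; issuing the actual 301 is left to the web server.

```python
from urllib.parse import urlsplit, urlunsplit


def live_redirect_target(dev_url, live_host="www.example.com"):
    """Return the live-site URL that a dev-subdomain URL should 301 to.

    Path and query string are kept so each indexed dev page lands on
    its live equivalent; only the host changes. The default host is a
    placeholder, not the real site.
    """
    parts = urlsplit(dev_url)
    # An empty path (bare host) is normalised to "/" and any fragment is dropped.
    return urlunsplit((parts.scheme, live_host, parts.path or "/", parts.query, ""))


if __name__ == "__main__":
    print(live_redirect_target("http://dev.example.com/widgets/42?page=2"))
```

Keeping the path and query intact is what lets the indexed dev URLs pass their traffic on to the matching live pages rather than all landing on the home page.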

[edited by: g1smd at 1:42 am (utc) on Mar 30, 2011]

nuthin

1:41 am on Mar 30, 2011 (gmt 0)

10+ Year Member Top Contributors Of The Month



Use robots.txt to disallow the subdomain.

Our development team had the same issues here with the demo version of sites getting indexed through the sub-domains they listed the demo sites under.

Once we put in place a robots.txt on the sub domain, Google gradually started removing them.
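
For reference, a blanket disallow is just two lines, and Python's standard urllib.robotparser can confirm that it shuts out a crawler. This is only a sketch: the dev.example.com URL and the is_blocked helper are hypothetical, and the robots.txt text is the generic block-everything form rather than the exact file our team used.

```python
from urllib.robotparser import RobotFileParser

# Generic "block every crawler from everything" robots.txt,
# to be served at the root of the dev subdomain.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_blocked(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text forbids `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(is_blocked(ROBOTS_TXT, "http://dev.example.com/some/page"))
```

The same helper can be pointed at the live site's robots.txt text as a quick check that the dev rule hasn't been copied across.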

realmaverick

1:47 am on Mar 30, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks guys, much appreciated. My gut feeling was to 301 the lot and then recreate under a new SD and PW protect it.

Google is a monster.

supercyberbob

2:06 am on Mar 30, 2011 (gmt 0)

10+ Year Member



Damn dirty apes. Leave our development servers alone.

Maurice

8:23 am on Mar 30, 2011 (gmt 0)

10+ Year Member



@supercyberbob the lesson here is NEVER put dev or test versions on a subdomain of your live site.

Also NEVER allow dev and test servers to be accessed from the internet.

deadsea

9:02 am on Mar 30, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



At least 10 times now, we have had a dev server misconfigured to be public facing, and Googlebot started crawling it.

We have tried to stay away from robots.txt disallow because it is a setting that we don't want to accidentally carry over to the live site.

We are implementing the canonical tag such that every canonical URL on the live site has a canonical tag pointing back to itself. Then if Google ever finds a dev server again, it should get pointed right back to the live site based on the canonical tags. The live site will be just like the dev sites, which is good for dev and quality assurance.
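
Roughly, the idea is that the tag is always built from the live host, no matter which server renders the page. A minimal Python sketch, assuming a placeholder live origin of http://www.example.com and a made-up helper name canonical_tag (neither is our real setup), and assuming the query string is not part of the canonical URL:

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Placeholder live origin; the real scheme and host would come from site config.
LIVE_SCHEME = "http"
LIVE_HOST = "www.example.com"


def canonical_tag(request_url):
    """Build a canonical <link> tag that always points at the live host.

    The path of the requested URL is kept; the host is replaced and the
    query string and fragment are dropped (an assumption of this sketch).
    """
    parts = urlsplit(request_url)
    canonical = urlunsplit((LIVE_SCHEME, LIVE_HOST, parts.path or "/", "", ""))
    return '<link rel="canonical" href="{}">'.format(escape(canonical, quote=True))


if __name__ == "__main__":
    # The same page rendered on dev and on live yields the same tag.
    print(canonical_tag("http://dev.example.com/widgets/42"))
    print(canonical_tag("http://www.example.com/widgets/42"))
```

Because the output is identical on dev and live, there is nothing to strip out at release time, which is the whole point compared with a robots.txt disallow.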

FranticFish

9:06 am on Mar 30, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It takes 5 minutes to set up htpasswd. No-one ever forgets to take that off when rolling out to live, but people can and do forget to remove the 'Disallow: /' from robots.txt.

walkman

9:56 am on Mar 30, 2011 (gmt 0)



"Takes 5 min to set up htpasswd. No-one ever forgets to take that off when rolling out to live, but people can and do forget to remove the 'Disallow: /' from the robots txt."

I did that, plus noindex and robots.txt, just in case :). Maybe overkill, but time well spent.

netmeg

2:03 pm on Mar 30, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Um. I have put together a lot of dev servers and sites over the years, and the very first thing I do, before I even put data in, is wall it off from the world, including Google. Just sayin'.

Don't blame Google, this one was your mistake.

walkman

2:23 pm on Mar 30, 2011 (gmt 0)



In my Webmaster Central account, Google shows me how fast my /admin/ pages load. How do they know about fully protected pages? The toolbar.