Banned by Google

Development server appears to have been mistaken for duplicate content


GaryK

5:37 pm on Jul 24, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I hope I've picked the right forum for this. It's not really a cloaking issue, but this forum's charter seemed like the best fit for the following problem.

I have a development server that is occasionally open to the Internet so clients can have a look at it.

Google paid a visit to my development server while I was working on a website for someone, and now that person's production website, which used to rank #3 in the SERPs for its single keyword, has been dropped from Google. I'm assuming it was due to duplicate content, of which there was a lot at that point. The project was more about changing the design than the content, since the content and several dozen high-quality back-links seemed to be responsible for the #3 ranking.

Will disallowing Google in robots.txt help prevent the above problem from recurring? Would a firewall rule that keeps Google out be the best solution? What else might make a difference so this doesn't happen to other clients of mine?

Thanks in advance.

mcavill

5:43 pm on Jul 24, 2004 (gmt 0)

10+ Year Member



Yes - I'd think robots.txt, a firewall, and perhaps a (301?) redirect from your dev server if it's Google would sort it out in the future. My guess is that Google picked up the URLs from the toolbar or some other oddity, if this hasn't been a problem with your other clients...

I'm not sure how best to do it, but IMHO I'd try to serve Google a 404 on its next visit to resolve the potential duplicate content issue.
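One way this could be done on an Apache dev server is with mod_rewrite in an .htaccess file, matching on the Googlebot user-agent. This is just a sketch, assuming mod_rewrite is enabled; the [G] flag returns 410 Gone, which tells Google the pages are deliberately removed (a 404 works too, but 410 is more explicit):

```apache
# Hypothetical .htaccess for the dev server's document root.
# Assumes mod_rewrite is available and AllowOverride permits it.
RewriteEngine On
# Match Googlebot's user-agent (case-insensitive).
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
# Answer every request from it with 410 Gone.
RewriteRule .* - [G,L]
```

Note that user-agent matching like this is fragile — crawlers can change or spoof their user-agent string — so it's a cleanup measure, not real access control.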

uncle_bob

10:54 pm on Jul 24, 2004 (gmt 0)

10+ Year Member



If it's your dev server, then you may as well use robots.txt to disallow ALL robots. Alternatively, require a login; then only you and your clients will have access.
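For a dev server, the disallow-everything robots.txt is about as short as they come — place this at the site root as /robots.txt:

```
User-agent: *
Disallow: /
```

Keep in mind this only stops well-behaved crawlers that honor the Robots Exclusion Protocol; it does nothing against crawlers that ignore it.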

GaryK

7:47 pm on Jul 25, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for the suggestions, especially about requiring a login. That seems like the safest way to handle it in case I get some crawlers that don't obey robots.txt. Plus it adds an extra layer of security on top of my router's NAT, the IDS I installed on the router (the Linksys WRT54G can be hacked to run Linux apps on its RAM disk), and my hardware firewall.
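For the login approach on an Apache server, HTTP Basic authentication via .htaccess is the usual route. A minimal sketch, assuming Apache with mod_auth — the password file path and username here are illustrative:

```apache
# Hypothetical .htaccess in the dev site's document root.
# Create the password file outside the web root first, e.g.:
#   htpasswd -c /path/to/.htpasswd clientname
AuthType Basic
AuthName "Development Server - Authorized Users Only"
AuthUserFile /path/to/.htpasswd
Require valid-user
```

Crawlers that hit the site get a 401 instead of content, so nothing ends up indexed, whether or not they respect robots.txt.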