When I woke up, I found that Google had been indexing my testboard for almost 8 hours. After thinking about how they found the testboard URL, I realized I had Google AdSense and Urchin code on it. Is there a way to remove what has already been indexed? I'm worried about duplicate content. :( Please advise, thanks in advance.
PS I already blocked all spiders in /robots.txt with:
Google's advice is to block the duplicate content you don't want indexed; otherwise they choose one version themselves.
If they pick the wrong one for a while, move your testboard to a third directory and 301 the old location to the real directory.
In fact, why not do this right now? 301 everything from the testboard to the real board, and restart your test in another directory, this time without AdSense.
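For anyone unsure what "301 everything" means in practice, here is a minimal sketch as a Python WSGI app. The paths `/testboard/` and `/board/` are hypothetical stand-ins for the test and real directories; every request under the old prefix gets a permanent redirect to the same path under the new one, query string included, and everything else is served normally.

```python
# Hypothetical directory names: substitute your own layout.
OLD_PREFIX = "/testboard/"
NEW_PREFIX = "/board/"


def redirect_target(path):
    """Return the permanent-redirect target for a testboard path, or None."""
    if path.startswith(OLD_PREFIX):
        # Keep the rest of the path so each old page maps to its real counterpart.
        return NEW_PREFIX + path[len(OLD_PREFIX):]
    if path == OLD_PREFIX.rstrip("/"):
        # "/testboard" without the trailing slash goes to the board's index.
        return NEW_PREFIX
    return None


def app(environ, start_response):
    """WSGI app: 301 old testboard URLs, serve everything else normally."""
    target = redirect_target(environ.get("PATH_INFO", "/"))
    if target is not None:
        query = environ.get("QUERY_STRING", "")
        location = target + ("?" + query if query else "")
        start_response("301 Moved Permanently",
                       [("Location", location), ("Content-Type", "text/plain")])
        return [b"Moved permanently\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Regular page\n"]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` will serve it. On Apache, the `RedirectPermanent` directive in .htaccess (mentioned in the Google guidelines quoted below) does the same job without any application code.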
I'm worried about duplicate content.
I wouldn't worry too much about duplicate content in this instance, and certainly wouldn't use a 301. If you've blocked any further crawling on the test board, I believe the pages will eventually go supplemental and then a long time later (a year) they will drop out of the index. In the meantime no harm will be done.
and certainly wouldn't use a 301
How does a 301 hinder crawling? :\
How can Webmasters proactively address duplicate content issues?
# Block appropriately: Rather than letting our algorithms determine the "best" version of a document, you may wish to help guide us to your preferred version. For instance, if you don't want us to index the printer versions of your site's articles, disallow those directories or make use of regular expressions in your robots.txt file.
# Use 301s: If you have restructured your site, use 301 redirects ("RedirectPermanent") in your .htaccess file to smartly redirect users, the Googlebot, and other spiders.
I placed the robots.txt inside the testboard folder.
Note that robots.txt is only read from the document root of your site, never from a subfolder; crawlers will never fetch it from there. You should exclude the testboard folder in your main robots.txt instead.
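To illustrate why the copy inside the testboard folder does nothing, here is a small sketch using Python's standard `urllib.robotparser`. It assumes a hypothetical host `www.example.com` with the test board under `/testboard/`. The `robots_url` helper (hypothetical, written for this example) shows that the robots.txt location depends only on the host, so a file at `/testboard/robots.txt` is never consulted; the parser then checks a root robots.txt that disallows just the test board.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url):
    """Crawlers look for robots.txt only at the root of the page's host."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical root robots.txt that hides only the test board.
ROOT_ROBOTS = """\
User-agent: *
Disallow: /testboard/
"""

parser = RobotFileParser()
parser.parse(ROOT_ROBOTS.splitlines())

for url in ("http://www.example.com/testboard/thread1.html",
            "http://www.example.com/board/thread1.html"):
    # Both pages resolve to the same root robots.txt; only the first is blocked.
    print(robots_url(url), parser.can_fetch("Googlebot", url))
```

Both URLs map to `http://www.example.com/robots.txt`, and only the testboard page is reported as not fetchable, which is the behaviour you want from the main robots.txt.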