|google indexing my "mirror" forum|
a test board I set up
My live board is at mydomain.com/forum (main domain is myforum.com). Today, I decided to make a copy of the live board at mydomain.com/testboard (copied the entire MySQL database and all the files over). I did this because I want to perform a software upgrade on the testboard first before doing it on the live forums.
When I woke up, Google had been indexing the testboard for almost 8 hours. After thinking about how they found my testboard URL, I realized that I had Google AdSense and Urchin code on the testboard. Is there a way to remove what it has indexed? I'm worried about duplicate content. :( Please advise, thanks in advance.
PS I already blocked all spiders in /robots.txt with:
|PS I already blocked all spiders in /robots.txt with: |
I assume that you only want to block your testboard, in your /robots.txt
That's correct, I placed the robots.txt inside the testboard folder. Additionally, I have banned robots on the testboard. Googlebot hasn't come back in over 3 hours. Do you know what happens to the content that was indexed by Google? :-S
Well, it gets indexed. I assume you just need to wait now.. otherwise there is a removal tool. But you might just want to wait so you don't remove something that shouldn't be removed.
The advice given by G was to block the duplicate content you don't want indexed, otherwise they choose one version themselves.
If they choose the wrong one for a short period of time, move your testboard to a third directory and 301 the old testboard URLs to the real directory.
In fact, why not do this right now? 301 everything from the testboard to the real one and restart your test in another directory, this time removing AdSense.
|I'm worried about duplicate content. |
I wouldn't worry too much about duplicate content in this instance, and certainly wouldn't use a 301. If you've blocked any further crawling on the test board, I believe the pages will eventually go supplemental and then a long time later (a year) they will drop out of the index. In the meantime no harm will be done.
|and certainly wouldn't use a 301 |
How does a 301 hinder crawling? :\
How can Webmasters proactively address duplicate content issues?
# Block appropriately: Rather than letting our algorithms determine the "best" version of a document, you may wish to help guide us to your preferred version. For instance, if you don't want us to index the printer versions of your site's articles, disallow those directories or make use of regular expressions in your robots.txt file.
# Use 301s: If you have restructured your site, use 301 redirects ("RedirectPermanent") in your .htaccess file to smartly redirect users, the Googlebot, and other spiders.
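For anyone who wants to see what "301 everything from the testboard to the real one" amounts to, here is a rough Python sketch of the path mapping. It is only an illustration, not what Google or anyone in this thread actually runs: the prefixes /testboard and /forum are taken from the original post, and the function names `live_location` and `app` are made up for the example. On Apache you would normally do this in .htaccess instead, as the quote above says.

```python
TEST_PREFIX = "/testboard"  # assumed path of the test copy (from the original post)
LIVE_PREFIX = "/forum"      # assumed path of the live board (from the original post)


def live_location(path):
    """Map a testboard path to the same page on the live board.

    Returns None when the path is not under the testboard at all.
    """
    if path == TEST_PREFIX or path.startswith(TEST_PREFIX + "/"):
        return LIVE_PREFIX + path[len(TEST_PREFIX):]
    return None


def app(environ, start_response):
    """Minimal WSGI app: answer testboard URLs with a 301, anything else with a 404."""
    target = live_location(environ.get("PATH_INFO", ""))
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]
    # Keep the query string so thread and post links still resolve.
    query = environ.get("QUERY_STRING", "")
    if query:
        target += "?" + query
    start_response("301 Moved Permanently", [("Location", target)])
    return [b""]
```

The point of matching on "/testboard/" with the trailing slash is that an unrelated path such as /testboardx is left alone.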
|I placed the robots.txt inside the testboard folder. |
Note that robots.txt can only be placed in the document root for your site, never in a folder - it will never be fetched from there. You should exclude the testboard folder within your main robots.txt instead.
Yup, you should use the robots.txt like:
Having it in the actual folder is, as far as I know, worthless.
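If you want to check this yourself, here is a small Python sketch using the standard library's robots.txt parser. The rules in `ROOT_ROBOTS_TXT` are my assumption of what a root-level file blocking only the testboard would look like (the snippets posted earlier in the thread are not reproduced here), and `robots_url` and `is_allowed` are just helper names for the example.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Assumed contents of the robots.txt served from the document root.
ROOT_ROBOTS_TXT = """\
User-agent: *
Disallow: /testboard/
"""


def robots_url(page_url):
    """Crawlers look for robots.txt at the root of the host, whatever the page path is."""
    return urljoin(page_url, "/robots.txt")


def is_allowed(page_url, user_agent="Googlebot"):
    """Check a URL against the root robots.txt rules above."""
    parser = RobotFileParser()
    parser.parse(ROOT_ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, page_url)
```

With these rules, `robots_url("http://mydomain.com/testboard/index.php")` gives `http://mydomain.com/robots.txt`, which is why a copy sitting at /testboard/robots.txt is never read. `is_allowed` returns False for the testboard URL and True for `http://mydomain.com/forum/index.php`.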