| 9:02 pm on Dec 24, 2006 (gmt 0)|
|PS I already blocked all spiders in /robots.txt with: |
I assume you only want to block your testboard in your /robots.txt.
| 9:15 pm on Dec 24, 2006 (gmt 0)|
That's correct, I placed the robots.txt inside the testboard folder. Additionally, I have banned robots on the testboard. They haven't come back in over 3 hours. Do you know what happens to the content that was already indexed by Google? :-S
| 9:26 pm on Dec 24, 2006 (gmt 0)|
Well, it stays indexed. I assume you just need to wait now. Otherwise there is a removal tool, but you might just want to wait so you don't remove something that shouldn't be removed.
The advice given by G was to block the duplicate content you don't want indexed; otherwise they choose one version themselves.
If they choose the wrong one for a short period of time, move your testboard to a third directory and 301 the old location to the real directory.
In fact, why not do this right now? 301 everything from the testboard to the real one and restart your test in another directory, this time removing AdSense.
| 9:38 pm on Dec 24, 2006 (gmt 0)|
|I'm worried about duplicate content. |
I wouldn't worry too much about duplicate content in this instance, and certainly wouldn't use a 301. If you've blocked any further crawling on the test board, I believe the pages will eventually go supplemental and then a long time later (a year) they will drop out of the index. In the meantime no harm will be done.
| 1:01 am on Dec 25, 2006 (gmt 0)|
|and certainly wouldn't use a 301 |
How does a 301 hinder crawling? :\
How can Webmasters proactively address duplicate content issues?
# Block appropriately: Rather than letting our algorithms determine the "best" version of a document, you may wish to help guide us to your preferred version. For instance, if you don't want us to index the printer versions of your site's articles, disallow those directories or make use of regular expressions in your robots.txt file.
# Use 301s: If you have restructured your site, use 301 redirects ("RedirectPermanent") in your .htaccess file to smartly redirect users, the Googlebot, and other spiders.
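To make the "Use 301s" point concrete, here is a minimal sketch of the path mapping such a redirect performs. On Apache this would normally be a redirect directive in .htaccess, as the quote says; the Python below only illustrates the logic. The folder names `/testboard/` and `/forum/` are assumptions for illustration (the thread never names the real directory), and `permanent_redirect` is a hypothetical helper, not part of any server.

```python
from typing import Optional, Tuple

OLD_PREFIX = "/testboard/"  # assumed location of the test copy
NEW_PREFIX = "/forum/"      # hypothetical location of the real board


def permanent_redirect(path: str) -> Optional[Tuple[int, str]]:
    """Return (301, new_path) for URLs under the old folder, else None."""
    if path.startswith(OLD_PREFIX):
        # Keep everything after the old prefix so each page maps to its twin.
        return 301, NEW_PREFIX + path[len(OLD_PREFIX):]
    return None


print(permanent_redirect("/testboard/viewtopic.php?t=5"))
print(permanent_redirect("/forum/viewtopic.php?t=5"))
```

The first call yields a 301 pointing at the matching page in the real folder; the second returns `None` because nothing needs redirecting.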
| 1:47 am on Dec 25, 2006 (gmt 0)|
|I placed the robots.txt inside the testboard folder. |
Note that robots.txt can only be placed in the document root for your site, never in a folder. It will never be fetched from there. You should exclude the testboard folder within your main robots.txt instead.
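As a rough illustration of why a robots.txt inside a folder is never seen: a crawler derives the file's location from the scheme and host only, discarding the path. The sketch below shows that derivation; `robots_url` is a hypothetical helper and `www.example.com` is a placeholder host.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt location that applies to page_url."""
    parts = urlsplit(page_url)
    # Path, query and fragment are dropped: the file lives at the host root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.example.com/testboard/index.php?topic=1"))
# prints http://www.example.com/robots.txt
```

Whatever page under /testboard/ is requested, the lookup never points at /testboard/robots.txt.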
| 9:04 pm on Dec 26, 2006 (gmt 0)|
Yup, you should use the robots.txt like:
Having it in the actual folder is, as far as I know, worthless.
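A minimal sketch of what a root-level rule for the test folder might look like, checked with Python's standard `urllib.robotparser`. The folder name `/testboard/`, the `/forum/` path, and the host are assumptions for illustration, and the rules are not the ones the poster actually used.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents of the robots.txt served from the document root.
ROBOTS_TXT = """\
User-agent: *
Disallow: /testboard/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
# The test folder is blocked for every crawler; the rest of the site is not.
print(parser.can_fetch("Googlebot", "http://www.example.com/testboard/index.php"))
print(parser.can_fetch("Googlebot", "http://www.example.com/forum/index.php"))
```

The first check prints `False` and the second `True`, so only the test folder is excluded from crawling.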