man_in_poland - 9:31 am on Apr 24, 2008 (gmt 0)
I find this a particular problem because I often block a directory with robots.txt for a few days or weeks, and only 'open it up' to search engines once any bugs are fixed. By then, however, the proxy server may have got there first, and Google still seems to treat the proxy domain as the original source of the content (based on indexing date, I assume) rather than my site. Has anyone else observed this?
I have certainly noticed that the number and ranking of proxy pages have receded dramatically in recent weeks. The only issue that still seems to remain is that pages which are BLOCKED BY ROBOTS.TXT on the original site are still being indexed under the proxy server's domain (because Google simply looks for the robots.txt file on the proxy server's domain).
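
To illustrate why the block does not carry over, here is a rough sketch using Python's standard urllib.robotparser. The hostnames (original.example, proxy.example), the /beta/ directory and the proxy's URL layout are all made up for the example, and it assumes the proxy serves no robots.txt rules of its own. It only shows that robots.txt rules apply per host; it is not a description of how Google actually crawls.

```python
from urllib.robotparser import RobotFileParser


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical rules on the original site: /beta/ is blocked while bugs are fixed.
ORIGINAL_ROBOTS = """\
User-agent: *
Disallow: /beta/
"""

# Assumption: the proxy domain has an empty robots.txt (no rules at all).
PROXY_ROBOTS = ""

original = make_parser(ORIGINAL_ROBOTS)
proxy = make_parser(PROXY_ROBOTS)

# The crawler checks each URL against the robots.txt of the host it is fetching from.
original_allowed = original.can_fetch(
    "Googlebot", "http://original.example/beta/page.html"
)
proxy_allowed = proxy.can_fetch(
    "Googlebot", "http://proxy.example/browse/original.example/beta/page.html"
)

print("original site allows fetch:", original_allowed)
print("proxy copy allows fetch:", proxy_allowed)
```

Even if the proxy copied the original rules verbatim, the proxied path in this sketch no longer starts with /beta/, so the Disallow line would not match it either.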