TheOptimizationIdiot - 12:28 pm on Apr 29, 2013 (gmt 0)
I'm not too happy with Google and the duplication in the results from scrapers, but I do think in fairness I should point out:
Why a Google-owned proxy is allowed to cache another's content is disturbing.
Unless you're serving a Cache-Control header for the document(s), or have a date in the past set in the Expires header, documents are generally cacheable by proxy servers according to RFC 2616.
Allowing caching is not a "Google proxy issue"; it's them following the protocol while people aren't using the tools available within that protocol to stop proxy caching. If you're using the w3.org protocols as specified to prevent caching and they're still caching, then it's an issue on their end, but if not, then it's not their fault that people don't know what they're doing or how to prevent caching.
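For anyone wondering what "using the tools available within the protocol" actually looks like, here's a minimal sketch (Python WSGI, purely illustrative and not tied to any particular server stack) of the Cache-Control and Expires headers that tell shared caches and proxies not to store a page:

from wsgiref.simple_server import make_server

def app(environ, start_response):
    body = b"<html><body>Original content</body></html>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # "private" keeps shared caches (proxies) out; "no-store" and
        # "no-cache" forbid storing or reusing the response without
        # revalidating it with the origin server (RFC 2616, section 14.9).
        ("Cache-Control", "private, no-store, no-cache, must-revalidate"),
        # A past date in Expires marks the response as already stale for any
        # older HTTP/1.0 cache that ignores Cache-Control (section 14.21).
        ("Expires", "Thu, 01 Jan 1970 00:00:00 GMT"),
    ]
    start_response("200 OK", headers)
    return [body]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()

Whether you set those headers in your application code, your CMS, or the server config doesn't matter; what matters is that they're on the response, because that's all a well-behaved proxy has to go on.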
Also, a proxy "injecting" something like a "noindex" header is absolutely against protocol and is a very slippery slope, so they're definitely correct to not do it for any reason.
And the chances of even cache control stopping someone from thieving content aren't very good, because to have the newest version when it's available they really have to be revalidating the request anyway; but it's still not Google's fault for running their system according to protocol, it's the fault of the thieves using their system.
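"Revalidating the request" just means a conditional GET. Here's a rough sketch of one using If-Modified-Since (If-None-Match works the same way with ETags); the URL and date are placeholders, and in practice a proxy would take them from its earlier fetch of your page:

import urllib.request
import urllib.error

req = urllib.request.Request(
    "http://example.com/page.html",
    headers={"If-Modified-Since": "Mon, 29 Apr 2013 00:00:00 GMT"},
)
try:
    with urllib.request.urlopen(req) as resp:
        # 200: the page changed, so a fresh full copy came back
        print(resp.status, "- page changed, new copy returned")
except urllib.error.HTTPError as e:
    if e.code == 304:
        # 304: not modified, the cached copy is still current
        print("304 - not modified, cached copy still current")
    else:
        raise

So no matter how strict your caching headers are, anyone willing to re-fetch or revalidate will always get the current content; headers stop well-behaved caches, not determined scrapers.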
* All that said: indexing the content and letting it display over the originator is Google's fault, and they've been doing it for years; imo it's something they should have figured out how to correct long ago. But as it's been said before, it's their search engine and they can do as they please with it, so if making the effort necessary to show the originator first isn't something they're interested in doing, then that's their decision.