Unfortunately, a lot of good suggestions have been ignored by Google.
And believe it or not, this is a canonical issue.
Preventing proxy IPs from visiting your website is a very good idea, and it is one of the most important things a webmaster should think about. Allowing proxy browsers to reach your URL can bring your website to its knees in Google, into oblivion, probably never to recover its ranking status again.
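For anyone who wants to act on this, here is a minimal sketch of a proxy gate, assuming PHP on your own server. The header names are the common ones that self-announcing proxies add; this will not catch a CGI proxy script that strips them, so IP blocklists are still worth keeping.

<?php
// Minimal sketch: refuse requests that announce themselves as proxied.
// Transparent and anonymous proxies commonly add Via, X-Forwarded-For
// or similar headers; a direct visitor normally sends none of these.
$proxyHeaders = array('HTTP_VIA', 'HTTP_X_FORWARDED_FOR', 'HTTP_FORWARDED', 'HTTP_CLIENT_IP');
foreach ($proxyHeaders as $h) {
    if (!empty($_SERVER[$h])) {
        header('HTTP/1.1 403 Forbidden');
        exit('Proxy access is not permitted.');
    }
}
?>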
There are many ways to create the conditions that Google's method of allocating content (canonicalization) is susceptible to. In other words, Google assigns content to links that have no content of their own. No evidence exists that Google has ever taken an existing site and altered it to contain the contents of another website; it is always to do with an empty URL, often a PHP, CGI or ASP server-side redirect, a proxy script, or the HTML equivalent (a meta refresh).
For instance, your website has content on its index page, while a redirect has no index and no content of its own. Let us assume it is a temporary redirect (a 302). Google now has two options: fetch the content for the redirect, thereby granting the redirect status as the better canonical representative of that content, or make no change and apportion nothing to the redirect, because it has deemed the site holding the content the better representative.
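To make the mechanics concrete, here is a minimal sketch of the two kinds of server-side redirect in PHP; www.example.com is a placeholder target.

<?php
// Hypothetical example: the same redirect expressed two ways.
$target = 'http://www.example.com/';

// Temporary redirect, the risky kind discussed above. PHP sends a
// 302 by default, which tells Google the redirecting URL is still
// the real address; that is exactly the ambiguity canonicalization
// has to resolve.
header('Location: ' . $target);

// Permanent redirect, unambiguous about who owns the content:
// header('Location: ' . $target, true, 301);
?>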
I don't believe any webmaster has intentionally achieved a hijacking of another website. This process is entirely due to Google and its canonicalization process.
You can indeed create the conditions Google needs, but you cannot influence the result. You would have to know what criteria and process Google applies in order to deliberately hijack a target website.
Targeting a mass of sites may result in a hijack or two, perhaps one in a thousand, and nothing would have picked those out other than Google deeming the victim not a canonically worthy website to represent its own contents.
Google says that one webmaster cannot harm another website's ranking. This is absurd.
You can indeed tank or elevate another website's rankings. You can bring down a competitor very easily, not overnight, but over a period of time.
Many, many websites have tanked in Google simply because a surfer visited them with a proxy browser or the like.
I've seen quite a few sites tank in this manner. If you think you have resolved your website's duplicate-content problems, think again, because you have not. A visitor arriving via a proxy can create a duplicate of your website, and so can any inbound temporary redirect. In 99.99% of cases it is unintentional.
If Google deems the residue that a proxy can leave behind, or the temporary redirect, to be a better canonical representative of the index or internal page that was visited, then duplication of your website under another URL is unavoidable.
Deep-crawl Googlebots do not carry a referer. I think they arrive directly from Google, instructed to fetch contents for anything from a single link to many, and this can only happen after harvesting bots have informed Google of the residue left by proxies and temporary redirects found in pages across the internet.
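Because these fetches carry no referer, a user-agent string is the only thing a proxy script needs to fake, so the one check I would trust is the reverse-then-forward DNS test that Google itself recommends. A minimal sketch in PHP, using only built-in functions:

<?php
// Hedged sketch: verify a visitor claiming to be Googlebot really is.
// Step 1: reverse-DNS the IP; genuine crawlers resolve to a
//         *.googlebot.com or *.google.com hostname.
// Step 2: forward-resolve that hostname and confirm it maps back to
//         the same IP, which stops faked reverse records.
function isRealGooglebot($ip) {
    $host = gethostbyaddr($ip);
    if (!preg_match('/\.(googlebot|google)\.com$/i', $host)) {
        return false;
    }
    return gethostbyname($host) === $ip;
}

if (stripos($_SERVER['HTTP_USER_AGENT'], 'Googlebot') !== false
        && !isRealGooglebot($_SERVER['REMOTE_ADDR'])) {
    header('HTTP/1.1 403 Forbidden');
    exit;
}
?>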
And this is very difficult to recover from.
MSN created many duplications and has probably now found a way to improve matters. Yahoo came up with a working idea, but not a solution: it looks as though it has sorted things out, but only by virtue of the duplication not showing, and we have no idea how the final rankings in Yahoo are determined.
It really is crazy when you think of millions of webmasters spending days, weeks and months resolving their websites, only to find that yet more sinister ways exist for their websites to be duplicated.
Imagine the horror if an unethical webmaster made a sitemap of your entire site and added a proxy redirect before each of your URLs. Every single page of your website would then be at the mercy of the proxy. If that webmaster then used automated software to submit all those pages to thousands of directories and link farms, your website tanking becomes very possible indeed. That webmaster may also be able to submit that deadly sitemap to Google with all your pages in it; I'm not sure whether Google would refuse the sitemap because the URLs look as though they belong to the proxy webmaster. The shape of such URLs is sketched below.
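Purely as an illustration of what "a proxy redirect before your URL" means, here is the pattern; proxy.example.com, the script name and the page list are all hypothetical:

<?php
// Hypothetical illustration only: how every page of a site can be
// wrapped in a proxy URL for a hostile sitemap. Nothing here is a
// real proxy; the point is the shape of the resulting links.
$proxy = 'http://proxy.example.com/nph-proxy.cgi/http/';
$pages = array('www.yoursite.com/', 'www.yoursite.com/about.html');
foreach ($pages as $page) {
    echo $proxy . $page . "\n";
    // e.g. http://proxy.example.com/nph-proxy.cgi/http/www.yoursite.com/
}
?>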