I don't think "proxy" is a good term to describe this. There's no problem with OP's site, and I don't think it is targeted in particular.
Andy, we may be seeing different parts of the same problem, but I do believe that a proxy is involved. The "attacked" site, though, isn't what is getting proxied. I believe in this case that it's Googlebot that's getting proxied. The attacked site when accessed directly and not via Googlebot should appear to be perfectly normal.
First... here are parts of tedster's and jdMorgan's descriptions from the above-cited 2007 proxy hijack thread. I'm leaving a lot out, and my emphasis is added below. I recommend reading the whole thread... Proxy Server URLs Can Hijack Your Google Ranking... https://www.webmasterworld.com/google/3378200.htm
First thing I want to clarify -- as I understand your post, you are not talking about someone directly hijacking your traffic through some kind of DNS exploit. You're also not talking about someone stealing your content, although that can play into this picture at times.
Instead, you are talking about a proxy domain that points to yours taking over your position in the SERPs, sometimes a position that your url has held for a long time.
jdMorgan...
Given an understanding of what a proxy *is* and how it works, the only step really needed is to verify that user-agents claiming to be Googlebot are in fact coming from Google IP addresses, and to deny access to requests that fail this test.
If the purported-Googlebot requests are not coming from Google IP addresses, then one of two things is likely happening:
1) It is a spoofed user-agent, and not really Googlebot.
2) It *is* Googlebot, but it is crawling your site through a proxy.
The latter is how sites get 'proxy hijacked' in the Google SERPs -- Googlebot will see your content on the proxy's domain.
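jdMorgan's test -- verify that anything claiming to be Googlebot actually resolves back to Google -- can be sketched like this. This is a minimal Python sketch of the reverse-then-forward DNS check, not code from the thread; the function name is mine:

```python
import socket

def is_real_googlebot(ip):
    """Reverse-then-forward DNS check for a purported-Googlebot IP:
    the PTR name must be under googlebot.com or google.com, and that
    name must resolve back to the same IP."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)      # reverse DNS (PTR lookup)
    except OSError:
        return False                               # no PTR record -- fail the test
    if not host.endswith((".googlebot.com", ".google.com")):
        return False                               # wrong domain: spoofed or proxied
    try:
        _, _, forward_ips = socket.gethostbyname_ex(host)  # forward-confirm the name
    except OSError:
        return False
    return ip in forward_ips                       # must resolve back to the same IP
```

A request whose user-agent says Googlebot but which fails this check is either a spoofer or Googlebot arriving through a proxy -- either way, a candidate for a 403.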
Here's the translated description of the problem quoted from the Search Engine Roundtable article that I cite in our February 2016 thread.... My site's being de-indexed and replaced by others Feb, 2016 https://www.webmasterworld.com/google/4790240.htm
[webmasterworld.com] The quoted Polish webmaster reporting this problem said...
We (owners of "website B") are not hacked by him; he just copies the code of our website and puts it in an iframe on "website A". The problem is that Google's algorithm in many cases considers the malicious copy put by the hacker on website A as THE ORIGINAL, and website B (our website, which is the original) disappears from Google results.
The situations we're seeing aren't defined by the payload or by the collateral damage in the serps. The payloads are varied, IMO to obfuscate the methodology and the motive. In the above Feb thread, for instance, there were no iframes -- just some bizarre tricks to throw us off the scent. I suspect that hidden among these, among other things, are pr0n and malware... whatever is opportunistic for the hijacker.
Several new users posted specifics, which unfortunately we needed to delete, because we don't out sites and because in some cases the serps were dangerous. User joncmac, who had also been posting in the Google threads, mentioned "proxy hack", which rang a big bell for me and saved a lot of further speculation.
That was consistent with reports on the cited SE Roundtable discussion and with what I saw of the sites that had been brought to the mods' attention here, and with our 2007 proxy hijacking discussion noted above. What was particularly consistent was...
a) that the original website appeared to be intact when you navigated to it directly
b) that the new results replaced the old website in Google serps
c) that a DNS report (which I ran on one set of results) showed no problems, so it wasn't DNS hijacking
d) that viewing the duplicated page as Googlebot showed the hijacked page's content
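Point (d) is easy to reproduce yourself: fetch the suspect URL once with a normal browser user-agent and once claiming to be Googlebot, and compare the responses. A minimal sketch (the URL is a placeholder, and a site doing proper rDNS verification will of course treat this spoofed UA as a fake):

```python
import urllib.request

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def fetch_as(url, user_agent):
    """Fetch `url` presenting the given User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()

# Usage (network required):
#   normal = fetch_as("https://example.com/page", "Mozilla/5.0")
#   as_bot = fetch_as("https://example.com/page", GOOGLEBOT_UA)
#   print("different -- possible cloaking" if normal != as_bot else "identical")
```

If the two responses differ, the page is serving Googlebot something other than what visitors see -- which is exactly what the reports above describe.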
aakk9999 reported on both the original hijacked url and the obfuscation of the page receiving the content. Again, a referrer was necessary for all this to happen, and there was no consistency in payload. I did a quick and dirty check to determine that this only happened on Google, and saw that the page's ranking was normal on Bing.
One poster, glutimax, did report that blocking the hijacker's domain/IP in their .htaccess fixed the problem.
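For anyone wanting to try glutimax's fix, a blunt .htaccess deny looks like this (Apache 2.4 syntax; the IP is a hypothetical stand-in for the rogue proxy's address from your access logs):

```apacheconf
# Block the rogue proxy outright. 203.0.113.45 is a placeholder --
# substitute the IP(s) you see fetching pages with a Googlebot UA
# from non-Google addresses.
<RequireAll>
    Require all granted
    Require not ip 203.0.113.45
</RequireAll>
```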
So I'm assuming that, whatever else is done with the hijack, it originates with a proxy crawler spoofing Googlebot and intercepting the content and ranking signals of the site.
With regard to canonical hijacking... I think I know the types of domain "directory" sites described... where they "review" the site and will often rank above penalized sites for the domain name, as well as for meta description, title, and some primary content from the site.
Up till now, I'd assumed that these pretty much preyed on penalized sites, including sites hit by Panda and Penguin. This episode, I'm thinking, suggests that they might also be hitting proxy hijacked sites, and that may or may not be coincidental. It could well be coordinated with the Googlebot crawl through a proxy: the intercepting site would have all the scraped content, and the hijackers certainly could make concurrent use of it. I don't think I'd attribute this to an algorithm weakness, though. If Google isn't seeing a page because Googlebot has been intercepted, it's not clear what the regular algorithm could do to sort this out.
It's also possible that what appear to be miscanonicalized listings simply could be random rather than coordinated, with replaced pages getting hit. A crawler spoofing Googlebot, btw, would be targeting a broad range of sites, so no one site would look targeted. Not all pages on a site would even get hijacked at once... they'd be subject to the vicissitudes of a natural crawl. One example site in our Feb thread got nibbled away over time.
Also, not all of the sites listed in these "directories" could be hijacked. Sites that were verifying Googlebot and blocking the rogue IPs would be immune to this kind of replacement. The sites getting hijacked could look like big sites with lots of PageRank, but IMO that would be a coincidental correlation... there's no reason PageRank should have anything to do with this. Amazon, for instance, is likely to be using rDNS.
As for motive, I was originally perplexed... as, from smb111's example, I wondered why a hijacker would crawl a page and return it only as a 404. My guess is that they might be making multiple uses of the hijacked pages over time... but that the essence of the hijack was being able to replace a popular page in the serp with a desired page load at a given time.
It would be interesting to see what happens if the OP on this thread blocks the hijacking IPs by whitelisting Googlebot. If that fixes it, that would seem to me to be a clearcut diagnosis.
Anyway, these are top of my head thoughts. I'm not really an IT guy or a security expert, but the above is bringing together reports from several sources... the SER thread, this current thread, our Feb thread, and several of our old proxy hijacked threads referenced above.
Again, I'd love to see some follow-up on this by members who routinely deal with bots and hijacking.