Forum Moderators: Robert Charlton & goodroi


Proxy ranking for my pages and obsolete URL ranking

         

graeme_p

2:04 pm on Sep 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My main problem is that a proxy has knocked out some of my pages from the SERPS - and the index. The best of the pages I have found missing has been removed from the index, and the proxy version has taken exactly the same place in the SERPS.

In some cases both the original and proxy versions are present.

I can find no links to the proxy site, and the "copied" pages have no toolbar PageRank, yet some of my pages that have gone from the index are well linked.

I have blocked the proxy's IP (which seems to be working), and sent Google a spam report (from WMT).
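For anyone wondering, an IP block like that can be done at the .htaccess level - sketch only, and the address below is a placeholder, not the proxy's real IP:

# Apache 2.2-style access control; 203.0.113.45 is a placeholder address
Order Allow,Deny
Allow from all
Deny from 203.0.113.45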

What more can I do?

The URL for my best keyword has also been replaced by an obsolete URL with a query string that used to switch the page style. Instead of example.com/page, Google has indexed example.com/page?querystring. There is only one link to this, from an old blog post, whereas the correct version has lots of external and internal links. I do not know if this is connected with the other problem, but it did happen at the same time.

I think the proxy is malicious, but that I am not the victim. Their front page redirects to a proxied copy of another site, which I think is the main target, and I think my pages have been indexed to add relevant content to their site (as Google sees it).

Receptional Andy

3:04 pm on Sep 2, 2008 (gmt 0)



With the query string problem, you should 301 redirect example.com/page?querystring to example.com/page. It's perfectly possible for Google to pick the 'wrong' page from a batch of dupes, and the performance of the page will suffer if it's an unlinked version that gets picked. You can (and should) force a choice via redirects on any unintended duplicates.
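In .htaccess terms that could look something like the following - a minimal sketch that reuses the placeholder names from your post ("page" and "querystring"), so substitute the real path and parameter name:

# Sketch: 301 example.com/page?querystring to example.com/page
# "page" and "querystring" are placeholders from the post above
RewriteEngine On
RewriteCond %{QUERY_STRING} ^querystring$
RewriteRule ^page$ /page? [R=301,L]

The trailing ? on the substitution drops the query string from the redirect target, so the clean URL is what gets indexed.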

The general implication of the proxy problem is that your site is not stable enough to withstand fairly common issues - most likely due to a lack of links from the right sites.

graeme_p

7:17 pm on Sep 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Why should a three year old site that has never suffered this sort of problem before suddenly be vulnerable to this? Lots of people have copied my content, but none has outranked me with it before.

The site has fairly good backlinks. Some of the affected pages are first or second in the SERPs - not only are the proxied pages beating my originals, they are beating Wikipedia and my long-established competitors (sites in the top few thousand by traffic, if you believe Alexa). The proxied pages usually appear in the SERPs exactly where mine did (my rankings have been fairly stable recently).

Incidentally, the page with the query string in the URL is first in the SERPs for its keyword. I have added a 301 anyway.

Receptional Andy

7:42 pm on Sep 2, 2008 (gmt 0)



Why should a three year old site that has never suffered this sort of problem before suddenly be vulnerable to this?

Remember that for your site's ranking to change (or for others to improve relative to yours) it only takes one of the following to occur:

- Your site/page changes, or the external references to it change
- A competitor's site/page changes or the references change
- Google changes

In most cases, all three are happening all the time.

From your description, it sounds like the proxied site/pages may have attracted greater numbers of external links, your site has lost links, and/or Google has re-evaluated some of the external links to one or both sites. It might be that the site was precarious anyway, but that there had been little significant change until now.

Incidentally, in case you haven't seen it, there is a Hot Topic [webmasterworld.com] thread on the subject: Proxy Server Hijacks - and how to defend [webmasterworld.com].

graeme_p

8:02 am on Sep 3, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have read the Hot Topic thread (again!). Some of the suggestions do not work in this case (e.g. <base href> and absolute internal links). One useful suggestion was to return a page to the proxy rather than denying it, since Google takes time to de-index pages that return a 403.
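If I try that, I imagine it would be roughly this kind of rule - a sketch only, with a placeholder IP and a hypothetical /proxy-notice.html page:

# Sketch: serve a minimal notice page to the proxy's IP instead of a 403
# 203.0.113.45 and proxy-notice.html are placeholders
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.45$
RewriteRule !^proxy-notice\.html$ /proxy-notice.html [L]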

As far as I can see the proxy site has hardly any incoming links, and none to its copy of my site. It seems that Google is ranking it simply on the volume of material there: over 20,000 text-heavy pages indexed, many from well-known sites like ivillage (although Google ought to see that those, at least, are duplicates). On the other hand, my links look fine in Google Webmaster Tools, but fewer than usual are shown by a link: search.

My real question is whether there is anything I can do to reverse the damage faster, or whether I can now just wait for Google to re-index the proxy.

tedster

8:38 am on Sep 3, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The key to stopping a proxy server from hijacking is to stop googlebot from spidering your content through THEIR urls. Make sure that all googlebot requests that your server answers are coming in from a Google IP address.

This approach is mentioned in the other thread quite a few times, but I'll quote it again here:

jdMorgan:
Given an understanding of what a proxy *is* and how it works, the only step really needed is to verify that user-agents claiming to be Googlebot are in fact coming from Google IP addresses, and to deny access to requests that fail this test.

If the purported-Googlebot requests are not coming from Google IP addresses, then one of two things is likely happening:

1) It is a spoofed user-agent, and not really Googlebot.
2) It *is* Googlebot, but it is crawling your site through a proxy.

The latter is how sites get 'proxy hijacked' in the Google SERPs -- Googlebot will see your content on the proxy's domain.

Here's a reference on the "double reverse DNS" method for verifying googlebot:

How to verify Googlebot is Googlebot [webmasterworld.com]
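On a shared server where you can't run the full lookup, one rough .htaccess-level approximation is to refuse "Googlebot" user-agents that don't come from the range Googlebot has been observed crawling from (66.249.64.0/19). Treat this as a sketch under that assumption - ranges can change, and it is not a substitute for the double reverse DNS check:

# Rough approximation only: 403 requests that claim to be Googlebot
# but fall outside 66.249.64.0 - 66.249.95.255
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteCond %{REMOTE_ADDR} !^66\.249\.(6[4-9]|[78][0-9]|9[0-5])\.
RewriteRule .* - [F]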

Robert Charlton

8:27 pm on Sep 3, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



As I imperfectly understand it, and I'm not a server expert, the double-reverse DNS method for verifying Googlebot isn't really suitable for those with sites on shared servers... and that's probably most of us.

Again, from my imperfect understanding... a shared server can be set up to do a double-reverse verification for all requests. This runs the risk, though, of greatly slowing down the overall performance of your site.

To limit double-reverse DNS checking only to requests claiming to be Googlebot, etc, I've been told you need to use something beyond .htaccess-level code, and that's not available on shared virtual servers.

I think this needs to be said, as I spent many hours trying to get this sorted out, only to discover it couldn't be done.

Ruben

6:47 pm on Sep 14, 2008 (gmt 0)

10+ Year Member



I have the same problem. I blocked the proxy with my .htaccess and I hope the problem will be resolved soon.

My problem now is that the proxy isn't working anymore; it no longer shows my pages. I think it was an attack intended to get my pages removed from the index and then be shut down. The proxy listed about 80,000 domains.

I also filed a spam report.