Forum Moderators: Robert Charlton & goodroi

Message Too Old, No Replies

Proxy Hijack - Now what should I do?


followgreg

7:13 am on Apr 9, 2007 (gmt 0)

10+ Year Member



Guys,

I just found out that some of the pages on one of our sites were hijacked by some proxy! :(

Our two-year-old blog homepage disappeared from the Google index, and the Google cache for OUR site shows the proxy server's URL!

The HTTP response shows an X-Pingback header using xmlrpc.php on the blog (WordPress), while the domain is the spammer's.
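
If you want to check the same thing on your own pages, you can dump the response headers with something like this quick Python sketch (the URL is just a placeholder - look for the X-Pingback line in the output):

import urllib.request

# Fetch the page and print every response header; the X-Pingback header
# shows which domain's xmlrpc.php the served page is pointing at.
url = "http://www.example.com/hijacked-page/"  # placeholder URL
with urllib.request.urlopen(url) as response:
    for name, value in response.getheaders():
        print(name + ": " + value)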

My question is, how do I fix it?

g1smd

3:57 pm on Apr 20, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>> I simply did a WHOIS on the IP address to see if Google owned it and authorized googlebot or mediapartners-google if it crawled via the following IP ranges: <<

Google owns, and is now using, several other IP ranges that you have not listed.

Those ranges were listed, late in 2006, here at WebmasterWorld.
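
Rather than trying to keep up with every range Google adds, a more reliable check is a reverse DNS lookup on the requesting IP, followed by a forward lookup to confirm the name resolves back to the same address. A rough sketch of that check in Python (the IP at the bottom is just an example pulled from a log):

import socket

def is_genuine_googlebot(ip):
    # Reverse lookup: genuine Googlebot IPs resolve to googlebot.com or google.com
    try:
        host = socket.gethostbyaddr(ip)[0]
    except socket.herror:
        return False
    if not (host.endswith(".googlebot.com") or host.endswith(".google.com")):
        return False
    # Forward lookup: the hostname must resolve back to the original IP
    try:
        return ip in socket.gethostbyname_ex(host)[2]
    except socket.gaierror:
        return False

print(is_genuine_googlebot("66.249.66.1"))  # example crawler IP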

avalanche101

10:18 pm on Apr 25, 2007 (gmt 0)

10+ Year Member



Hi,
I've been doing a bit of reading about proxy (or should that be poxy) servers.
Am I correct in thinking the problem with them arises when they cache a page on their own server, so that you end up with an example URL like:
www.proxy-server-url/nph-1.pl/000010A/http/your-website.com
?

So G thinks there are 2 copies of the same page on different websites?

kidder

11:16 pm on Apr 25, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This all sounds like it is going to eat up a whole lot of time. It's hard enough dealing with SEO without having to outsmart content thieves. I think one of my sites lost its "authority" status a while back due to a proxy hijack. What would happen if you tried this with Wikipedia or a similar site? Are they immune? I would bet they are, due to the trust factor. There must be a better way than trying to stay one step ahead of these filthy stealing dirtbags? At the end of the day it's a pure SE problem, is it not? They should be able to give us a solution via their "webmaster tools" - or is that asking too much? What about a unique original content tracking code, or something along those lines?

queritor

11:30 pm on Apr 25, 2007 (gmt 0)

10+ Year Member



Hi,
...Am I correct in thinking the problem with them is when they cache a page to their server, whereby you end up with the following example URL:
www.proxy-server-url/nph-1.pl/000010A/http/your-website.com
?...

It can be less obvious than that. Just by using the built-in proxy support of Apache, you can configure a URL that will transparently display an entire site. For example, I could set up http://mysite.com/webmasterworld/ and visitors couldn't tell. It's also pretty easy to take it one step further and place ads on the re-published site.
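
For anyone curious how little it takes, something along these lines in the Apache configuration would do it (a sketch only - the path and domain are made up, and mod_proxy plus mod_proxy_http have to be loaded):

# Mirror another site under a local path (requires mod_proxy and mod_proxy_http)
ProxyRequests Off
ProxyPass /webmasterworld/ http://www.example.com/
ProxyPassReverse /webmasterworld/ http://www.example.com/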

I just recently had someone proxying my site. I contacted the ISP and it was shut down within a few days. I guess I was lucky.

One way to minimize the damage is to use fully qualified URLs for internal links. That way, the first click will take the visitor to the original site. Unfortunately, if content theft is the true intention, then the thief could rewrite those links to remain internal to the proxy.

Drew

5:29 pm on Apr 27, 2007 (gmt 0)

10+ Year Member




Disallow: /*.asp*$
Disallow: /*.cgi*$
Disallow: /*.htm*$
Disallow: /*.html*$
and so on

I thought wildcards only worked for Google.

sandboxsam

7:08 pm on Apr 27, 2007 (gmt 0)

10+ Year Member



This problem is turning out to be widespread. I know of about a dozen real estate related websites that are no longer in the G index, and the proxy webpage
(www.proxy-server-url/nph-6.cgi/001000A/http/your-website.com) is in its place.

It also looks like every page on my website (700 or so) has been hijacked too!

If you do a search using: "www.proxy-server-url" "your-website.com"
you will see how many pages on the proxy are currently indexed by G.

g1smd

10:25 pm on Apr 27, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, all of your wildcard URLs need to be in the User-Agent: Googlebot section of your robots.txt file, along with a copy of all the URLs that you don't want any bot to spider.

If there is a User-Agent: Googlebot section, then Google will not read the User-Agent: * section at all.
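
In other words, the file ends up laid out something like this (the Disallow paths are only examples - duplicate your normal rules inside the Googlebot section, because Google skips the * section once it finds its own):

User-agent: Googlebot
# wildcard rules that Google understands, plus a copy of the normal rules
Disallow: /*.cgi$
Disallow: /cgi-bin/
Disallow: /private/

User-agent: *
# rules for every other bot (no wildcards here)
Disallow: /cgi-bin/
Disallow: /private/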

This 37 message thread spans 2 pages.