
Forum Moderators: Robert Charlton & goodroi


Proxy Server URLs Can Hijack Your Google Ranking - how to defend?

     
1:59 pm on Jun 25, 2007 (gmt 0)

Preferred Member

10+ Year Member

joined:July 8, 2003
posts: 431
votes: 0


I posted about this in the back room, but I think this needs to be brought into public view. This is happening right now and could happen to you!

Over the weekend my index page, and now some internal pages, were proxy hijacked [webmasterworld.com] within Google's results. My well-ranked index page dropped from the results and has no title, description or cache. A search for "My Company Name" brings up (now two) listings of the malicious proxy at the top of the results.

The URL of the proxy is formatted as such:
[scumbagproxy.com...]

A quick search in Google for "cgi-bin/nph-ssl.cgi/000100A/" now brings up 55,000+ results, when on Saturday it was 13,000 and on Sunday it was 30,000. The number of sites affected is increasing exponentially, and your site could be next.

Take preventative action now by doing the following...

1. Add this to the <head> section of all of your pages:

<base href="http://www.yoursite.com/" />

and if you see an attempted hijack...

2. Block the proxy via .htaccess (note that a RewriteCond does nothing on its own; it needs a matching RewriteRule):

RewriteEngine On
RewriteCond %{HTTP_REFERER} yourproblemproxy\.com
RewriteRule .* - [F]

3. Block the IP address of the proxy

order allow,deny
deny from 11.22.33.44
allow from all

4. Do your research and file a spam report with Google.
[google.com...]
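Putting steps 2 and 3 together, a minimal .htaccess sketch might look like the following (Apache 2.2 syntax; the proxy domain and IP address are placeholders taken from the steps above, and a RewriteCond only takes effect when paired with a RewriteRule):

```apache
# Step 2: refuse requests referred by the proxy (placeholder domain)
RewriteEngine On
RewriteCond %{HTTP_REFERER} yourproblemproxy\.com [NC]
RewriteRule .* - [F]

# Step 3: refuse requests from the proxy's IP address (placeholder IP)
order allow,deny
deny from 11.22.33.44
allow from all
```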

6:38 pm on July 1, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 19, 2003
posts:804
votes: 0


Patrick, are you using it in an SSI include call, or is your system based on PHP?
6:45 pm on July 1, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 30, 2003
posts:932
votes: 0


theBear... it's just a PHP script placed at the top of a file (HTML page), before <html>. I've only used it on a test page. When I view the page as Googlebot (with the Firefox User Agent Switcher) I'm booted out - because (obviously) my host name isn't Google. Otherwise I get a 200 OK.

[edited by: Patrick_Taylor at 6:48 pm (utc) on July 1, 2007]

7:11 pm on July 1, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


I posted a simple Apache .htaccess implementation here [webmasterworld.com]. It relies on a built-in mod_access feature and does not require RewriteMap or additional scripting to achieve validation for the major search robots which support rDNS validation.

Jim

7:21 pm on July 1, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 1, 2004
posts:1258
votes: 0


Thank you Jim. I'm going to give it a try right now.
8:25 pm on July 1, 2007 (gmt 0)

System Operator from US 

incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14664
votes: 99


Jim, the only problem with doing it all in .htaccess is that most hosts use control panels (Plesk/cPanel/etc.) which set Apache to "HostnameLookups Off" by default. Most hosts won't make a server-wide change for a shared customer, especially when it's a) likely to get overwritten the next time they install an update for the control panel, and b) likely to slow the shared server down.

If you save the PHP script previously posted in a file called "reversedns.php", you can run it automatically for all pages in your website by adding the following to your .htaccess file:

AddType application/x-httpd-php .html .htm .txt
php_value auto_prepend_file "/my_full_account_path/httpdocs/reversedns.php"
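For readers who don't have the earlier thread handy: the core of a reversedns.php-style script is a forward-confirmed reverse DNS check. Here is a minimal sketch of that logic (shown in Python purely for illustration; the trusted-suffix list is an assumption mirroring the crawler domains discussed in this thread, and the injectable resolver arguments exist only so the logic can be exercised without live DNS):

```python
import socket

# Hostname suffixes for the major crawlers (an assumption, mirroring the
# domains in jdMorgan's Allow list; adjust to taste)
TRUSTED_SUFFIXES = (".googlebot.com", ".search.live.com", ".ask.com")

def is_verified_crawler(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS: IP -> hostname -> back to the same IP,
    where the hostname must end in a trusted crawler domain."""
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or socket.gethostbyname
    try:
        host = reverse_lookup(ip)
        if not host.lower().endswith(TRUSTED_SUFFIXES):
            return False
        # Forward-confirm: the claimed hostname must resolve back to this IP
        return forward_lookup(host) == ip
    except OSError:
        return False
```

In the auto_prepend setup above, the equivalent PHP would run before every page, apply this check only when the User-Agent claims to be one of those bots, and exit with a 403 when verification fails.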
8:40 pm on July 1, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


I really like the idea that you can specify a file that is in effect automatically included on the front of every file on your site without actually having to edit every file on the site to include it. I have used that a few times in the past and it works well.
9:33 pm on July 1, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 4, 2000
posts: 1324
votes: 0


There are many, many PPC companies with thousands of clients on their reverse proxy systems, which no doubt are earning Google millions of dollars.

It's going to be hard for them to block only the bad guys. One reverse proxy company we know of may actually be the largest PPC company in the world. You've got to think Google will have to think long and hard about blocking them.

They take the pages, change all the phone numbers to Who's Calling tracking numbers, take credit for all the form submissions, and of course drive traffic to the sites via PPC.

-s-

9:47 pm on July 1, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 30, 2003
posts:932
votes: 0


I really like the idea that you can specify a file that is in effect automatically included on the front of every file on your site without actually having to edit every file on the site to include it.

Me too.

php_value auto_prepend_file "/my_full_account_path/httpdocs/reversedns.php"

theBear, thanks. That's neat.

As an alternative, I've got into the habit of php-including a something.php file in front of the HTML on all pages. It can be an empty file, then if a script needs to be prepended to all pages, it's easily done in the one file.

10:07 pm on July 1, 2007 (gmt 0)

System Operator from US 

incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14664
votes: 99


theBear, thanks. That's neat.

Um, that was from theBILL, not theBear :)

10:21 pm on July 1, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


> problem with doing it all in .htaccess is most hosts use control panels (Plesk/cPanel/etc.) which set Apache to "HostnameLookups Off" by default.

True, and I think we've run into that very problem over in the implementation-specific thread in Apache.

However, as I've stated before, the goal is not to pick apart this or that method, giving up completely if a 'perfect' solution can't be found, but rather to present as many workable solutions as possible, so that members with different mixes of server capabilities and varying familiarity/comfort levels with different approaches have more than one option to solve the problem.

If rDNS doesn't work, then IP-based blocking can still work if the logs are watched closely for new valid robots IP ranges popping up. PHP will work well, except on old static sites where the Webmaster is more concerned with maintaining the content than with learning a whole new language and changing the whole site to make it script-based so that it can be protected against the subject exploit.

And some sites won't tolerate any but the most specific rDNS solution because the server is already overloaded, and they can't afford to do more than a very few bot-specific lookups on a very few specific URLs until the server is upgraded.

Everybody's site and priorities are different, and there's no 'perfect' answer; all we can do is offer as many options as possible.

Jim

10:32 pm on July 1, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 30, 2003
posts:932
votes: 0


theincrediBILL, not theBear

Yes, my apologies!

(Excuse? It's a Sunday.)

11:08 pm on July 1, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member zeus is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 28, 2002
posts:3483
votes: 27


Oh no, this REALLY reminds me of when Google also messed up the 301/302 redirects and meta refresh stuff, and that also hit A LOT of sites; it took years for them to come back into the index. Now we are on it again with the PROXY issue. They really have to act now, but I remember with the googlebug302 it took almost two years before they began to do something.
12:10 am on July 2, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1997
votes: 75


SO WHAT DOES GOOGLE HAVE TO SAY ABOUT THIS?
12:12 am on July 2, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 19, 2003
posts:804
votes: 0


"Everybody's site and priorities are different, and there's no 'perfect' answer; All we can do is offer as many options as possible."

Jim, that is very true. Just keep sight of what you are blocking, and don't get so sure of things that you forget there is always another way to do something, and that something might not be in your best interest.

12:36 am on July 2, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 19, 2003
posts:46
votes: 0


Google says that it is unlikely to happen, but they have a strong interest in "ordinary" queries where proxies outrank the originals.

My guess is that "we" can jump all too easily to conclusions. There may be other factors in play when a site or page is tanked and proxied content seemingly takes precedence.

We need to provide samples, proof.

12:57 am on July 2, 2007 (gmt 0)

System Operator from US 

incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14664
votes: 99


the goal is not to pick apart this or that method, giving up completely if a 'perfect' solution can't be found, but rather to present as many workable solutions as possible

I agree.

However, I thought it would be best to quickly explain why that version didn't work for all the people using shared hosting, before they tore their hair out trying to figure out why it didn't work for them.

1:06 am on July 2, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Indexing of proxy URLs by Google is nothing new. I had to take steps to fix a problem with indexed proxy URLs nearly four years ago. Luckily both of the proxy owners fixed things at their end to stop Google crawling their system.
1:09 am on July 2, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 19, 2003
posts:46
votes: 0


What I don't understand is that Google doesn't seem to mind those proxy sites. I guess they consider them harmless.

They are so easy to recognize, and they just pollute the index, intentionally or not.

Google could act on spam reports concerning proxies, and just delete and/or ban them. Most of those sites are actually one page only anyway.

This would stop those sites completely after a while, and we could all concentrate on better things.

1:10 am on July 2, 2007 (gmt 0)

System Operator from US 

incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14664
votes: 99


We need to provide samples, proof.

I have a few of them, not so hard to find.

Actually, a couple are completely bizarre in that the site is indexed in Google but doesn't even show up when you search for its home page title; the proxy server does!

Both the proxy server and the hijacked site are PR3 and a "site:domain.com" search shows that the site in question is indexed but Google won't show the home page for the query, only the proxy server.

1:44 am on July 2, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 4, 2000
posts: 1324
votes: 0


Remember, there are thousands if not hundreds of thousands of sites that use PPC resellers who reverse proxy their sites. That way they are able to take credit for the PPC traffic and activities on the sites without building content or landing pages. If they show up organically, that's just gravy for the PPC company.

If Google stops reverse proxy, they cut off a large flow of revenue.

-s-

1:49 am on July 2, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 19, 2003
posts:804
votes: 0


Gede,

Google has been provided tons of spam reports dealing with things of this nature.

We even had a promise from someone to explain the Googlewashing of the Google AdSense page (actually I think it was fried in bacon grease, but that is just me). I haven't seen that explained yet; has anyone else? I'll admit I haven't been paying that much attention to the wacky world of the search engines lately, so I may have missed it. If I have, would someone kindly provide a link to said explanation?

9:13 am on July 2, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 10, 2006
posts:86
votes: 0


I don't think it's the proxy owner's problem. It's Googlebot that crawls the other site using this proxy!

The best way would be to contact the proxy owner and tell him to block access to your site from his proxies. That way Google won't be able to access your site through his proxies, and the indexed pages will disappear over a period of time.

9:36 am on July 2, 2007 (gmt 0)

Preferred Member from CH 

10+ Year Member

joined:Mar 10, 2004
posts:429
votes: 0


Thanks, theBear, JD, theBILL, and the other contributors, for the excellent tips.
9:37 am on July 2, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 19, 2003
posts:46
votes: 0


testing0,

Of course it is the proxy site owner's problem: they push content that is not owned by them into a third party's database (in this case Google) as if it were theirs.

It's deception.

9:45 am on July 2, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 10, 2006
posts:86
votes: 0


Gede

Do you have any understanding of how an online proxy server works?!

They create online proxies for anonymous browsing, bypassing filters, and safe browsing.

It's just a bug that Google crawls other websites using a web-based proxy form.

[edited by: testing0 at 9:58 am (utc) on July 2, 2007]

9:56 am on July 2, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 19, 2003
posts:46
votes: 0


I have sort of an idea; I downloaded the PHP version out of curiosity.

As far as I can see, the software allows "hotlinking" to be switched on or off, and setting it to on allows Googlebot to crawl proxied pages under certain conditions. I guess people can even submit a proxy session to Google.

Are you going to tell me that proxy operators are innocent victims of a vicious Googlebot?

I have no objections to proxies, but as soon as they give the opportunity to store my content in the Google cache as if it were theirs, they are doing something wrong, don't you think?

10:24 am on July 2, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member hobbs is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 19, 2004
posts:3056
votes: 5


I am still hoping for the dumbed-down version of the solution, and tested, working code for reversedns.php (dedicated host).

jdMorgan's code here
[webmasterworld.com...]

<FilesMatch "\.(s?html?|php[45]?)$">
SetEnvIfNoCase User-Agent "!(Googlebot|msnbot|Teoma)" notRDNSbot
#
Order Deny,Allow
Deny from all
Allow from env=notRDNSbot
Allow from googlebot.com
Allow from search.live.com
Allow from ask.com
#
</FilesMatch>

sounds great, except it does not tell me how to integrate it with my existing .htaccess:

SetEnvIfNoCase User-Agent "somebot" bad_bot
SetEnvIfNoCase User-Agent "^bot2" bad_bot
SetEnvIfNoCase User-Agent ^$ bad_bot
SetEnvIfNoCase User-Agent ^-$ bad_bot

<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>

<Files 403.shtml>
order allow,deny
allow from all
</Files>

deny from www.xxx.yyy.zzz
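One way to integrate the two (a sketch only, not tested against this exact setup): move the bad-bot blocking from mod_access to mod_rewrite, so that jdMorgan's FilesMatch block is the only place Order/Deny/Allow is used and the two access-control schemes can't fight each other. The user-agent patterns below are the ones from the existing .htaccess above:

```apache
# Bad bots: block with mod_rewrite instead of Order/Allow/Deny
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} somebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^bot2 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]

# rDNS validation: jdMorgan's block, unchanged (pipe characters restored)
<FilesMatch "\.(s?html?|php[45]?)$">
SetEnvIfNoCase User-Agent "!(Googlebot|msnbot|Teoma)" notRDNSbot
Order Deny,Allow
Deny from all
Allow from env=notRDNSbot
Allow from googlebot.com
Allow from search.live.com
Allow from ask.com
</FilesMatch>

# The custom 403 page must stay reachable
<Files 403.shtml>
order allow,deny
allow from all
</Files>
```

The remaining single-host "deny from" line could likewise be expressed as another RewriteCond on %{REMOTE_ADDR} so it isn't overridden by the Allow lines in the FilesMatch block.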

10:59 am on July 2, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 10, 2006
posts:86
votes: 0


Here is a fix for proxy owners. Add this to your robots.txt:


User-agent: *
Disallow: /index.php/

11:32 am on July 2, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 16, 2004
posts:854
votes: 0


We had a site hijacked via this method 19 months ago, a nice PR6 site that was 8 years old. By the time we figured out what was going on it was too late. The site is slowly coming back now, but it is in no way even close to what it used to be.

Through research we found that the two proxies that hijacked us are owned by the same SEO firm. It doesn't take long to figure out what happened in our case: the SEO firm was hired to attain rank and couldn't outrank us, so they decided to take us out of the equation :(

Pretty sad that with so many brains over at the 'plex, they cannot figure out a counter to this exploit yet.

3:31 pm on July 2, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:July 19, 2004
posts:142
votes: 0


My "second" to Hobbs:
I am still hoping for the dumbed-down version of the solution, and tested, working code for reversedns.php (dedicated host).

I wonder how many people are reading this thread and bouncing back and forth between the Apache forum links and other links, trying to get a working solution. Hopefully, someone out there can top (improve upon) what has already been presented?

Thanks in advance to the contributor who will share their "best" solution. Or, maybe what we have is the complete kit-n-caboodle (as CC says, it's preferable to have the kit with the caboodle).
