One simple precaution is an absolute base URL in the <head> of every page, so that relative links on any proxied copy still resolve back to your own domain:
<base href="http://www.yoursite.com/" />
A RewriteCond by itself does nothing; it needs a RewriteRule to act on, e.g. to refuse requests referred from a known proxy:
RewriteCond %{HTTP_REFERER} yourproblemproxy\.com [NC]
RewriteRule .* - [F]
order allow,deny
deny from 11.22.33.44
allow from all
Jim
If you save that PHP script previously posted in a file named "reversedns.php", you can run it automatically for all pages in your website by adding the following to your .htaccess file (the AddType line tells Apache to parse .html, .htm, and .txt files as PHP, so the prepend takes effect on static pages too):
AddType application/x-httpd-php .html .htm .txt
php_value auto_prepend_file "/my_full_account_path/httpdocs/reversedns.php"
It's going to be hard for them to block only the bad guys. One Reverse Proxy company we know of may actually be the largest PPC company in the world. Got to think Google will have to think long and hard about blocking them.
They take the pages, change all the phone numbers to Who's Calling tracking numbers, take credit for all the form submissions, and of course drive traffic to the sites via PPC.
-s-
I really like the idea that you can specify a file that is in effect automatically included on the front of every file on your site without actually having to edit every file on the site to include it.
Me too.
php_value auto_prepend_file "/my_full_account_path/httpdocs/reversedns.php"
theBear, thanks. That's neat.
As an alternative, I've got into the habit of PHP-including a something.php file in front of the HTML on all pages. It can be an empty file; then, if a script needs to be prepended to all pages, it's easily done in that one file.
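To illustrate, a minimal sketch of that pattern (something.php and its path are placeholders):

<?php
// Site-wide prepend: empty for now; anything added here later
// (such as the reverse-DNS check) runs before every page renders.
require $_SERVER['DOCUMENT_ROOT'] . '/something.php';
?>
<html>
<!-- page content as usual -->
</html>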
True, and I think we've run into that very problem over in the implementation-specific thread in Apache.
However, as I've stated before, the goal is not to pick apart this or that method, giving up completely if a 'perfect' solution can't be found, but rather to present as many workable solutions as possible, so that members with different mixes of server capabilities and varying familiarity/comfort levels with different approaches have more than one option to solve the problem.
If rDNS doesn't work, then IP-based blocking can still work if the logs are watched closely for new valid robots IP ranges popping up. PHP will work well, except on old static sites where the Webmaster is more concerned with maintaining the content than with learning a whole new language and changing the whole site to make it script-based so that it can be protected against the subject exploit.
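For the IP-based route, a minimal .htaccess sketch (the range below is a placeholder, not any real proxy's block; Apache 2 accepts CIDR notation here):

Order Allow,Deny
Allow from all
# deny the offending proxy's published address range (placeholder)
Deny from 11.22.33.0/24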
And some sites won't tolerate any but the most specific rDNS solution, because the server is already overloaded and they can't afford to do more than a very few 'bot-specific lookups on a very few specific URLs until the server is upgraded.
Everybody's site and priorities are different, and there's no 'perfect' answer; all we can do is offer as many options as possible.
Jim
Jim, that is very true. Just keep sight of what you are blocking; don't get so sure of things that you forget there is always another way to do something, and that something might not be in your best interest.
My guess is that "we" can jump all too easily to conclusions. There may be other factors in play when a site or page is tanked and proxied content seemingly takes precedence.
We need to provide samples, proof.
the goal is not to pick apart this or that method, giving up completely if a 'perfect' solution can't be found, but rather to present as many workable solutions as possible
I agree.
However, I thought it would be best to quickly explain why that version didn't work for all the people using shared hosting, before they tore their hair out trying to figure out why it didn't work for them.
They are so easy to recognize, and they just pollute the results, intentionally or not.
They could act on spam reports concerning proxies and simply delete and/or ban them. Most of those sites are actually one page only anyway.
This would stop those sites completely after a while, and we all can concentrate on better things.
We need to provide samples, proof.
I have a few of them, not so hard to find.
Actually, a couple are completely bizarre in that the site is indexed in Google but doesn't even show up when you search for its home page title; the proxy server does!
Both the proxy server and the hijacked site are PR3 and a "site:domain.com" search shows that the site in question is indexed but Google won't show the home page for the query, only the proxy server.
If Google blocks reverse proxies, they cut off a large flow of revenue.
-s-
Google has been provided tons of spam reports dealing with things of this nature.
We even had a promise from someone to explain the Google washing of the Google AdSense page (actually I think it was fried in bacon grease, but that is just me). I haven't seen that explained yet; has anyone else? I'll admit I haven't been paying that much attention to the wacky world of the search engines lately, so I may have missed it. If I've missed it, would someone kindly provide a link to said explanation?
Best way would be to contact the site owner and tell him to block access to your site from his proxies. That way Google won't be able to access your site using his proxies, and the indexed pages will disappear over a period of time.
As far as I can see, the software allows "hotlinking" to be switched on or off, and setting it to on allows Googlebot to crawl proxied pages under certain conditions. I guess people can even submit a proxy session to Google.
Are you going to tell me that proxy operators are innocent victims of a vicious Googlebot?
I have no objections to proxies, but as soon as they make it possible for my content to be stored in the Google cache as if it were theirs, they are doing something wrong, don't you think?
jdMorgan's code here
[webmasterworld.com...]
<FilesMatch "\.(s?html?|php[45]?)$">
# SetEnvIf has no "!" negation inside its pattern, so flag everyone
# first, then unset the flag for UAs claiming to be the big three
SetEnvIfNoCase User-Agent ".*" notRDNSbot
SetEnvIfNoCase User-Agent "(Googlebot|msnbot|Teoma)" !notRDNSbot
#
# Deny everything, re-admit ordinary visitors via the env flag, and
# re-admit the real robots by hostname ("Allow from <domain>" makes
# Apache do a double reverse-DNS check on the client's IP)
Order Deny,Allow
Deny from all
Allow from env=notRDNSbot
Allow from googlebot.com
Allow from search.live.com
Allow from ask.com
#
</FilesMatch>
sounds great, except it does not tell me how to integrate it with my existing .htaccess:
SetEnvIfNoCase User-Agent "somebot" bad_bot
SetEnvIfNoCase User-Agent "^bot2" bad_bot
SetEnvIfNoCase User-Agent ^$ bad_bot
SetEnvIfNoCase User-Agent ^-$ bad_bot

<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>

<Files 403.shtml>
order allow,deny
allow from all
</Files>

deny from www.#*$!.yyy.zzz
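For what it's worth, here is an untested sketch of one way the two might be merged, assuming Apache 2 with mod_setenvif (the bad-bot patterns are your own placeholders). The idea is to fold your bad-bot flag into the rDNS scheme: flag everyone as an ordinary visitor, then unset the flag both for the big three (they must pass the hostname check) and for your known bad bots (they stay denied):

# Flag known bad user-agents, as before
SetEnvIfNoCase User-Agent "somebot" bad_bot
SetEnvIfNoCase User-Agent "^bot2" bad_bot
SetEnvIfNoCase User-Agent "^$" bad_bot
SetEnvIfNoCase User-Agent "^-$" bad_bot

<FilesMatch "\.(s?html?|php[45]?)$">
# Everyone starts out as an ordinary (allowed) visitor...
SetEnvIfNoCase User-Agent ".*" notRDNSbot
# ...except self-proclaimed big-three robots, which must pass rDNS...
SetEnvIfNoCase User-Agent "(Googlebot|msnbot|Teoma)" !notRDNSbot
# ...and except anything already flagged bad_bot above (SetEnvIf can
# match an environment variable set by an earlier directive)
SetEnvIf bad_bot ".+" !notRDNSbot
Order Deny,Allow
Deny from all
Allow from env=notRDNSbot
Allow from googlebot.com
Allow from search.live.com
Allow from ask.com
</FilesMatch>

# Keep the error-page exception after the block above so it overrides it
<Files 403.shtml>
Order Allow,Deny
Allow from all
</Files>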
Through research we found the 2 proxies that hijacked us are owned by the same SEO firm. Doesn't take long to figure out what happened in our case: the SEO firm was hired to attain rank and couldn't outrank us, so they decided to take us out of the equation :(
Pretty sad that, with so many brains over at the 'plex, they cannot figure out a counter to this exploit yet.
I am still hoping for the dumbed-down version of the solution, and tested and working code for reversedns.php (dedicated host).
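Not the original script, but here is a minimal sketch of what a forward-confirmed reverse-DNS check in PHP could look like (bot names and rDNS domains taken from this thread; treat it as a starting point to test, not a drop-in):

<?php
// reversedns.php -- prepend to every page (see the auto_prepend_file tip above).
// Only user-agents claiming to be a major robot are checked, so ordinary
// visitors cost no extra DNS lookups.
$ip = $_SERVER['REMOTE_ADDR'];
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

if (preg_match('/Googlebot|msnbot|Teoma/i', $ua)) {
    $host = gethostbyaddr($ip);     // reverse lookup: IP -> hostname
    $fwd  = gethostbyname($host);   // forward-confirm: hostname -> IP
    $okDomain = preg_match('/\.(googlebot\.com|search\.live\.com|ask\.com)$/i', $host);
    if ($fwd !== $ip || !$okDomain) {
        // Claims to be a robot but fails the double lookup: likely a proxy
        header('HTTP/1.1 403 Forbidden');
        exit('Access denied.');
    }
}
?>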
I wonder how many people are reading this thread and bouncing back and forth with the apache forum links and other links trying to get a working solution. Hopefully, someone out there can top (improve upon) what has already been presented?
Thanks in advance to the contributor who will share their "best" solution. Or, maybe what we have is the complete kit-n-caboodle (as CC says, it's preferable to have the kit with the caboodle).