Block PHP proxies because of duplicate content


leandre

1:36 am on Dec 10, 2008 (gmt 0)

10+ Year Member



Hello,

I have a problem. Some of my visitors use proxies, which is not a problem in itself.
My problem is that Google indexes the proxy URLs, for example:

http://example-proxy-domain.com/index.php?q=http%3A%2F%2Fwww.google.ca%2F

And this is duplicate content :(

I am using a temporary solution:

Deny from 67.205.111.212

But tomorrow I will have the same problem with another PHP proxy.

In my Apache log I see this:

example.com 67.205.111.212 [10/Dec/2008:02:35:13 +0100] "GET /test.html HTTP/1.0" 200 7001

Is it possible to find a test to block PHP proxies?
Also, Wikipedia blocks proxies from editing pages. What solution do they use?

Best regards

Sorry, my English is poor.

[edited by: jdMorgan at 1:53 pm (utc) on Dec. 10, 2008]
[edit reason] Obscured & de-linked URL. [/edit]

jdMorgan

1:51 pm on Dec 10, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



For this kind of problem, it is necessary to collect more information about the proxy requests. You can do this by using SSI on your pages to call a small Perl script, or by adding some PHP code to your pages. The goal is a log file that records all common request headers and any proxy-related headers from suspicious requests.
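As a rough illustration of that logging step, here is a minimal sketch in Python rather than the Perl or PHP mentioned above. It assumes the request arrives as a CGI/WSGI-style `environ` dictionary; the header list, the function names, and the log file name `proxy-suspects.log` are all hypothetical choices, not anything prescribed in this thread.

```python
import json
from datetime import datetime, timezone

# Hypothetical selection of headers worth recording; extend as needed.
LOGGED_HEADERS = [
    "User-Agent", "Accept", "Accept-Encoding", "Accept-Language",
    "Referer", "Via", "X-Forwarded-For",
]


def environ_key(header_name):
    """Map an HTTP header name to its CGI/WSGI environ key."""
    return "HTTP_" + header_name.upper().replace("-", "_")


def build_log_record(environ, now=None):
    """Return one JSON line describing the request and its headers.

    Headers the client did not send are recorded as null, because a
    missing header is itself useful evidence.
    """
    now = now or datetime.now(timezone.utc)
    record = {
        "time": now.isoformat(),
        "remote_addr": environ.get("REMOTE_ADDR", ""),
        "path": environ.get("PATH_INFO", ""),
        "headers": {
            name: environ.get(environ_key(name)) for name in LOGGED_HEADERS
        },
    }
    return json.dumps(record, sort_keys=True)


def append_log(environ, path="proxy-suspects.log"):
    """Append the record for this request to the (assumed) log file."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(build_log_record(environ) + "\n")
```

Calling `append_log(environ)` only for requests you already find suspicious (for example, from an IP address you have seen in a proxy URL) keeps the file small enough to read by eye.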

After collecting this information about suspicious requests, it is often possible to block proxy requests using combinations of HTTP request header values. Headers like "X-Forwarded-For" and "Via" are often the most useful. You can also look at the "Accept" headers and validate them against the user-agent string in the request; proxies often modify or omit the "Accept", "Accept-Encoding", and "Accept-Language" request headers that browsers typically send, and this can be detected.
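To make the idea of combining header values concrete, here is a small Python sketch. It again assumes a CGI/WSGI-style `environ` dictionary. The user-agent prefixes, the function names, and the threshold of two signals are assumptions for illustration only; real rules should come from what your own logs show.

```python
# Headers that proxies commonly add (named in the post above).
PROXY_HEADERS = ("HTTP_VIA", "HTTP_X_FORWARDED_FOR")

# Headers that ordinary browsers typically send.
BROWSER_HEADERS = (
    "HTTP_ACCEPT", "HTTP_ACCEPT_ENCODING", "HTTP_ACCEPT_LANGUAGE",
)

# Assumed prefixes for a user-agent that claims to be a browser.
BROWSER_UA_PREFIXES = ("Mozilla/", "Opera/")


def proxy_signals(environ):
    """Return a list of reasons this request looks proxied."""
    signals = []
    for key in PROXY_HEADERS:
        if environ.get(key):
            signals.append("has " + key)
    user_agent = environ.get("HTTP_USER_AGENT", "")
    if user_agent.startswith(BROWSER_UA_PREFIXES):
        # The request claims to be a browser, so check that the
        # headers a browser would send are actually present.
        for key in BROWSER_HEADERS:
            if not environ.get(key):
                signals.append("browser UA but missing " + key)
    return signals


def looks_like_proxy(environ, threshold=2):
    """Flag the request when at least `threshold` signals are present."""
    return len(proxy_signals(environ)) >= threshold
```

A threshold above one is a deliberate choice here: legitimate visitors behind corporate or ISP proxies also send "Via" or "X-Forwarded-For", so a single signal on its own is weak evidence.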

However, as you can probably tell, this is not an easy project the first time, so it will take quite a bit of work to start. After you have done it once, it is a lot easier.

Jim

jdMorgan

1:55 pm on Dec 10, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A simpler solution is also available in this case: If you do a WHOIS on that IP address, you will find that it is assigned to a dedicated server at a major hosting company. With the exception of known search engine spiders, there is no reason to allow another server to request pages from your site. You can block all IP address ranges belonging to hosting companies if they cause you problems.
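A range-based block like this can be sketched in Python with the standard `ipaddress` module. The CIDR ranges below are documentation placeholders, not the real ranges of any hosting company or search engine; you would fill them in from your own WHOIS lookups. The function names are hypothetical as well.

```python
import ipaddress

# Placeholder ranges (reserved documentation networks), standing in for
# hosting-company ranges found through WHOIS.
BLOCKED_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]

# Placeholder exception, standing in for a known search engine spider
# that happens to live inside a blocked range.
ALLOWED_RANGES = ["198.51.100.64/28"]


def _networks(cidrs):
    """Parse CIDR strings into network objects."""
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def is_blocked(ip, blocked=BLOCKED_RANGES, allowed=ALLOWED_RANGES):
    """Return True if `ip` is in a blocked range and not in an exception."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in _networks(allowed)):
        return False
    return any(addr in net for net in _networks(blocked))


def deny_lines(blocked=BLOCKED_RANGES):
    """Render the blocked ranges as Apache 'Deny from' directives."""
    return ["Deny from " + str(net) for net in _networks(blocked)]
```

The output of `deny_lines()` uses the same `Deny from` directive as the single-IP rule in the first post, so the ranges can be pasted into the existing configuration once they have been checked.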

Jim