Forum Moderators: DixonJones
Anyone know what this is?
So maybe someone found your site via it, but then I would have expected a query string in the referer, unless they are doing some funky redirects.
So the other possibilities are
1) that it is their web crawler/spider and rather than putting its name in the User Agent they have put it in the referer. Does it hit your robots.txt file?
2) They are doing some log spam
How many times is it in your logs, and is it the same page each time?
Anyone know what this is?
Someone searching for an open proxy.
I don't have any open proxy on my machine... I'm pretty sure of that.
The default Apache behavior when you're not using mod_proxy is (unfortunately) to ignore the host part of this proxy request and assume the URL requested was local (may we assume you have a resource called "/" that is 6360 bytes long?).
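To illustrate the difference (hostnames here are placeholders): a proxy-style request puts a full URL in the request line, whereas a normal request uses only the path. Apache without mod_proxy simply treats the former as a local request for "/":

```
# Proxy-style request (what these probes look like):
GET http://www.example.com/ HTTP/1.0

# Normal request for the same page:
GET / HTTP/1.0
Host: www.example.com
```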
It is in there thousands of times...
That is somewhat of a concern. Proxy-seeking bots don't usually keep hammering you -- they check and then leave to search elsewhere. If I were you, I would first hand-examine a bunch of those to make sure none were actually processing proxy requests, then telnet in with my own proxy request to make darn sure I'm not running an open proxy.
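For anyone who would rather script that check than telnet by hand, here is a rough sketch (completely unofficial; the hostnames are placeholders, and the probe call is left commented out so nothing fires by accident):

```python
import socket

def build_proxy_probe(target_url="http://www.example.com/"):
    """Build a raw proxy-style HTTP/1.0 request: a full URL in the
    request line, which only a proxy should be willing to service."""
    return f"GET {target_url} HTTP/1.0\r\n\r\n"

def probe(server, port=80, target_url="http://www.example.com/"):
    """Send the probe to your own server and return its status line.
    A server that is NOT an open proxy should refuse the request, or
    at least serve its own local content rather than the remote URL."""
    request = build_proxy_probe(target_url)
    with socket.create_connection((server, port), timeout=10) as sock:
        sock.sendall(request.encode("ascii"))
        # Read just the status line of the response.
        status_line = sock.makefile("rb").readline().decode("latin-1").strip()
    return status_line

# Example (substitute your own hostname):
# print(probe("www.yerdomain.com"))
```

If the body that comes back is your own front page rather than the remote site's, you are seeing exactly the default Apache behavior described above, not an open proxy.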
Possibly the requests are going through a badly configured proxy server that is munging the GETs.
Possible. IME, it tends to be people searching for an open proxy. The demand for them in China is high, I would imagine.
I wish Apache would simply return a 403 in this case instead of wasting my bandwidth and cluttering up my logs with clearly bogus fetches that it claims were successful. Hmmm, maybe I can use mod_proxy (ironically) to force that behavior to happen :-).
I just don't get why it happens so freaking often.
I would want to know that as well. Can you do a rewrite rule switching on %{HTTP_HOST} to start returning a failure code to these requests?
Something like: (WARNING: completely untested rule!)
RewriteCond %{HTTP_HOST} !^www\.yerdomain\.com$ [NC]
RewriteRule .* - [F]
That would save a bit of bandwidth and possibly give any poorly-coded automated software a better hint that they're not accomplishing anything productive by sending these requests.