Weird activity in logs

         

superdorf

5:09 am on Feb 14, 2006 (gmt 0)

10+ Year Member



I am getting a lot of hits like this:
"GET [61.152.160.106:8000...] HTTP/1.1" 200 6360 "http://www.baidu.com" "mozilla/4.0 (compatible; MSIE 6.0;
Windows 5.1;Windows 5.5;Windows 6.0)"
in my logs...

Anyone know what this is?

Dijkgraaf

9:21 am on Feb 14, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well Baidu is a Chinese search engine.

So maybe someone found your site via it, but then I would have expected a query string in the referer, unless they are doing some funky redirects.
So the other possibilities are:
1) It is their web crawler/spider, and rather than putting its name in the User Agent they have put it in the referer. Does it hit your robots.txt file?
2) They are doing some log spam.
How many times is it in your logs, and is it the same page each time?
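One way to answer the "how many times, and is it the same page" question is to tally the odd requests straight from the access log. The following is only a minimal sketch: it assumes Apache's default "combined" log format, and the sample lines, IP addresses and example.net host are made-up placeholders, not entries from the real log.

```python
import re
from collections import Counter

# Picks the request line, status, size and referer out of a log line.
# Assumes Apache's default "combined" LogFormat.
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<target>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referer>[^"]*)"'
)

def count_proxy_style_hits(lines):
    """Count requests whose target is a full URL instead of a local path."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not in the expected format; skip it
        target = m.group("target")
        if m.group("method") == "CONNECT" or "://" in target:
            counts[(target, m.group("status"))] += 1
    return counts

# Hypothetical sample lines, for illustration only.
sample = [
    '203.0.113.5 - - [14/Feb/2006:05:09:00 +0000] "GET http://example.net:8000/ HTTP/1.1" 200 6360 "http://www.baidu.com" "mozilla/4.0"',
    '203.0.113.5 - - [14/Feb/2006:05:09:05 +0000] "GET http://example.net:8000/ HTTP/1.1" 200 6360 "http://www.baidu.com" "mozilla/4.0"',
    '198.51.100.7 - - [14/Feb/2006:05:10:00 +0000] "GET /index.html HTTP/1.1" 200 6360 "-" "Mozilla/5.0"',
]

for (target, status), n in count_proxy_style_hits(sample).most_common():
    print(n, status, target)
```

To run it against a real log, pass an open file object instead of the sample list; the log path depends on the server setup.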

superdorf

6:48 pm on Feb 14, 2006 (gmt 0)

10+ Year Member



It is in there thousands of times... That ip address is not my ip address though....

I haven't checked if it hits the robots file... I will later.

-Jamie

Dijkgraaf

9:18 pm on Feb 14, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The fact that it isn't your IP address makes it interesting, because whatever is requesting pages from your site must know what your server's IP address or domain is.
The IP address belongs to online.sh.cn.
Possibly the requests are going through a badly configured proxy server that is munging the GET's.
It would be interesting if you could trap all the headers of the request to see what it is actually asking for.

superdorf

4:21 am on Feb 15, 2006 (gmt 0)

10+ Year Member



How do I capture that? I use Apache. I don't have any open proxy on my machine... I'm pretty sure of that.

ronburk

1:16 am on Feb 16, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Anyone know what this is?

Someone searching for an open proxy.

I don't have any open proxy on my machine... I'm pretty sure of that.

The default Apache behavior when you're not using mod_proxy is (unfortunately) to ignore the host part of this proxy request and assume the URL requested was local (may we assume you have a resource called "/" that is 6360 bytes long?).

It is in there thousands of times...

That is somewhat of a concern. Proxy-seeking bots don't usually keep hammering you -- they check and then leave to search elsewhere. If I were you, I would first hand-examine a bunch of those to make sure none were actually processing proxy requests, then telnet in with my own proxy request to make darn sure I'm not running an open proxy.

Possibly the requests are going through a badly configured proxy server that is munging the GET's

Possible. IME, it tends to be people searching for an open proxy. The demand for them in China is high, I would imagine.

I wish Apache would simply return a 403 in this case instead of wasting my bandwidth and cluttering up my logs with clearly bogus fetches that it claims were successful. Hmmm, maybe I can use mod_proxy (ironically) to force that behavior to happen :-).
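To make the "ignore the host part and assume the URL requested was local" behavior concrete, here is a small sketch of that mapping. It only mimics the behavior described in this post and is not Apache's actual code; the function name and the example.net URL are invented for illustration.

```python
from urllib.parse import urlsplit

def local_path_for(request_target):
    """Drop the scheme and host from a full-URL request target.

    A target that is already a plain path is returned unchanged.
    """
    parts = urlsplit(request_target)
    if parts.scheme and parts.netloc:
        path = parts.path or "/"  # an empty path means the root resource
        if parts.query:
            path += "?" + parts.query
        return path
    return request_target

# Hypothetical request targets.
for target in ["http://example.net:8000/", "http://example.net:8000", "/index.html"]:
    print(target, "->", local_path_for(target))
```

Under this reading, a proxy-style request for the root of some other host ends up being served the local "/" resource, which would match a 200 response with the size of the home page.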

superdorf

6:01 pm on Feb 16, 2006 (gmt 0)

10+ Year Member



I tested it with telnet by:
telnet localhost 80
then I issued the same GET command posted above...
My server returned my index page...

I think that confirms what you were saying... My server is giving my homepage... I just don't get why it happens so freaking often.

Thanks,
-Jamie
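The same telnet check can be scripted so it is easy to repeat after a config change. This is a rough sketch using only the standard socket module; the function names and the Host value passed in are assumptions for illustration, and the target URL should be whatever appears in your own log.

```python
import socket

def build_proxy_style_request(target_url, host_header):
    """Build a raw HTTP/1.1 request whose request line carries a full URL."""
    return (
        f"GET {target_url} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")

def probe(server, port, target_url, host_header, timeout=5.0):
    """Send the request and return (status_line, total_bytes_received)."""
    request = build_proxy_style_request(target_url, host_header)
    chunks = []
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(request)
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    response = b"".join(chunks)
    status_line = response.split(b"\r\n", 1)[0].decode("latin-1")
    return status_line, len(response)
```

Calling something like probe("localhost", 80, "http://example.net:8000/", "example.net:8000") and getting a 200 status with a size close to your own home page would be consistent with the server ignoring the host part rather than proxying. A response containing the remote site's content would be the worrying case.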

ronburk

7:42 pm on Feb 17, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I just don't get why it happens so freaking often.

I would want to know that as well. Can you do a rewrite rule switching on %{HTTP_HOST} to start returning a failure code to these requests?

Something like: (WARNING: completely untested rule!)

RewriteCond %{HTTP_HOST} !(^www\.yerdomain\.com$) [NC]
RewriteRule .* - [F]

That would save a bit of bandwidth and possibly give any poorly-coded automated software a better hint that they're not accomplishing anything productive by sending these requests.
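For anyone unsure which requests the rule is meant to catch, the decision can be sketched outside Apache. This only models the intent (a negated, case-insensitive match of the Host value against the site's own name) and is not a substitute for testing the real rule; www.yerdomain.com is the placeholder from the rule above.

```python
import re

# Placeholder site name; the dots are escaped so they match literally.
OWN_HOST = re.compile(r"^www\.yerdomain\.com$", re.IGNORECASE)

def would_be_forbidden(host_header):
    """Return True when the Host value does not match the site's own name."""
    return OWN_HOST.search(host_header) is None

for host in ["www.yerdomain.com", "WWW.YERDOMAIN.COM",
             "61.152.160.106:8000", "www.yerdomain.com:80"]:
    print(host, would_be_forbidden(host))
```

Note that in this sketch a Host value carrying an explicit port, such as www.yerdomain.com:80, also fails the match. If clients send that form, the pattern would probably need an optional port part.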

ronburk

10:48 pm on Feb 17, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hmm, looks like the preview was not faithful and posting ate some whitespace. Should be some whitespace in front of that "!".