This 33 message thread spans 2 pages.
Is it possible and advisable to block HTTP/1.0 requests?
Before trying to block generic requests, need advice as to unwanted effects
I am seeing more and more entries in my raw access logs that I'm sure are not good for my site. One that is showing up more often is "GET / HTTP/1.0" with someone else's URL as the referrer. I have been researching here for days but possibly searching for the wrong terms. Here is the problem:
nnn.137.129.75 - - [18/Feb/2012:15:09:55 -0600] "GET / HTTP/1.0" 200 7638 "http://example.dir.ru/" "Mozilla/5.0 (Windows NT 5.1; U; en) Opera 8.00"
nnn.137.129.75 - - [18/Feb/2012:15:10:01 -0600] "GET / HTTP/1.0" 200 7638 "http://example.dir.ru/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
nnn.137.129.75 - - [18/Feb/2012:15:10:08 -0600] "GET / HTTP/1.0" 200 7638 "http://example.dir.ru/" "Mozilla/4.0 (compatible; MSIE 6.0; Update a; AOL 6.0; Windows 98)"
You can see that these three requests, a few seconds apart, are automated; the UAs are part of the script. There are other requests that start out that way and add "somebrandname-HttpClient/3.1" after a blank UA.
From what I have been reading, the request just delivers the entire homepage, but I can't see a good reason to request it that way and it appears that it is done only to be able to spam my logs.
Is it a bad idea to block all requests for "GET / HTTP/1.0" and "GET / HTTP/1.1"? I mean, is there a downside? I apologize for asking a basic-newb question, but before I try to redirect this to a 403 I need to know if I should.
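Before blocking anything, it may help to see how many log lines actually match the pattern described above. The following is only a rough sketch in Python, assuming the log is in the Apache combined format shown in the excerpt. The names `OWN_HOST`, `parse_line`, `is_suspect` and `suspects_by_ip` are made up for illustration, and `www.example.com` stands in for your own hostname.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Combined log format, as in the excerpt above (an assumption about your server config).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

OWN_HOST = "www.example.com"  # hypothetical: replace with your own hostname


def parse_line(line):
    """Return the log fields as a dict, or None if the line does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None


def is_suspect(entry, own_host=OWN_HOST):
    """True for an HTTP/1.0 request for "/" that carries an off-site referrer."""
    if entry is None:
        return False
    if entry["proto"] != "HTTP/1.0" or entry["path"] != "/":
        return False
    ref_host = urlparse(entry["referer"]).hostname  # None for "-" or empty
    return ref_host is not None and ref_host != own_host


def suspects_by_ip(lines, own_host=OWN_HOST):
    """Count matching requests per client IP."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if is_suspect(entry, own_host):
            counts[entry["ip"]] += 1
    return counts
```

Running `suspects_by_ip(open("access.log"))` (the file name is a placeholder) gives a per-IP count to look over before deciding whether a 403 rule would catch anything you care about.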
Among other things, HTTP/1.1 adds a required Host header to the request so that a server can host multiple sites. HTTP/1.0 seems to be used mostly by bots, but most bots also send a Host header to be sure they reach the right site once they hit your server.
Hm, interesting exercise. HTTP/1.0 runs about 5-10% of requests, but if you narrow it down to successes (status 200 or 304) it drops down to, I don't know, 3% with occasional huge spikes. Meaning that most of them are up to no good for other reasons.
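For anyone who wants to repeat that exercise on their own logs, here is a minimal sketch of the arithmetic in Python. It assumes the request line and status code appear as in the combined-format excerpt earlier in the thread; `http10_share` is a made-up name, not part of any tool.

```python
import re

# Matches the quoted request line and the status code that follows it.
REQ_RE = re.compile(r'"\S+ \S+ (?P<proto>HTTP/\d\.\d)" (?P<status>\d{3}) ')


def http10_share(lines):
    """Return (share of all requests, share of 200/304 requests) that used HTTP/1.0."""
    total = total_ok = old = old_ok = 0
    for line in lines:
        m = REQ_RE.search(line)
        if not m:
            continue  # skip lines that do not look like normal requests
        is_old = m.group("proto") == "HTTP/1.0"
        is_ok = m.group("status") in ("200", "304")
        total += 1
        old += is_old
        if is_ok:
            total_ok += 1
            old_ok += is_old
    overall = old / total if total else 0.0
    among_ok = old_ok / total_ok if total_ok else 0.0
    return overall, among_ok
```

A large gap between the two numbers would point the same way as the observation above: much of the HTTP/1.0 traffic is already failing or being refused for other reasons.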
I know that a couple of Chinese search engines, or at least named robots, live at 1.202-203 but I can't be bothered to keep track.
I find a few from India, assorted robots of varying quality, one State's department of education*, coupla random humans ... and, ahem, at least two from WebmasterWorld-- one of them via Proxify.
What I didn't find was what I'd be most concerned about shutting out: Users with elderly computers or slow IPs (the Native Population criterion).
Maybe a better question is: Can you look at HTTP/1.0 in conjunction with something else, like some detail of the UA or IP, and say that something here is definitely forged? That's who I'd want to add to the lockout list.
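One way to try that idea, using the example from the first post (one IP, HTTP/1.0, a different UA every few seconds), is sketched below in Python. The one-minute window, the threshold of two UAs and the name `rotating_ua_ips` are arbitrary assumptions for illustration; the log format is assumed to be the combined format shown earlier.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"\S+ \S+ (?P<proto>[^"]+)" \d{3} \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # e.g. 18/Feb/2012:15:09:55 -0600


def rotating_ua_ips(lines, window=timedelta(minutes=1), min_uas=2):
    """Return IPs that sent HTTP/1.0 requests under several UAs within `window`."""
    seen = defaultdict(list)  # ip -> list of (timestamp, user agent)
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m.group("proto") != "HTTP/1.0":
            continue
        when = datetime.strptime(m.group("time"), TIME_FORMAT)
        seen[m.group("ip")].append((when, m.group("ua")))

    flagged = set()
    for ip, hits in seen.items():
        hits.sort(key=lambda hit: hit[0])
        for i, (start, _) in enumerate(hits):
            # Distinct UAs seen from this IP within `window` of this request.
            uas = {ua for when, ua in hits[i:] if when - start <= window}
            if len(uas) >= min_uas:
                flagged.add(ip)
                break
    return flagged
```

Shared proxies and school or office gateways can legitimately show several UAs from one address, so anything this flags is a candidate to look at by hand rather than an automatic lockout.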
* I had to see what page they were after. Turned out to be quite hilarious, but in an "I guess you had to be there" way.
FWIW: HTTP/1.0 in the last eight hours = more legit than not.
example.exampleschools.org.uk [prob. add-on; FF 10 not avail in 2010...]
Mozilla/5.0 (Windows NT 5.1; rv:10.0.1) Gecko/20100101 Firefox/10.0.1
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11
Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11
*and* [apparently copy-pasting...]
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; MDDR; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; OfficeLiveConnector.1.4; OfficeLivePatch.1.3; BRI/2; MSOffice 12)
129.215.36.nn [favicon-related feature of...]
Safari/6534.51.22 CFNetwork/454.12.4 Darwin/10.8.0 (i386) (iMac10%2C1)
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322) NS8/0.9.6
Mozilla/4.0 (compatible; MSIE 6.0; Windows ME) Opera 7.11 [en]
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)