I started writing a detailed response to this thread back in April. Unfortunately, I wrote it on one of my Parallels Workstation virtual machine images, and that image got corrupted the next day :( ... anyway....
Most open-source scraping packages use HTTP/1.0. So do most spam bots. Some proxy servers at large corporations still do as well :(
In my book, unless an HTTP/1.0 request is sent with a 'full head of headers', it is 99% blockable. On its own, HTTP/1.0 is just an indicator, albeit a strong one, that something is not up to par.
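A minimal sketch of that 'full head of headers' test, in Python. The header set treated as 'full' (`EXPECTED_BROWSER_HEADERS`) and the function names are my own assumptions for illustration, not any standard. Header names are compared case-insensitively because, as the logs below show, clients differ in capitalization.

```python
# Hypothetical set of headers a mainstream browser normally sends.
# This is an assumption to tune against your own logs, not a standard.
EXPECTED_BROWSER_HEADERS = {"user-agent", "accept", "accept-language", "accept-encoding"}


def missing_browser_headers(headers: dict[str, str]) -> set[str]:
    """Return the expected browser headers that are absent from a request."""
    present = {name.lower() for name in headers}
    return EXPECTED_BROWSER_HEADERS - present


def http10_is_blockable(protocol: str, headers: dict[str, str]) -> bool:
    """Flag an HTTP/1.0 request that lacks a 'full head of headers'.

    HTTP/1.0 alone is only an indicator; the request becomes a block
    candidate when any expected browser header is also missing.
    """
    if protocol.strip().upper() != "HTTP/1.0":
        return False
    return bool(missing_browser_headers(headers))


if __name__ == "__main__":
    # Headers modeled on the MJ12bot request logged below (no Accept-Encoding).
    mj12 = {
        "User-Agent": "Mozilla/5.0 (compatible; MJ12bot/v1.4.3)",
        "Connection": "close",
        "Accept": "*/*",
        "Accept-Language": "en",
    }
    print(http10_is_blockable("HTTP/1.0", mj12))  # True
```

Run against the bingbot request below, this test would flag it too (no Accept-Language, no Accept-Encoding), while the Squid proxy visitor passes because it sends the full set. That is why verified crawlers need an exception.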
But then again:
ip: 157.55.32.111
remote host: msnbot-157-55-32-111.search.msn.com (0)
method: GET
protocol: HTTP/1.0
User-Agent: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
Connection: Keep-Alive
From: bingbot(at)microsoft.com
URI: /robots.txt
Accept: */*
Cache-Control: no-cache
-----------------------------------------------------------------------------------
ip: 83.149.126.98
remote host: 83.149.126.98 (-4)
method: GET
protocol: HTTP/1.0
User-Agent: Mozilla/5.0 (compatible; MJ12bot/v1.4.3; http:// www.majestic12.co.uk/bot.php?+) (UA altered on purpose)
Connection: close
URI: /robots.txt
Accept: */*
Accept-Language: en
-----------------------------------------------------------------------------------
Live visitor via Squid Proxy
ip: 61.90.11.XXX
remote host: ppp-61-90-11-XXX.revip.asianet.co.th (0)
method: GET
protocol: HTTP/1.0
Accept-Encoding: gzip,deflate,sdch
User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17
Via: 1.0 PROXY, 1.0 efw-60.greyhound.co.th:8080 (squid/2.6.STABLE22)
Connection: Keep-Alive
Accept-Charset: windows-874,utf-8;q=0.7,*;q=0.3
Referer: http:// www.google.co.th/imgres?.... blah bla blah
URI: /some.html
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
X-Chrome-Variations: .... blah bla blah==
Cache-Control: max-age=259200
Accept-Language: th-TH,th;q=0.8
-----------------------------------------------------------------------------------
Comment spammer from OVH
ip: 46.105.122.108
remote host: ns384327.ovh.net (0)
time: {ts '2013-05-17 21:34:17'}
method: GET
protocol: HTTP/1.0
host: forum.example.com <----- this site never had a forum
user-agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.94 Safari/537.4
URI: /
referer: root of the site
accept: image/gif, image/jpeg, image/pjpeg, application/x-ms-application, application/vnd.ms-xpsdocument, application/xaml+xml, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */*
cookie: CFID=5111132; CFTOKEN=830f5b80
Notice that this one also sent a cookie. Unfortunately for them, that cookie had not been issued to them :)
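The OVH request suggests two consistency checks that catch a forged "browser" regardless of protocol: a Host header naming a site that never existed, and a session cookie that was never issued to that client. Below is a rough Python sketch of both, plus the crawler exception the bingbot entry calls for. `SITE_HOSTS`, `TRUSTED_CRAWLER_SUFFIXES`, and the `issued_sessions` store (mapping CFID/CFTOKEN pairs to the IP each pair was issued to) are hypothetical. The suffix test also trusts the already-resolved remote host; a real check should forward-confirm that the hostname resolves back to the same IP.

```python
from http.cookies import SimpleCookie

# Hypothetical: hostnames this site actually answers to.
SITE_HOSTS = {"example.com", "www.example.com"}

# Hypothetical: reverse-DNS suffixes of crawlers you choose to trust.
# Bingbot's remote host in the log above ends in .search.msn.com.
TRUSTED_CRAWLER_SUFFIXES = (".search.msn.com",)


def is_trusted_crawler(remote_host: str) -> bool:
    """Suffix test on an already-resolved remote host name.

    Assumes remote_host came from a forward-confirmed reverse DNS lookup;
    on its own, a PTR record can say whatever its owner wants.
    """
    return remote_host.lower().rstrip(".").endswith(TRUSTED_CRAWLER_SUFFIXES)


def red_flags(ip: str, remote_host: str, headers: dict[str, str],
              issued_sessions: dict[tuple, str]) -> list[str]:
    """Return reasons a request looks forged; an empty list means none found.

    issued_sessions is a hypothetical store mapping (CFID, CFTOKEN) pairs
    to the IP address each pair was issued to.
    """
    if is_trusted_crawler(remote_host):
        return []
    flags = []
    # Clients differ in header-name capitalization, so normalize it.
    h = {name.lower(): value for name, value in headers.items()}

    # A Host header naming a site you never ran (e.g. a forum that never existed).
    host = h.get("host", "").split(":")[0].lower()
    if host and host not in SITE_HOSTS:
        flags.append(f"unknown Host header: {host}")

    # A ColdFusion session cookie that this site never issued to this client.
    jar = SimpleCookie()
    jar.load(h.get("cookie", ""))
    if "CFID" in jar or "CFTOKEN" in jar:
        pair = (jar["CFID"].value if "CFID" in jar else None,
                jar["CFTOKEN"].value if "CFTOKEN" in jar else None)
        if issued_sessions.get(pair) != ip:
            flags.append("cookie was not issued to this client")
    return flags


if __name__ == "__main__":
    # The OVH comment spammer's request from the log above.
    ovh = {"host": "forum.example.com", "cookie": "CFID=5111132; CFTOKEN=830f5b80"}
    print(red_flags("46.105.122.108", "ns384327.ovh.net", ovh, issued_sessions={}))
    # ['unknown Host header: forum.example.com', 'cookie was not issued to this client']
```

A trusted-crawler match short-circuits every other check, so keep the suffix list short and only add hosts you have actually verified.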