[webmasterworld.com] (Soso) August 2011
[webmasterworld.com] (Yodao) July 2012
[webmasterworld.com] (Baidu) September 2012
[webmasterworld.com] (thread next door in Apache)
The current incarnation looks like this (spacing as shown):*
184.108.40.206 - - [23/Sep/2013:09:50:10 -0700] "GET /robots.txt HTTP/1.1" 200 1014 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
220.127.116.11 - - [23/Sep/2013:09:50:11 -0700] "GET /ebooks/paston/paston6b.html HTTP/1.1" 403 2963 "-" "New-Sogou-Spider/1.0 (compatible; MSIE 5.5; Windows 98)"
What is it with Chinese robots anyway? They always seem to put on UA strings that would get them blocked even from a previously unknown IP.
Personal hunch: the idea is to lull servers into complacency by first asking for robots.txt. It isn't very determined though; it goes away after one or two 403s. (If anyone has been asleep for the last five years, the uber-range is 18.104.22.168/10. If only Ukrainian robots lived in such nice fat /10 blocks!)
A cursory log search tells me they also show up at 22.214.171.124** with the same behavior pattern, except that they don't change UAs after getting robots.txt. I don't know if either one is legit; a free lookup is uninformative on both.
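For anyone who wants to run the same check on their own logs, here is a rough Python sketch of that "cursory log search". It assumes Apache's standard combined log format, and it only compares the set of UAs that fetched robots.txt against the set that fetched anything else within one range; it does not check which request came first. The function names and the 203.0.113.x addresses are placeholders of mine (a reserved documentation range), not the real Sogou ranges; the UA strings are the ones from the log lines quoted above.

```python
import ipaddress
import re

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"$'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it doesn't match."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None


def ua_report(lines, network):
    """Collect the UAs seen from `network`, split by robots.txt vs. other requests."""
    net = ipaddress.ip_network(network, strict=False)
    robots_uas, page_uas = set(), set()
    for line in lines:
        hit = parse_line(line)
        if hit is None:
            continue
        try:
            addr = ipaddress.ip_address(hit["ip"])
        except ValueError:
            continue  # logged as a hostname rather than an IP
        if addr not in net:
            continue
        if hit["path"] == "/robots.txt":
            robots_uas.add(hit["ua"])
        else:
            page_uas.add(hit["ua"])
    return {
        "robots_uas": robots_uas,
        "page_uas": page_uas,
        # True when some page request used a UA that never asked for robots.txt
        "switched": bool(robots_uas) and bool(page_uas - robots_uas),
    }


# Placeholder addresses; UA strings copied from the log excerpt above.
SAMPLE = [
    '203.0.113.5 - - [23/Sep/2013:09:50:10 -0700] "GET /robots.txt HTTP/1.1" '
    '200 1014 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"',
    '203.0.113.77 - - [23/Sep/2013:09:50:11 -0700] "GET /ebooks/paston/paston6b.html HTTP/1.1" '
    '403 2963 "-" "New-Sogou-Spider/1.0 (compatible; MSIE 5.5; Windows 98)"',
]

report = ua_report(SAMPLE, "203.0.113.0/24")
print(report["switched"])
```

To use it for real, feed it the lines of your access log and the CIDR block you care about. A range that only ever shows one UA, like the second one mentioned above, should come back with "switched" as False.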
* The referenced page is in Chinese except for the recurring phrases "sogou spider" and "robots.txt". Rumor has it they're compliant, but who gives a ###.
** Not to be confused with 126.96.36.199, which sometimes claims to be Baidu.