
this time it's Sogou

61.135 returns

11:32 pm on Dec 9, 2013 (gmt 0)

lucy24, Senior Member from US (joined Apr 9, 2011)

October 2010 [webmasterworld.com] (Soso)
August 2011 [webmasterworld.com] (Yodao)
July 2012 [webmasterworld.com] (Baidu)
September 2012 [webmasterworld.com] (thread next door in Apache)

The current incarnation looks like this (spacing as shown):*

- - [23/Sep/2013:09:50:10 -0700] "GET /robots.txt HTTP/1.1" 200 1014 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
- - [23/Sep/2013:09:50:11 -0700] "GET /ebooks/paston/paston6b.html HTTP/1.1" 403 2963 "-" "New-Sogou-Spider/1.0 (compatible; MSIE 5.5; Windows 98)"

What is it with Chinese robots anyway? They always seem to put on UA strings that would get them blocked even from a previously unknown IP.
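For anyone who wants to see the "would get blocked anyway" point in concrete form, here is a rough Python sketch of the kind of check involved. It is not anyone's actual rule set: the pattern list and the name `looks_stale` are made up for illustration, on the assumption that no human visitor in 2013 is still announcing MSIE 5.5 on Windows 98.

```python
import re

# Hypothetical pattern list (an assumption, not a real blocklist):
# browser and OS tokens old enough that a live human is unlikely to send them.
STALE_UA = re.compile(r"MSIE [1-6]\.\d|Windows (?:95|98|NT 4\.0)")


def looks_stale(user_agent: str) -> bool:
    """Return True if the UA string contains one of the stale tokens."""
    return bool(STALE_UA.search(user_agent))


# The two UA strings quoted in the log excerpt above.
robots_ua = "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
page_ua = "New-Sogou-Spider/1.0 (compatible; MSIE 5.5; Windows 98)"

print(looks_stale(robots_ua))
print(looks_stale(page_ua))
```

Only the second string, the one used for the page request, trips the check. That is the odd part: the UA that asks for robots.txt is the less suspicious of the two.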

Personal hunch: the idea is to lull servers into complacency by first asking for robots.txt. It isn't very determined though; it goes away after one or two 403s. (If anyone has been asleep for the last five years, the uber-range is If only Ukrainian robots lived in such nice fat /10 blocks!)

Cursory log search tells me they also show up at** with the same behavior pattern except that they don't change UAs after getting robots.txt. I don't know if either one is legit; free lookup is uninformative on both.
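The "cursory log search" for this pattern could be sketched roughly as below: read combined-format log lines, remember which UA each IP used for robots.txt, and report any IP that then comes back with a different UA. Everything here is an assumption for illustration: the function name `ua_switchers`, the regex, and especially the addresses in `SAMPLE`, which are documentation-range placeholders (203.0.113.x) and not the real ranges discussed in this thread.

```python
import re

# Rough matcher for Apache's combined log format (an approximation, not a full parser).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def ua_switchers(lines):
    """Return {ip: (robots_ua, later_ua)} for IPs whose UA changes after robots.txt."""
    robots_ua = {}   # ip -> UA used on the most recent robots.txt request
    switched = {}    # ip -> (UA on robots.txt, first different UA seen afterwards)
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip anything that is not a combined-format line
        ip, path, ua = m["ip"], m["path"], m["ua"]
        if path == "/robots.txt":
            robots_ua[ip] = ua
        elif ip in robots_ua and ua != robots_ua[ip] and ip not in switched:
            switched[ip] = (robots_ua[ip], ua)
    return switched


# Placeholder IPs only; the requests mirror the excerpt quoted earlier in the post.
SAMPLE = [
    '203.0.113.7 - - [23/Sep/2013:09:50:10 -0700] "GET /robots.txt HTTP/1.1" 200 1014 "-" '
    '"Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"',
    '203.0.113.7 - - [23/Sep/2013:09:50:11 -0700] "GET /ebooks/paston/paston6b.html HTTP/1.1" 403 2963 "-" '
    '"New-Sogou-Spider/1.0 (compatible; MSIE 5.5; Windows 98)"',
    '203.0.113.9 - - [23/Sep/2013:10:00:00 -0700] "GET /robots.txt HTTP/1.1" 200 1014 "-" '
    '"Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"',
    '203.0.113.9 - - [23/Sep/2013:10:00:01 -0700] "GET /ebooks/paston/paston6b.html HTTP/1.1" 403 2963 "-" '
    '"Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"',
]

for ip, (before, after) in ua_switchers(SAMPLE).items():
    print(ip, "switched from", before, "to", after)
```

With this sample, only the first placeholder address is reported; the second keeps the same UA throughout, which is the behavior described for the other range. Matching on exact IP is a simplification, since a real crawler may well fetch robots.txt from one address and pages from a neighbor.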

* The referenced page is in Chinese except for the recurring phrases "sogou spider" and "robots.txt". Rumor has it they're compliant, but who gives a ###.
** Not to be confused with, which sometimes claims to be Baidu.