

this time it's Sogou

61.135 returns

     

lucy24

11:32 pm on Dec 9, 2013 (gmt 0)




October 2010 [webmasterworld.com] (Soso)
August 2011 [webmasterworld.com] (Yodao)
July 2012 [webmasterworld.com] (Baidu)
September 2012 [webmasterworld.com] (thread next door in Apache)

The current incarnation looks like this (spacing as shown):*

61.135.189.106 - - [23/Sep/2013:09:50:10 -0700] "GET /robots.txt HTTP/1.1" 200 1014 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)" 
61.135.189.106 - - [23/Sep/2013:09:50:11 -0700] "GET /ebooks/paston/paston6b.html HTTP/1.1" 403 2963 "-" "New-Sogou-Spider/1.0 (compatible; MSIE 5.5; Windows 98)"
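For anyone who wants to hunt for the same pattern in their own logs, here is a rough Python sketch. It assumes the standard Apache combined log format, as in the two lines above; the names `LOG_RE`, `agents_by_ip` and `ua_switchers` are made up for illustration. It simply collects the User-Agent strings seen per IP and reports any IP that used more than one.

```python
import re
from collections import defaultdict

# Assumed layout (Apache "combined"):
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def agents_by_ip(lines):
    """Map each client IP to the set of User-Agent strings it sent."""
    seen = defaultdict(set)
    for line in lines:
        m = LOG_RE.match(line)
        if m:  # silently skip lines that do not fit the assumed format
            seen[m.group("ip")].add(m.group("ua"))
    return seen

def ua_switchers(lines):
    """Return only the IPs that presented more than one User-Agent."""
    return {ip: uas for ip, uas in agents_by_ip(lines).items() if len(uas) > 1}

# The two log lines quoted in the post, used as sample input.
SAMPLE = [
    '61.135.189.106 - - [23/Sep/2013:09:50:10 -0700] "GET /robots.txt HTTP/1.1" '
    '200 1014 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)" ',
    '61.135.189.106 - - [23/Sep/2013:09:50:11 -0700] "GET /ebooks/paston/paston6b.html HTTP/1.1" '
    '403 2963 "-" "New-Sogou-Spider/1.0 (compatible; MSIE 5.5; Windows 98)"',
]

if __name__ == "__main__":
    for ip, uas in ua_switchers(SAMPLE).items():
        print(ip, sorted(uas))
```

An IP changing UA is not proof of anything on its own (shared proxies do it all the time), so treat the output as a list of things to eyeball rather than a blocklist.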

What is it with Chinese robots anyway? They always seem to put on UA strings that would get them blocked even from a previously unknown IP.

Personal hunch: the idea is to lull servers into complacency by first asking for robots.txt. It isn't very determined though; it goes away after one or two 403s. (If anyone has been asleep for the last five years, the uber-range is 61.128.0.0/10. If only Ukrainian robots lived in such nice fat /10 blocks!)

Cursory log search tells me they also show up at 220.181.125.155** with the same behavior pattern except that they don't change UAs after getting robots.txt. I don't know if either one is legit; free lookup is uninformative on both.
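A quick way to check which addresses fall inside the /10 mentioned above, sketched with Python's standard `ipaddress` module. The helper name `in_block` is hypothetical; the range and the test addresses are the ones from this post.

```python
import ipaddress

# The range named in the post.
BLOCK = ipaddress.ip_network("61.128.0.0/10")

def in_block(ip, network=BLOCK):
    """True if the dotted-quad string ip lies inside network."""
    return ipaddress.ip_address(ip) in network

if __name__ == "__main__":
    # First and last addresses covered by the /10.
    print(BLOCK[0], "-", BLOCK[-1])
    for ip in ("61.135.189.106", "220.181.125.155", "220.181.108.78"):
        print(ip, in_block(ip))
```

The /10 spans 61.128.0.0 through 61.191.255.255, so it catches the 61.135 visitor but neither of the 220.181 addresses; those would need a rule of their own.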


* The referenced page is in Chinese except for the recurring phrases "sogou spider" and "robots.txt". Rumor has it they're compliant, but who gives a ###.
** Not to be confused with 220.181.108.78, which sometimes claims to be Baidu.
 
