Home / Forums Index / Search Engines / Search Engine Spider and User Agent Identification
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
this time it's Sogou
61.135 returns
lucy24




msg:4629114
 11:32 pm on Dec 9, 2013 (gmt 0)

October 2010 [webmasterworld.com] (Soso)
August 2011 [webmasterworld.com] (Yodao)
July 2012 [webmasterworld.com] (Baidu)
September 2012 [webmasterworld.com] (thread next door in Apache)

The current incarnation looks like this (spacing as shown):*

61.135.189.106 - - [23/Sep/2013:09:50:10 -0700] "GET /robots.txt HTTP/1.1" 200 1014 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
61.135.189.106 - - [23/Sep/2013:09:50:11 -0700] "GET /ebooks/paston/paston6b.html HTTP/1.1" 403 2963 "-" "New-Sogou-Spider/1.0 (compatible; MSIE 5.5; Windows 98)"

What is it with Chinese robots anyway? They always seem to put on UA strings that would get them blocked even from a previously unknown IP.

Personal hunch: the idea is to lull servers into complacency by first asking for robots.txt. The robot isn't very determined, though; it goes away after one or two 403s. (If anyone has been asleep for the last five years, the uber-range is 61.128.0.0/10. If only Ukrainian robots lived in such nice fat /10 blocks!)
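
For anyone who wants to sanity-check an address against that /10 without doing the bit arithmetic by hand, here is a rough Python sketch using the standard ipaddress module. The function name in_block is made up for illustration; the network and the two test addresses are the ones quoted in this post.

```python
import ipaddress

# The /10 quoted above.
BLOCK = ipaddress.ip_network("61.128.0.0/10")

def in_block(addr, network=BLOCK):
    """Return True if addr falls inside the given network (hypothetical helper)."""
    return ipaddress.ip_address(addr) in network

if __name__ == "__main__":
    # First address, last address, and size of the block.
    print(BLOCK[0], BLOCK[-1], BLOCK.num_addresses)
    # The two addresses mentioned in this thread.
    for ip in ("61.135.189.106", "220.181.125.155"):
        print(ip, in_block(ip))
```

The /10 runs from 61.128.0.0 through 61.191.255.255, so the 61.135 visitor is inside it and the 220.181 one is not.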

Cursory log search tells me they also show up at 220.181.125.155** with the same behavior pattern except that they don't change UAs after getting robots.txt. I don't know if either one is legit; free lookup is uninformative on both.
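
If you want to find out which visitors pull the change-UA-after-robots.txt trick in your own logs, something along these lines might do as a starting point. It is only a sketch: it assumes the usual Apache combined log format with lines in chronological order, and the names LOG_RE and ua_switchers are invented for this example. The sample lines are the two quoted earlier in this post.

```python
import re
from collections import defaultdict

# Assumed layout: Apache "combined" format, as in the excerpts above.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def ua_switchers(lines):
    """Map each IP to the UAs it used after robots.txt that differ
    from the UA it used when fetching robots.txt."""
    robots_ua = {}                 # ip -> UA seen on the robots.txt request
    switched = defaultdict(list)   # ip -> later UAs that do not match
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue               # skip lines that do not fit the assumed format
        ip, path, ua = m["ip"], m["path"], m["ua"]
        if path == "/robots.txt":
            robots_ua[ip] = ua
        elif ip in robots_ua and ua != robots_ua[ip]:
            switched[ip].append(ua)
    return dict(switched)

SAMPLE = [
    '61.135.189.106 - - [23/Sep/2013:09:50:10 -0700] "GET /robots.txt HTTP/1.1" 200 1014 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"',
    '61.135.189.106 - - [23/Sep/2013:09:50:11 -0700] "GET /ebooks/paston/paston6b.html HTTP/1.1" 403 2963 "-" "New-Sogou-Spider/1.0 (compatible; MSIE 5.5; Windows 98)"',
]

if __name__ == "__main__":
    for ip, uas in ua_switchers(SAMPLE).items():
        print(ip, uas)
```

A visitor that keeps the same UA after robots.txt, like the 220.181.125.155 one described above, would simply not show up in the result.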


* The referenced page is in Chinese except for the recurring phrases "sogou spider" and "robots.txt". Rumor has it they're compliant, but who gives a ###.
** Not to be confused with 220.181.108.78, which sometimes claims to be Baidu.

 


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved