
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
jikespider
Chinese bot shape-shifts on the fly.
Pfui




msg:4353122
 3:22 pm on Aug 18, 2011 (gmt 0)

Yikespider is more like it...

I certainly don't like the looks of this one from 1.202.221.10 a.k.a. ChinaNet Beijing. Note the different UAs literally from one second to the next, one version for HEAD and one for GET. And note the unbalanced " in the GET:

3.221.202.1.static.bjtelecom.net - - [1n/Aug/2011:0n:20:31 -0700] "HEAD /dir/file.html HTTP/1.0" 302 0 "-" "jikespider ("Mozilla/5.0)"
3.221.202.1.static.bjtelecom.net - - [1n/Aug/2011:0n:20:32 -0700] "GET /dir/file.html HTTP/1.0" 302 215 "-" "jikespider "Mozilla/5.0"

robots.txt? NO
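
For anyone parsing logs like the two lines above: the stray quote inside the UA can trip up a field-by-field regex that expects every quoted field to be free of inner quotes. Here is a minimal Python sketch, assuming Apache-style combined log format, that captures the UA greedily up to the last quote on the line. The names `LOG_RE` and `parse_line` are my own, not from any library.

```python
import re

# Combined log format. The user agent is matched greedily (.*) so that it
# runs to the LAST quote on the line, which tolerates stray inner quotes.
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>.*)"\s*$'
)

def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

line = ('3.221.202.1.static.bjtelecom.net - - [1n/Aug/2011:0n:20:31 -0700] '
        '"HEAD /dir/file.html HTTP/1.0" 302 0 "-" "jikespider ("Mozilla/5.0)"')
record = parse_line(line)
print(record["method"], record["ua"])
```

The time field is kept as an opaque string here, since the timestamps in the post are partly masked.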

Found yet another version elsewhere:

JikeSpider Mozilla/5.0 (compatible; JikeSpider; +http://shoulu.jike.com/spider.html)

Here's some circa June 24, 2011 news about this creepy-crawler [cnngo.com...] --

"Earlier this week, China's state-run company People’s Search announced the re-launch of its search engine Jike.com.

"Formerly named Goso.cn, the search engine was first launched by People’s Search, a joint venture between People’s Daily and People.com, in May 2010. ..."

Considering I have firewall killfile rules against umpteen Chinese CIDRs and .htaccess blocks against a gazillion others, I'm amazed at the relentless, troublesome, and now state-sanctioned traffic from that part of the world. And I reckon it's only going to get worse...

 

lucy24




msg:4354358
 10:46 pm on Aug 22, 2011 (gmt 0)

Oh, gosospider. I remember them.

2.219.202.1.static.bjtelecom.net - - [...] "HEAD / HTTP/1.0" 403 271 "-" "jikespider (\"Mozilla/5.0)"
2.219.202.1.static.bjtelecom.net - - [...] "GET / HTTP/1.0" 403 2266 "-" "jikespider \"Mozilla/5.0"


Right down to that extra quotation mark (escaped).

The unnerving thing is, I can't for the life of me figure out why they landed a 403. Not that I'm complaining, mind, but I've pored over my htaccess and can only conclude that the server is psychic.

(Pfui, we don't have the same host, do we? Mine suddenly went haywire on IP addresses too.)

Edit:
While mopping up, I found these guys, who must be their cousins. Yawn.

1.202.219.2 - - [...] "GET /robots.txt HTTP/1.0" 200 769 "-" "Mozilla/5.0 ()"
Pfui




msg:4354393
 1:57 am on Aug 23, 2011 (gmt 0)

(Lucy: Doubt it, unless yours is in downtown Seattle?)

All: I found this after I wrote the OP. It came in at the same time from the same IP (and was also seen by Lucy):

3.221.202.1.static.bjtelecom.net
Mozilla/5.0 ()

FWIW, that UA did ask for robots.txt, w/o success: When a UA is that absurd, the visitor gets a one-way ticket to 127.0.0.1

(Lucy: That may also be where your 403 came from. Servers can kick uneven parens automatically.)
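
To illustrate the kind of check a server-side filter might run, here is a rough Python sketch, not any particular server's actual rule. The function name `ua_problems` and the three checks are assumptions of mine. Worth noting: in the jikespider UAs quoted in this thread the parentheses do balance, and it is the quotation mark that is unmatched, so the sketch tests both.

```python
def ua_problems(ua):
    """Return a list of reasons a User-Agent string looks malformed."""
    problems = []

    # Walk the string once, tracking parenthesis nesting depth.
    depth = 0
    for ch in ua:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:      # a ")" with no "(" before it
                break
    if depth != 0:
        problems.append("unbalanced parentheses")

    # Some servers log a literal quote as \" so normalize before counting.
    if ua.replace('\\"', '"').count('"') % 2:
        problems.append("unbalanced quotes")

    # An empty comment such as "()" or "( )" carries no identification.
    if "()" in ua.replace(" ", ""):
        problems.append("empty comment")

    return problems

for ua in ('jikespider ("Mozilla/5.0)', 'jikespider "Mozilla/5.0', 'Mozilla/5.0 ()'):
    print(ua, "->", ua_problems(ua))
```

A well-formed UA such as the JikeSpider string with the shoulu.jike.com URL returns an empty list from this function.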

For those of you keeping score at home, jikespider used THREE distinct, and distinctly screwy UAs in mere seconds:

jikespider ("Mozilla/5.0)
jikespider "Mozilla/5.0
Mozilla/5.0 ()


(Who programmed this thing, a spam harvester?;)
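
If you want to catch this kind of shape-shifting automatically, one approach is to count distinct UAs per host inside a short sliding time window. What follows is only a sketch under assumptions: the function name `shape_shifters`, the 5-second window, the threshold of 3, and the epoch-second timestamps in the sample data are all made up for illustration.

```python
from collections import defaultdict

def shape_shifters(hits, window=5, threshold=3):
    """hits: iterable of (host, epoch_seconds, ua) tuples.

    Return the set of hosts that sent at least `threshold` distinct
    user agents within any span of `window` seconds.
    """
    by_host = defaultdict(list)
    for host, ts, ua in hits:
        by_host[host].append((ts, ua))

    flagged = set()
    for host, events in by_host.items():
        events.sort()                      # order by timestamp
        start = 0
        for end in range(len(events)):
            # Shrink the window from the left until it spans <= window seconds.
            while events[end][0] - events[start][0] > window:
                start += 1
            distinct = {ua for _, ua in events[start:end + 1]}
            if len(distinct) >= threshold:
                flagged.add(host)
                break
    return flagged

# Hypothetical timestamps; the UAs are the three listed above.
sample = [
    ("3.221.202.1.static.bjtelecom.net", 100, 'Mozilla/5.0 ()'),
    ("3.221.202.1.static.bjtelecom.net", 101, 'jikespider ("Mozilla/5.0)'),
    ("3.221.202.1.static.bjtelecom.net", 102, 'jikespider "Mozilla/5.0'),
]
print(shape_shifters(sample))
```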

Pfui




msg:4354637
 7:29 pm on Aug 23, 2011 (gmt 0)

My little friend, back again --

3.221.202.1.static.bjtelecom.net

-- still not ID'ing itself in the first hit, still using the same trio o' mangled UAs. Also still HEAD'ing yet another page it's supposedly never seen:

GET /robots.txt
Mozilla/5.0 ()

HEAD /dir/filename.html
jikespider ("Mozilla/5.0)

GET /dir/filename.html
jikespider "Mozilla/5.0
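
The HEAD-then-GET habit with a UA swap in between is itself a usable fingerprint. Below is a small Python sketch that pairs each HEAD with the next GET for the same host and path and reports the pair when the UA differs. The name `head_then_get`, the dict keys, the 2-second gap, and the timestamps in the sample are hypothetical, chosen only to mirror the sequence above.

```python
def head_then_get(hits, max_gap=2):
    """hits: list of dicts with keys host, ts, method, path, ua, in time order.

    Return (head_hit, get_hit) pairs where a GET follows a HEAD for the
    same host and path within max_gap seconds but with a different UA.
    """
    last_head = {}
    pairs = []
    for hit in hits:
        key = (hit["host"], hit["path"])
        if hit["method"] == "HEAD":
            last_head[key] = hit
        elif hit["method"] == "GET" and key in last_head:
            head = last_head.pop(key)
            if hit["ts"] - head["ts"] <= max_gap and hit["ua"] != head["ua"]:
                pairs.append((head, hit))
    return pairs

host = "3.221.202.1.static.bjtelecom.net"
visit = [
    {"host": host, "ts": 0, "method": "GET", "path": "/robots.txt",
     "ua": 'Mozilla/5.0 ()'},
    {"host": host, "ts": 1, "method": "HEAD", "path": "/dir/filename.html",
     "ua": 'jikespider ("Mozilla/5.0)'},
    {"host": host, "ts": 2, "method": "GET", "path": "/dir/filename.html",
     "ua": 'jikespider "Mozilla/5.0'},
]
for head, get in head_then_get(visit):
    print(head["path"], "|", head["ua"], "->", get["ua"])
```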

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved