Pfui - 9:04 am on Jun 12, 2006 (gmt 0)
But at first, you're right, Jim, it didn't ask for robots.txt by itself, but always within seconds of regular Slurp asking for same. That was when I first saw Slurp China, back in November, 2005:

lj9118.inktomisearch.com - - [17/Nov/2005:02:23:03 -0800] "GET /robots.txt HTTP/1.0" 302 213 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]

Within a few weeks, it started asking for just robots.txt, using its own ID:

lj9119.inktomisearch.com - - [01/Dec/2005:08:41:09 -0800] "GET /robots.txt HTTP/1.0" 200 3990 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp China; [misc.yahoo.com.cn...]

And thereafter, robots.txt plus a single (and robots.txt-Disallowed) file, akin to your excerpt:

lj9119.inktomisearch.com - - [14/Feb/2006:23:42:30 -0800] "GET /robots.txt HTTP/1.0" 200 6401 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp China; [misc.yahoo.com.cn...]

I tried just letting it have robots.txt and Forbidding (not just Disallowing) all else, but it didn't miss a beat. So for months now, I've 403'd it for everything. Yet still it comes:

lj910179.inktomisearch.com - - [11/Jun/2006:23:37:14 -0700] "GET /robots.txt HTTP/1.0" 403 803 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp China; [misc.yahoo.com.cn...]

And as mentioned, from Day One, its info page has been inaccessible to the Chinese font- and language-challenged.

If Slurp China hadn't been a Yahoo spawn, it would've been a goner six months ago. But because of its heritage, I tried to work with it and/or around it. However, as a direct result of its behavior, I now have far less tolerance for all of Yahoo's countless UAs/IPs/Hosts and their seemingly Yahoo-beneficial screw-ups.

A few more oddities for your compilation, GaryK:

dcf1.labs.corp.yahoo.com
demo03.labs.corp.yahoo.com
search1.labs.corp.yahoo.com

And one more curious lineage [webmasterworld.com]
--
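For anyone wondering about the difference between Disallowing (a robots.txt rule the crawler is free to ignore) and Forbidding (the server itself answering 403, as in the June log line), here's a minimal Python sketch as a WSGI wrapper. It's an illustration under assumptions only: the post doesn't say how the 403s were actually implemented (most of us would do it in the web server config), and `forbid_ua`, `demo_app`, and `BLOCKED_UA` are hypothetical names. The `allow_robots_txt` flag mirrors the first approach tried (serve robots.txt, refuse everything else); leaving it off refuses everything.

```python
from typing import Callable, Iterable, List, Tuple

StartResponse = Callable[[str, List[Tuple[str, str]]], None]

# Hypothetical setting: substring of the User-Agent header to refuse.
BLOCKED_UA = "Slurp China"


def forbid_ua(app, blocked: str = BLOCKED_UA, allow_robots_txt: bool = False):
    """Wrap a WSGI app so requests whose User-Agent contains `blocked`
    get 403 Forbidden instead of the real content.

    allow_robots_txt=True still serves /robots.txt and refuses all else;
    False refuses everything, robots.txt included.
    """
    def wrapper(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        ua = environ.get("HTTP_USER_AGENT", "")
        path = environ.get("PATH_INFO", "/")
        if blocked in ua and not (allow_robots_txt and path == "/robots.txt"):
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Any other UA (including regular Slurp) passes through untouched.
        return app(environ, start_response)
    return wrapper


def demo_app(environ, start_response):
    """Hypothetical stand-in for the real site: always answers 200."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]
```

To try it locally, something like `wsgiref.simple_server.make_server("", 8000, forbid_ua(demo_app)).serve_forever()` would do. Note that matching on a UA substring only catches a bot that keeps announcing itself; blocking by host or IP range is a separate exercise.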
I show that it's been Slurping robots.txt as China for a while, and for months typically in 'pairs' with assorted .html files, all of which were Disallowed to it in robots.txt, both generically and specifically (regular Slurp has robots.txt-specified access).
lj9083.inktomisearch.com - - [17/Nov/2005:02:23:05 -0800] "GET /file.html HTTP/1.0" 302 213 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp China; [misc.yahoo.com.cn...]
lj9062.inktomisearch.com - - [14/Feb/2006:23:42:37 -0800] "GET /dir/file.html HTTP/1.0" 200 29109 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp China; [misc.yahoo.com.cn...]
lj910053.inktomisearch.com - - [11/Jun/2006:23:37:21 -0700] "GET /dir/file.html HTTP/1.0" 403 803 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp China; [misc.yahoo.com.cn...]
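If you want to spot these 'pairs' in your own logs, here's a rough Python sketch. It parses each line, then matches every robots.txt fetch with the first Slurp China request for some other file within a short window, regardless of host, since the two halves of each pair above come from different lj*.inktomisearch.com machines. The 30-second window is an arbitrary assumption, the regex assumes the combined log format shown here (tolerating the cut-off UA field), and `parse_line`, `find_pairs`, and `is_disallowed` are hypothetical helpers. The last one uses the standard library's `urllib.robotparser` to check a path against robots.txt rules you supply.

```python
import re
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional
from urllib.robotparser import RobotFileParser

# Combined-log-style pattern; the UA field may lack its closing quote,
# as in the truncated excerpts quoted in this thread.
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)'
)


class Entry(NamedTuple):
    host: str
    time: datetime
    path: str
    status: int
    ua: str


def parse_line(line: str) -> Optional[Entry]:
    """Return an Entry for one log line, or None if it doesn't match."""
    m = LOG_RE.match(line)
    if not m:
        return None
    ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    return Entry(m["host"], ts, m["path"], int(m["status"]), m["ua"])


def find_pairs(entries: List[Entry], ua_marker: str = "Slurp China",
               window: timedelta = timedelta(seconds=30)):
    """Pair each robots.txt fetch with the first later non-robots.txt
    request whose UA contains ua_marker, if it falls within `window`."""
    ordered = sorted(entries, key=lambda e: e.time)
    pairs = []
    for i, first in enumerate(ordered):
        if first.path != "/robots.txt":
            continue
        for second in ordered[i + 1:]:
            if second.time - first.time > window:
                break  # too late to count as the same visit
            if second.path != "/robots.txt" and ua_marker in second.ua:
                pairs.append((first, second))
                break
    return pairs


def is_disallowed(robots_lines: List[str], agent: str, path: str) -> bool:
    """True if robots.txt rules (given as lines) disallow `path` for `agent`."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    return not rp.can_fetch(agent, path)
```

Fed the seven log lines quoted in this thread, `find_pairs` should turn up three pairs, 2, 7, and 7 seconds apart. One caveat on the robots.txt check: CPython's `robotparser` compares a group's User-agent token only against the part of the UA string before the first "/", so pass the token as it appears in robots.txt ("Slurp China") rather than the full "Mozilla/5.0 (...)" header.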
P.S.
NO UA
NO UA
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225