Zao/0.1

Another resurrected spyder?

         

pendanticist

7:22 pm on Apr 30, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



133.11.36.25 - - [30/Apr/2003:11:58:40 -0700] "GET / HTTP/1.1" 200 20114 "-" "Zao/0.1 (http://www.kototoi.org/zao/)"

Mentioned in this blocked thread [webmasterworld.com] from Oct 29, 2002.

The site <Last updated on July 18, 2002> says its purpose is studying how to 'collect documents' and how to 'extract information out of the collected documents'. I have to wonder how they define 'documents'.

I take it this bot hasn't been around for a while, or is it just that it hasn't visited me before?

My work is more along the lines of a directory, and I was wondering how they'd square a bunch of links with their idea of what the bot's job is. Links don't seem to be the same as 'documents', unless I'm missing something.

Pendanticist.

wilderness

3:22 pm on May 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Pendanticist,
If you remove the zao from the link and travel to the root, there is a link to their homepage.
I didn't see any SE offering, just a membership and publication list.
I can't see how this would be beneficial to a website,
especially for someone like me who isn't interested in APNIC traffic.

pendanticist

1:30 am on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I see what you mean, wilderness. This thing is of no value to me at all.

<lemme see, where'd I put that .htaccess file. I know it's around here somewhere....>

Pendanticist.

DavidT

12:19 pm on May 17, 2003 (gmt 0)

10+ Year Member



Zao appears to obey a robots.txt disallow, so I'm not sure you need to unearth your .htaccess file, pendanticist.
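
For anyone who wants to check what a disallow for this bot should mean before trusting it, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `is_allowed` helper are made up for illustration (nobody's real file is shown in this thread), and the result only says what a standards-following parser concludes, not what Zao itself does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the thread does not show anyone's real file.
ROBOTS_TXT = """\
User-agent: Zao
Disallow: /

User-agent: *
Disallow: /private/
"""

# User-agent string as it appears in the logs quoted in this thread.
ZAO_UA = "Zao/0.1 (http://www.kototoi.org/zao/)"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url according to robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print("Zao may fetch /index.html:", is_allowed(ROBOTS_TXT, ZAO_UA, "/index.html"))
    print("Other bot may fetch /index.html:",
          is_allowed(ROBOTS_TXT, "SomeOtherBot/1.0", "/index.html"))
```

The parser matches on the product token before the slash ("Zao"), so the record applies whatever version number the bot reports.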

volatilegx

8:35 pm on May 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just a note... here is what I have on this bot:

# UA "Zao/0.1 (http://www.kototoi.org/zao/)"
133.11.36.42
133.11.36.46
133.11.36.50
133.11.36.54

fiestagirl

11:16 pm on May 19, 2003 (gmt 0)

10+ Year Member



I've got these so far.
133.11.36.28
133.11.36.36
133.11.36.37
133.11.36.39
133.11.36.41
133.11.36.42
133.11.36.46
133.11.36.50
133.11.36.52
133.11.36.54

resolves to "tsubame10.crawler.kototoi.org"
Looks like they own 0-63.
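
The "0-63" guess is easy to sanity-check against the addresses reported so far. This is a rough sketch that assumes "0-63" means 133.11.36.0 through 133.11.36.63 (a /26 in CIDR terms); the actual allocation is not confirmed anywhere in this thread, and `in_zao_range` is just an illustrative helper name.

```python
import ipaddress

# Addresses reported for Zao in this thread up to this point.
REPORTED = [
    "133.11.36.25", "133.11.36.28", "133.11.36.36", "133.11.36.37",
    "133.11.36.39", "133.11.36.41", "133.11.36.42", "133.11.36.46",
    "133.11.36.50", "133.11.36.52", "133.11.36.54",
]

# Assumption: "0-63" is read as 133.11.36.0 - 133.11.36.63, i.e. a /26.
ZAO_RANGE = ipaddress.ip_network("133.11.36.0/26")


def in_zao_range(addr: str) -> bool:
    """Return True if addr falls inside the assumed Zao block."""
    return ipaddress.ip_address(addr) in ZAO_RANGE


# Any reported address outside the assumed block would show up here.
outside = [a for a in REPORTED if not in_zao_range(a)]

if __name__ == "__main__":
    print("Block size:", ZAO_RANGE.num_addresses)
    print("Reported addresses outside the block:", outside)
```

Every address listed above falls inside that block, so the list of outsiders comes back empty.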

IITian

4:30 am on Jun 3, 2003 (gmt 0)

10+ Year Member



I got this today. The interesting thing is that my site got it just an hour or so after I had visited a Japanese site, searching for an old colleague of mine who got his PhD from the Univ. of Tokyo and is a database expert in Japan. Just a coincidence perhaps, since this spider seems to be a Univ. of Tokyo project!

creative craig

8:11 am on Jun 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I got hit pretty hard overnight by Zao as well. I'll add them to robots.txt, if they obey :)

Craig

DavidT

12:33 pm on Jun 4, 2003 (gmt 0)

10+ Year Member



It has obeyed my disallow, only it keeps taking robots.txt a little too often for my liking.

jrobbio

4:06 pm on Jun 4, 2003 (gmt 0)

10+ Year Member



133.11.36.25 - - [04/Jun/2003:07:16:17 +0100] "GET /robots.txt HTTP/1.1" 200 4416 "-" "Zao/0.1 (http://www.kototoi.org/zao/)"
133.11.36.25 - - [04/Jun/2003:07:23:11 +0100] "GET / HTTP/1.1" 200 5502 "-" "Zao/0.1 (http://www.kototoi.org/zao/)"

Reverse DNS gave - hibari01.crawler.kototoi.org

The referring page was very good and informative. It will be interesting to see what happens with this.
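
To pull Zao's hits out of a log and see how far apart they are, something along these lines should work. It is a sketch that assumes the Apache "combined" log layout seen in the two lines quoted in this post; the regex and the `parse_line` helper are illustrative and not tested against other log formats.

```python
import re
from datetime import datetime

# The two log lines quoted in the post above.
LOG_LINES = [
    '133.11.36.25 - - [04/Jun/2003:07:16:17 +0100] "GET /robots.txt HTTP/1.1" '
    '200 4416 "-" "Zao/0.1 (http://www.kototoi.org/zao/)"',
    '133.11.36.25 - - [04/Jun/2003:07:23:11 +0100] "GET / HTTP/1.1" '
    '200 5502 "-" "Zao/0.1 (http://www.kototoi.org/zao/)"',
]

# Assumed layout: ip, ident, user, [time], "request", status, size, "referer", "agent".
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


# Keep only requests whose user agent starts with "Zao/".
hits = [e for e in map(parse_line, LOG_LINES) if e and e["agent"].startswith("Zao/")]
gap_seconds = (hits[1]["time"] - hits[0]["time"]).total_seconds()

if __name__ == "__main__":
    for hit in hits:
        print(hit["ip"], hit["time"].isoformat(), hit["path"])
    print("Seconds between requests:", gap_seconds)
```

For these two lines the gap works out to 414 seconds, a little under seven minutes between the robots.txt fetch and the request for the index page.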

IITian

1:40 pm on Jun 8, 2003 (gmt 0)

10+ Year Member



It grabbed quite a few files from my site yesterday. I am watching it with interest.

claus

8:09 am on Jul 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



i'll add this one:

133.11.36.34

- got hit yesterday, a very slow crawler, 10 to 20 minutes between fetching each page - no referrers, using GET and grabbing it all.

First file request was robots.txt, then directly off to a level 2 index page, to a level 3 page in same group, but not via link from former page, and then a level 3 page in another group. Main index page was not requested at all.

(the levels and groups follow Brett's site model - i've discovered that this site actually follows that model, although i didn't know the model until a few days ago)

/claus

DavidT

8:59 am on Jul 1, 2003 (gmt 0)

10+ Year Member



Since implementing the ideas about reducing the size of robots.txt here:

[webmasterworld.com...]

specifically the part about reducing the instances of 'Disallow: /' to one and simply listing the bots covered by that disallow, Zao no longer seems to get it, so it's gone into the ban bin in .htaccess.
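
For comparison, here is how a standards-following parser reads that kind of grouped record, where several User-agent lines share one 'Disallow: /'. This sketch uses Python's `urllib.robotparser`; the other bot names and the `blocked_agents` helper are hypothetical, and it says nothing about how Zao's own parser handled the grouped form.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical grouped record: several bots share a single "Disallow: /".
# ExampleBot and AnotherBot are made-up names used only for illustration.
GROUPED = """\
User-agent: Zao
User-agent: ExampleBot
User-agent: AnotherBot
Disallow: /
"""


def blocked_agents(robots_txt, agents, url="/"):
    """Return the agents from the list that may not fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    candidates = ["Zao/0.1", "ExampleBot/2.0", "AnotherBot", "Mozilla/5.0"]
    print("Blocked:", blocked_agents(GROUPED, candidates))
```

A parser that handles grouped records treats all three listed bots as disallowed and leaves unlisted agents alone, which is the behaviour Zao apparently did not show.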

upside

10:58 pm on Jul 2, 2003 (gmt 0)

10+ Year Member



I also have banned Zao. Too many "research projects" using my resources without recompense these days.