"NITLE Blog Spider/0.01"

Has anyone seen this spider?

         

hlag2000

10:24 am on May 23, 2003 (gmt 0)

10+ Year Member



Hello,

I've found a new spider:

"NITLE Blog Spider/0.01"

I didn't find anything on Google.

Has anyone seen it before?

Many greetings,

8-) klaus

volatilegx

1:44 pm on May 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Can you give us the IP address?

hlag2000

2:08 pm on May 23, 2003 (gmt 0)

10+ Year Member



140.233.69.43

hlag2000

2:10 pm on May 23, 2003 (gmt 0)

10+ Year Member



OrgName: Middlebury College
OrgID: MIDDLE-1
Address: Information Technology Services, Voter
Address: Hall
City: Middlebury
StateProv: VT
PostalCode: 05443
Country: US

NetRange: 140.233.0.0 - 140.233.255.255
CIDR: 140.233.0.0/16
NetName: MIDDLEBURY
NetHandle: NET-140-233-0-0-1
Parent: NET-140-0-0-0-0
NetType: Direct Assignment
NameServer: LION.MIDDLEBURY.EDU
NameServer: CATAMOUNT.MIDDLEBURY.EDU
Comment:
RegDate: 1990-05-21
Updated: 2000-07-26

TechHandle: HM101-ARIN
TechName: McCausland, Howie
TechPhone: +1-802-443-5754
TechEmail: howie@middlebury.edu
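
As a quick sanity check, the reported IP can be tested against the CIDR block in the whois record. This is only a minimal sketch using Python's standard `ipaddress` module; the variable names are illustrative, and the two values are copied from the posts above.

```python
import ipaddress

# Values copied from the posts above; variable names are illustrative.
crawler_ip = ipaddress.ip_address("140.233.69.43")
middlebury_net = ipaddress.ip_network("140.233.0.0/16")

# Membership test: is the crawler's address inside the registered block?
in_range = crawler_ip in middlebury_net

# First and last addresses of the block, for comparison with NetRange.
first_addr = str(middlebury_net[0])
last_addr = str(middlebury_net[-1])

print(in_range, first_addr, last_addr)
```

The first and last addresses should line up with the NetRange line of the whois output if the CIDR entry is consistent with it.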

volatilegx

4:41 pm on May 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I fired off an email to the associate chair of the mathematics and computer science dept. at Middlebury College. Maybe he can shed some light on the spider :)

Woz

7:35 am on Jun 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Did you ever hear anything back, volatilegx? I just got hit with it as well and have to decide whether it is Good, Bad or Ugly.

Onya
Woz

lorax

4:28 pm on Jun 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I just wrote to the director of IS at Middlebury. I'll see if I can get them to tell us what they're up to and why. It's right in my backyard - so to speak. ;)

lorax

6:09 pm on Jun 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Just heard back from the IT Director. He's not personally aware of any such bot/spider and requested the IP. I've passed that along to him. Stay tuned.

lorax

6:13 pm on Jun 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ok, it seems to be a friendly. I can't speak to how well mannered it is but here's some background info. A quick note of thanks to Middlebury's IT staff for their help.

[nitle.org...]

lorax

7:14 pm on Jun 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



From the author of the NITLE spider (with his permission to post here).

The bot you refer to (NITLE Blog Spider/0.01) is part of a weblog census being run by the National Institute for Technology and Liberal Education (NITLE). We're a non-profit consortium of liberal arts colleges, funded by the Andrew Mellon Foundation. Middlebury is one of our member institutions, and since we happen to operate on their campus network, the crawl will appear to originate from the middlebury.edu domain.

The purpose of the blog census is twofold.

First, we're trying to identify and catalog as many active weblogs as possible across all languages, and make this data publicly available (on the site at [blogcensus.net,...] which should go live this weekend). There are very few accurate statistics on weblogs available right now, particularly concerning non-English language communities.

Second, we want to use this data as a test collection for our own work on search algorithms and information retrieval. This work is described in some detail at [nitle.org...] Since weblogs are a live collection with about a million documents, they make a good data set for learning to scale our algorithms.

I have tried to make sure our crawler respects the usual robots.txt exclusion rules, and does not hit any sites too hard. If I've made any programming errors that are causing the crawler to behave badly, please contact me at [my] email address mceglows@middlebury.edu, and I will work to rectify the problem.

Similarly, if you have any further questions about the blog crawl, don't hesitate to ask. I expect the blogcensus.net site to be up by Sunday afternoon [June 8, 2003].

Maciej Cegłowski (Mr.)
Lead Developer
Center for Educational Technology
Middlebury, VT 05753
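
For anyone who would rather restrict the crawler than wait and see, the robots.txt behaviour described in the letter above can be tried out locally. This is a rough sketch with Python's standard `urllib.robotparser`; the robots.txt content, the `example.com` URLs and the `/private/` path are made up for illustration, and it is an assumption that the real crawler matches on the "NITLE Blog Spider" name the same way this parser does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep the NITLE crawler out of /private/,
# allow every other robot everywhere.
robots_txt = """\
User-agent: NITLE Blog Spider
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

agent = "NITLE Blog Spider/0.01"  # user-agent string from the first post

# The parser compares the part of the user agent before the "/" against
# the User-agent lines, case-insensitively.
blocked = parser.can_fetch(agent, "http://example.com/private/page.html")
allowed = parser.can_fetch(agent, "http://example.com/blog/")
other_bot = parser.can_fetch("OtherBot/1.0", "http://example.com/private/page.html")

print(blocked, allowed, other_bot)
```

This only shows how a standards-following parser would read such rules; whether the crawler actually honours them is something the server logs would have to confirm.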

volatilegx

6:28 pm on Jun 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks guys... I never did hear back on my email :)