Forum Moderators: open


crawler named Xanamu

can anyone id it

         

jec2002

3:34 am on Nov 30, 2002 (gmt 0)

10+ Year Member



orbital.23z.com - - [29/Nov/2002:17:30:23 -0800] "GET /robots.txt HTTP/1.0" 206 161 "-" "Xanamu/1.0"

I would appreciate any information on this crawler. A Google search shows nothing. Thanks.

jdMorgan

3:47 am on Nov 30, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



jec2002,

Don't know that one, but you have a very interesting log entry there... It fetched robots.txt and your server responded with a 206-Partial Content response, meaning the request must have been a partial GET request. Very strange...

A search for 23z.com leads to what looks like some sort of underground "radio" station in the UK.

Jim

jec2002

5:08 am on Nov 30, 2002 (gmt 0)

10+ Year Member



I didn't notice the 206. Are you certain it means a partial request? I always presumed a 206 meant my server blinked or hiccuped (i.e., failed to execute the request). Stupid me. I'm not familiar with that at all. I know all the other HTTP status codes, but I have no experience with a 206. What is a partial request? How can somebody make a partial request? As I understand it, you request a document, and you get the document. Please elaborate for me. Very grateful. Thanks.

I did a trace on 23z.com, and I got "Web Development Ltd." in the U.K. A lot of universities around London visit my site.

I would appreciate all further comments.

jdMorgan

5:16 am on Nov 30, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



jec2002,

Paraphrasing from some notes I have:

A 206-Partial Content response means that the server has fulfilled the partial GET request for the resource. The request must have included a Range header field indicating the desired range. The response must include either a Content-Range header field indicating the range included with this response, or a multipart/byteranges Content-Type including Content-Range fields for each part. If multipart/byteranges is not used, the Content-Length header field in the response must match the actual number of octets transmitted in the message-body.
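To make the Range / Content-Range relationship concrete, here is a minimal Python sketch of how a server might resolve a single byte range. The names `resolve_range` and `content_range` are made up for illustration, the 500-byte resource size is an assumption, and multipart/byteranges is deliberately not handled. The thread does not show which Range the crawler actually sent, so the values below are examples only.

```python
def resolve_range(header, size):
    """Resolve a single 'bytes=...' Range value against a resource of
    `size` bytes. Returns (first, last), both inclusive, or None when
    the value is not a usable single byte range."""
    unit, _, spec = header.partition("=")
    if unit.strip().lower() != "bytes" or "," in spec:
        return None  # other units or multiple ranges: not handled in this sketch
    first_s, dash, last_s = spec.strip().partition("-")
    if not dash:
        return None
    try:
        if first_s == "":  # suffix form "bytes=-N": the last N bytes
            n = int(last_s)
            if n <= 0 or size == 0:
                return None
            return (max(size - n, 0), size - 1)
        first = int(first_s)
        last = int(last_s) if last_s else size - 1  # "bytes=N-": to the end
    except ValueError:
        return None
    if first >= size or last < first:
        return None
    return (first, min(last, size - 1))


def content_range(first, last, size):
    """Build the Content-Range value that accompanies a 206 response."""
    return f"bytes {first}-{last}/{size}"


# Hypothetical 500-byte resource, client asks for the first 161 bytes.
first, last = resolve_range("bytes=0-160", 500)
print(content_range(first, last, 500))   # bytes 0-160/500
print("Content-Length:", last - first + 1)
```

In this example the Content-Length works out to 161 octets, the number of bytes in the range, which is the "must match the actual number of octets transmitted" rule from the notes above.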

It's just unusual for a robots.txt to be requested with a range header.

Keep a close eye on that IP and User-agent.
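For watching the logs, something along these lines might help. It is only a sketch in Python, assuming the server writes the usual combined log format seen in the entry at the top of the thread; `parse_line`, `is_suspect`, and the watch list are hypothetical names, not part of any tool.

```python
import re

# Combined log format: host ident user [time] "request" status size "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the log fields as a dict, or None if the line does not match."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None


def is_suspect(entry, watch_agents=("Xanamu",)):
    """Flag 206 responses to /robots.txt, or any hit from a watched User-agent."""
    partial_robots = entry["status"] == "206" and "/robots.txt" in entry["request"]
    watched = any(a.lower() in entry["agent"].lower() for a in watch_agents)
    return partial_robots or watched


sample = ('orbital.23z.com - - [29/Nov/2002:17:30:23 -0800] '
          '"GET /robots.txt HTTP/1.0" 206 161 "-" "Xanamu/1.0"')
entry = parse_line(sample)
print(entry["host"], entry["status"], entry["agent"])
print("suspect:", is_suspect(entry))
```

Run over a whole log file, the same two functions would pick out every later visit from that User-agent, whatever status it gets.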

Jim

jec2002

5:25 am on Nov 30, 2002 (gmt 0)

10+ Year Member



Thank you, jd. You were very helpful.

jdMorgan

5:31 am on Nov 30, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



jec2002,

No problem! ...And welcome to WebmasterWorld!

Jim

wilderness

1:03 pm on Nov 30, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Jim,
I see 206s frequently. The majority of the time they accompany the many PDFs I have. The server seems to hiccup when either the visitor's ISP or browser plug-in doesn't process the PDF fast enough.
I've had some PDFs on the same visit loaded 10 or more successive times.

However, during this recent month I've had quite a strange and very spotty Verio visitor :-(
The IP is 128.242.197.101, and though they only visited perhaps 20-30 times during the month (each a single-line log entry), every visit resulted in a 206. None of these visits were for PDFs.
I'm quite confused by it and don't recall seeing it previously.

BTW here's a decent but brief link on error codes.
[members.tripod.com...]

bird

6:20 pm on Nov 30, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Actually, I find it quite smart of a robot to make a partial request on robots.txt. That's a very simple method to avoid having to process multi-megabyte data in your robots.txt parser...
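As a rough illustration of that idea (not a claim about how Xanamu actually works), a crawler could cap its robots.txt fetch like this in Python. `build_robots_request`, `fetch_robots`, the 16 KB limit, and the "ExampleBot/0.1" User-agent are all assumptions made up for the sketch.

```python
import urllib.request
from urllib.parse import urljoin


def build_robots_request(site, max_bytes=16384, agent="ExampleBot/0.1"):
    """Build a GET for /robots.txt that asks for at most max_bytes bytes."""
    url = urljoin(site, "/robots.txt")
    headers = {
        "Range": f"bytes=0-{max_bytes - 1}",  # byte positions are inclusive
        "User-Agent": agent,
    }
    return urllib.request.Request(url, headers=headers)


def fetch_robots(site, max_bytes=16384):
    """Fetch robots.txt, returning (status, body) with the body capped."""
    req = build_robots_request(site, max_bytes)
    with urllib.request.urlopen(req, timeout=10) as resp:
        # 206 means the server honored the range. A server may ignore the
        # Range header and answer 200 with the whole file, so the read is
        # capped here as well.
        return resp.status, resp.read(max_bytes)


req = build_robots_request("http://example.com/some/page", 1024)
print(req.full_url, req.get_header("Range"))
```

A server that supports ranges would log a request like this as a 206 whenever it answers with the requested range, which would fit the log entry that started the thread.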

That doesn't really answer the original question, though. If in doubt, then I would block a spider coming from a web design company. The only legitimate situation for those would be if they ran some kind of directory you're listed in, but in that case the UA string is supposed to indicate that. In all other cases, they're probably just snooping out what other people are doing, e.g. your keyword density and other things that may give you a competitive advantage.

jec2002

11:49 pm on Nov 30, 2002 (gmt 0)

10+ Year Member



I tend to think that the 206 was an insidious request, but I don't know enough to draw a knowledgeable conclusion. Could you elaborate on how they might be snooping? Thanks.

wilderness

2:55 am on Dec 1, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



<snip>Could you elaborate on how they might be snooping>

jec,
I realize you were asking bird what he meant.
However, I'm sure it was just his choice of words.

If you or I as webmasters look at another website similar to our own, then we are in fact "snooping." Even chasing down SE referrals that come into our logs could be termed "snooping." Viewing others' logs, robots, page structure, and so on...
Anything investigative that enhances your knowledge and gains an advantage for your own website could be termed such, by whatever method, whether manually or by software.

tourist

5:25 am on Dec 1, 2002 (gmt 0)

10+ Year Member



Ok, I may be off-topic, but I am solving a small mystery. :)

Wilderness, 128.242.197.101? Why, that's good ol' Wordtracker at work... Just what work though, I'm not sure. ;)

They visit me, irregularly, and all visits during the last two months returned a 200 status code.

They do like to play with their UA, though... SOME of the examples from the last two months:

Mozilla/4.73 [en] (Win95; I)
Mozilla/4.0 (compatible; MSIE 4.0; Windows 95)
Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt; PeoplePC 1.0; Toshiba)
Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt; searchengine2000.com; sureseeker.com)
Mozilla/4.0 (compatible; MSIE 5.01; MSN 2.6; MSNIA; Windows 98)
Mozilla/4.0 (compatible; MSIE 5.01; Windows NT; TUCOWS.COM)
Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)
Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0; Hotbar 2.0)

bull

9:12 am on Dec 1, 2002 (gmt 0)

10+ Year Member



You can easily add any string to the MSIE UA.
See e.g. [winguides.com...]