Forum Moderators: mack
In general i don't like standard ms user agents combined with strings like crawler or robot. I suppose it's something like the MSIECrawler [google.com] - worth to ban.
inetnum: 61.216.0.0 - 61.219.255.255
netname: HINET-TW
descr: CHTD, Chunghwa Telecom Co.,Ltd.
descr: Data-Bldg.6F, No.21, Sec.21, Hsin-Yi Rd.
descr: Taipei Taiwan 100
country: TW
The User-Agent string is just something they make up - it's not possible to say exactly what spider software they use from this string, it could be anything including homebrew and even the real MS spider software on a licence from Microsoft (probable? i'd say no). And their spidering could have any purpose from legit SE activity to site-ripping - i don't really know much about the SE's in Taiwan.
/claus
it says:
Sample log entries, slightly edited:example.com 207.46.137.9 - - [20/Aug/2002:03:28:44 +1000] "GET /foo/bar/1 HTTP/1.0" 200 38538 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"
example.com 131.107.3.85 - - [20/Aug/2002:03:28:50 +1000] "GET /foo/bar/2 HTTP/1.0" 200 50701 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"
example.com 131.107.3.83 - - [20/Aug/2002:03:29:05 +1000] "GET /foo/bar/3 HTTP/1.0" 200 38053 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"
and I realised that the thire IP (131.107.3.83) is from Microsoft and if you HostName resolve it , Its from here:
tide83.microsoft.com
but the other IPs (first and second) can't resolve...
May be something to look at!?
Sid
(a) "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 4.0 Robot) Microsoft"
(b) "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"
There's some differences here. However, as User-Agent strings can always be forged (basically you can call it anything you want when you build a spider), the most important thing is always the IP-number. Zem and you reported real Microsoft IP's, while killroy reported an IP-number that belongs to a company in Taiwan.
If the User-Agent is the same, the IP always wins. It is much harder to fake an IP-number than a User-Agent string. For bots/spiders it is even impossible to fake the IP, as the IP is the address the page is returned to when they request a page, so if they give away the wrong IP, they will never get the page back.
It is true that a real MS spider has been out, and other threads confirm it. This spider is just not the same one that killroy saw, rather i think the Taiwanese company would like people to think it's the same, and that's why they chose a similar name.
/claus
I use a quite powerful Whois tool so i was able to look up the three IP's from your post. There's a web-based version of the tool here: [geektools.com...]
207.46.137.9: Microsoft Corp, MSFT, One Microsoft Way, Redmond, WA
NetRange: 207.46.0.0 - 207.46.255.255
131.107.3.85: Microsoft Corp, MSFT, One Microsoft Way, Redmond, WA
NetRange: 131.107.0.0 - 131.107.255.255
131.107.3.83: Microsoft Corp, MSFT, One Microsoft Way, Redmond, WA
NetRange: 131.107.0.0 - 131.107.255.255
Anything coming from these IP-addresses is valid Microsoft traffic.