| MS Search 4.0 Robot busy again What's the latest news on this chap? |
killroy

msg:1540667 | 10:40 am on Sep 6, 2003 (gmt 0) | The user agent "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 4.0 Robot) Microsoft" has been all over my site, pulling every last page. What is the latest news on this one? Is it in any way affiliated with MS? SN
|
Yidaki

msg:1540668 | 3:40 pm on Sep 6, 2003 (gmt 0) | Hmm... it's still unclear what the MS Search Robot [google.com] is ...!? In general i don't like standard MS user agents combined with strings like "crawler" or "robot". I suppose it's something like the MSIECrawler [google.com], so worth banning.
|
sidyadav

msg:1540669 | 4:26 am on Sep 7, 2003 (gmt 0) | Is it a new alias for the new MSNBot?
|
papamaku

msg:1540670 | 8:52 am on Sep 7, 2003 (gmt 0) | what ip/ips did it come in on?
|
killroy

msg:1540671 | 10:37 am on Sep 7, 2003 (gmt 0) |
ip: 61.218.208.114
ua: Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 4.0 Robot) Microsoft
All of them, and it zipped through the entire site. Could it be some spider software that's downloadable? SN
|
claus

msg:1540672 | 10:51 am on Sep 7, 2003 (gmt 0) | That's just a cover-up, it's not the real MS spider from Microsoft. Here's who it is:
inetnum: 61.216.0.0 - 61.219.255.255
netname: HINET-TW
descr: CHTD, Chunghwa Telecom Co.,Ltd.
descr: Data-Bldg.6F, No.21, Sec.21, Hsin-Yi Rd.
descr: Taipei Taiwan 100
country: TW
The User-Agent string is just something they make up - it's not possible to say exactly what spider software they use from this string, it could be anything including homebrew and even the real MS spider software on a licence from Microsoft (probable? i'd say no). And their spidering could have any purpose from legit SE activity to site-ripping - i don't really know much about the SE's in Taiwan.
/claus
edit: clarified a bit
|
Yidaki

msg:1540673 | 11:55 am on Sep 7, 2003 (gmt 0) | >CHTD, Chunghwa Telecom Co.,Ltd. eMail harvester - who bets with me?! I had a lot of them on my pages recently (211.x.x.x and other UA's though).
|
sidyadav

msg:1540674 | 4:02 am on Sep 8, 2003 (gmt 0) | But claus, on this page: [webmasterworld.com...] it says: Sample log entries, slightly edited:
example.com 207.46.137.9 - - [20/Aug/2002:03:28:44 +1000] "GET /foo/bar/1 HTTP/1.0" 200 38538 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"
example.com 131.107.3.85 - - [20/Aug/2002:03:28:50 +1000] "GET /foo/bar/2 HTTP/1.0" 200 50701 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"
example.com 131.107.3.83 - - [20/Aug/2002:03:29:05 +1000] "GET /foo/bar/3 HTTP/1.0" 200 38053 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"
and I realised that the third IP (131.107.3.83) is from Microsoft, and if you resolve its hostname, it comes from here: tide83.microsoft.com, but the other IPs (first and second) can't be resolved... Maybe something to look at!? Sid
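For anyone who wants to pull these entries out of their own logs, here is a rough Python sketch. It assumes the log lines look exactly like the samples quoted above (virtual host first, then the usual combined-format fields); the names `LOG_RE`, `parse_line` and `robot_ips` are made up for this example, and the pattern would need adjusting for a different log format.

```python
import re

# Assumed layout, taken from the sample entries quoted above:
# vhost ip ident user [time] "request" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<vhost>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_RE.match(line.strip())
    return match.groupdict() if match else None


def robot_ips(lines, marker="MS Search 4.0 Robot"):
    """Collect the client IPs of all requests whose User-Agent contains `marker`."""
    ips = set()
    for line in lines:
        entry = parse_line(line)
        if entry and marker in entry["agent"]:
            ips.add(entry["ip"])
    return ips
```

Feeding the three sample lines to `robot_ips` gives the three IPs, which can then be looked up one by one with a whois or reverse DNS tool.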
|
claus

msg:1540675 | 9:51 am on Sep 8, 2003 (gmt 0) | Sidyadav, the User-Agent killroy posted is (a); the one that was posted by zem in the related thread is (b):
(a) "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 4.0 Robot) Microsoft"
(b) "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"
There are some differences here: the MSIE version, and the trailing "Microsoft" in (a). However, as User-Agent strings can always be forged (basically you can call it anything you want when you build a spider), the most important thing is always the IP-number. Zem and you reported real Microsoft IPs, while killroy reported an IP-number that belongs to a company in Taiwan. Even if the User-Agent is the same, the IP always wins. It is much harder to fake an IP-number than a User-Agent string. For bots/spiders it is in practice impossible to fake the IP, as the IP is the address the page is returned to when they request a page, so if they give away the wrong IP, they will never get the page back. It is true that a real MS spider has been out, and other threads confirm it. This spider is just not the same one that killroy saw; rather, i think the Taiwanese company would like people to think it's the same, and that's why they chose a similar name.
/claus
Added: I use a quite powerful Whois tool so i was able to look up the three IP's from your post. There's a web-based version of the tool here: [geektools.com...]
207.46.137.9: Microsoft Corp, MSFT, One Microsoft Way, Redmond, WA NetRange: 207.46.0.0 - 207.46.255.255
131.107.3.85: Microsoft Corp, MSFT, One Microsoft Way, Redmond, WA NetRange: 131.107.0.0 - 131.107.255.255
131.107.3.83: Microsoft Corp, MSFT, One Microsoft Way, Redmond, WA NetRange: 131.107.0.0 - 131.107.255.255
Anything coming from these IP-addresses is valid Microsoft traffic.
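The "IP wins over User-Agent" check can be sketched in a few lines of Python using the standard `ipaddress` module. This is only an illustration: the ranges are the whois results quoted in this thread and may well be out of date, and the names `MICROSOFT_RANGES`, `HINET_RANGE`, `to_networks` and `in_ranges` are invented for the example.

```python
import ipaddress

# Whois ranges as quoted in this thread (assumption: they may no longer be current).
MICROSOFT_RANGES = [
    ("207.46.0.0", "207.46.255.255"),
    ("131.107.0.0", "131.107.255.255"),
]
HINET_RANGE = ("61.216.0.0", "61.219.255.255")


def to_networks(first, last):
    """Convert a first-last address range into a list of CIDR networks."""
    return list(
        ipaddress.summarize_address_range(
            ipaddress.ip_address(first), ipaddress.ip_address(last)
        )
    )


def in_ranges(ip, ranges):
    """Return True if `ip` falls inside any of the (first, last) ranges."""
    addr = ipaddress.ip_address(ip)
    return any(
        addr in net
        for first, last in ranges
        for net in to_networks(first, last)
    )
```

With this, `in_ranges("131.107.3.83", MICROSOFT_RANGES)` is true while killroy's `61.218.208.114` only matches `[HINET_RANGE]`, whatever the User-Agent string claims.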
|