|MS Search 4.0 Robot busy again|
What's the latest news on this chap?
Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 4.0 Robot) Microsoft
Has been all over my site pulling every last page.
What is the latest news on this one? Is it in any way affiliated with MS?
Hmm... still unclear what the MS Search Robot [google.com] is ...!?
In general I don't like standard MS user agents combined with strings like "crawler" or "robot". I suppose it's something like the MSIECrawler [google.com], so worth banning.
Is it a new alias for the new MSNBot?
What IP or IPs did it come in on?
ua: Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 4.0 Robot) Microsoft
all of em, zipped through the entire site.
Could it be some spider software that's downloadable?
That's just a cover-up, it's not the real MS spider from Microsoft. Here's who it is:
inetnum: 126.96.36.199 - 188.8.131.52
descr: CHTD, Chunghwa Telecom Co.,Ltd.
descr: Data-Bldg.6F, No.21, Sec.21, Hsin-Yi Rd.
descr: Taipei Taiwan 100
The User-Agent string is just something they make up. You can't tell from it exactly what spider software they use: it could be anything, including homebrew or even the real MS spider software licensed from Microsoft (probable? I'd say no). And their spidering could have any purpose, from legit SE activity to site-ripping. I don't really know much about the SEs in Taiwan.
edit: clarified a bit
>CHTD, Chunghwa Telecom Co.,Ltd.
Email harvester - who wants to bet with me?!
I had a lot of them on my pages recently (211.x.x.x, with other UAs though).
But claus, on this page:
Sample log entries, slightly edited:
example.com 184.108.40.206 - - [20/Aug/2002:03:28:44 +1000] "GET /foo/bar/1 HTTP/1.0" 200 38538 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"
example.com 220.127.116.11 - - [20/Aug/2002:03:28:50 +1000] "GET /foo/bar/2 HTTP/1.0" 200 50701 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"
example.com 18.104.22.168 - - [20/Aug/2002:03:29:05 +1000] "GET /foo/bar/3 HTTP/1.0" 200 38053 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"
and I realised that the third IP (22.214.171.124) is from Microsoft, and if you resolve its hostname, it's from here:
but the other IPs (first and second) can't be resolved...
Maybe something to look at!?
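For anyone who wants to repeat this check on their own logs, here is a rough Python sketch. It assumes your log lines have the same layout as the sample entries above (vhost first, then the client IP). The names `parse_line`, `hits_per_ip` and `reverse_name` are made up for this example, and an IP without a reverse record simply comes back as `None`.

```python
import re
import socket
from collections import Counter

# Matches the layout of the sample lines: vhost, client IP, two ignored
# fields, [timestamp], "request", status, size, "referer", "user-agent".
LOG_RE = re.compile(
    r'^(?P<vhost>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_RE.match(line.strip())
    return match.groupdict() if match else None


def hits_per_ip(lines, marker="MS Search 4.0 Robot"):
    """Count requests per client IP whose User-Agent contains `marker`."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record and marker in record["agent"]:
            counts[record["ip"]] += 1
    return counts


def reverse_name(ip, lookup=socket.gethostbyaddr):
    """Reverse-resolve an IP; return None when the lookup fails."""
    try:
        return lookup(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are subclasses
        return None
```

Feed `hits_per_ip` the lines of your access log, then call `reverse_name` on each IP it reports. A hostname alone proves little, since whoever controls the IP block also controls the reverse record, so treat it as a hint and confirm with whois.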
Sidyadav, the User-Agent killroy posted is (a), while the one that was posted by zem in the related thread is (b):
(a) "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 4.0 Robot) Microsoft"
(b) "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"
There are some differences here. However, as User-Agent strings can always be forged (basically you can call it anything you want when you build a spider), the most important thing is always the IP number. Zem and you reported real Microsoft IPs, while killroy reported an IP number that belongs to a company in Taiwan.
If the User-Agent is the same, the IP always wins. It is much harder to fake an IP number than a User-Agent string. For bots/spiders, faking the IP is effectively impossible: the IP is the address the page is returned to when they request it, so a spider that gives a wrong IP never gets the page back.
It is true that a real MS spider has been out, and other threads confirm it. It is just not the same spider that killroy saw. Rather, I think the Taiwanese company would like people to think it's the same, and that's why they chose a similar name.
I use a quite powerful Whois tool, so I was able to look up the three IPs from your post. There's a web-based version of the tool here: [geektools.com...]
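To make the "IP wins" rule concrete, here is a minimal Python sketch. It is only an illustration: `classify` and its labels are invented for this example, and the networks in the usage note are documentation placeholders (192.0.2.0/24 and 198.51.100.0/24), not real Microsoft ranges. You would fill in ranges you have confirmed yourself via whois.

```python
import ipaddress


def classify(ip, user_agent, trusted_networks, marker="MS Search 4.0 Robot"):
    """Classify a request, letting the IP override the User-Agent.

    trusted_networks: iterable of ipaddress network objects that you have
    verified yourself (hypothetical placeholders in the usage note).
    """
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in trusted_networks):
        # The source address is in a verified range; the UA is secondary.
        return "trusted"
    if marker in user_agent:
        # Claims to be the robot but comes from somewhere else.
        return "impostor"
    return "unrelated"
```

For example, with `trusted = [ipaddress.ip_network("192.0.2.0/24")]`, a request from 198.51.100.7 carrying the "MS Search 4.0 Robot" string is labelled "impostor", while the same string from 192.0.2.10 is labelled "trusted".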
126.96.36.199: Microsoft Corp, MSFT, One Microsoft Way, Redmond, WA
NetRange: 188.8.131.52 - 184.108.40.206
220.127.116.11: Microsoft Corp, MSFT, One Microsoft Way, Redmond, WA
NetRange: 18.104.22.168 - 22.214.171.124
126.96.36.199: Microsoft Corp, MSFT, One Microsoft Way, Redmond, WA
NetRange: 188.8.131.52 - 184.108.40.206
Anything coming from these IP-addresses is valid Microsoft traffic.
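If you want to test visitors against a NetRange line from whois output, a sketch like the following may help. It uses Python's standard `ipaddress` module; the helper names `net_range_to_networks` and `in_net_range` are made up here, and the range in the usage note is a documentation placeholder rather than one of the ranges listed above.

```python
import ipaddress


def net_range_to_networks(net_range):
    """Turn a whois-style 'start - end' string into a list of CIDR networks."""
    start_text, end_text = (part.strip() for part in net_range.split("-"))
    start = ipaddress.ip_address(start_text)
    end = ipaddress.ip_address(end_text)
    # Raises ValueError if end is lower than start.
    return list(ipaddress.summarize_address_range(start, end))


def in_net_range(ip, net_range):
    """Return True if `ip` falls inside the given 'start - end' range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in net_range_to_networks(net_range))
```

For instance, `in_net_range("192.0.2.77", "192.0.2.0 - 192.0.2.255")` is true, and `in_net_range("192.0.3.1", "192.0.2.0 - 192.0.2.255")` is false.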