
Forum Moderators: mack


MS Search 4.0 Robot busy again

What's the latest news on this chap?

     
10:40 am on Sep 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 4.0 Robot) Microsoft

Has been all over my site pulling every last page.

What is the latest news on this one? Is it in any way affiliated with MS?

SN

3:40 pm on Sep 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hmm... still unclear what the MS Search Robot [google.com] is ...!?

In general I don't like standard MS user agents combined with strings like "crawler" or "robot". I suppose it's something like the MSIECrawler [google.com], and worth banning.

4:26 am on Sep 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is it a new alias for the new MSNBot?
8:52 am on Sep 7, 2003 (gmt 0)

10+ Year Member



what ip/ips did it come in on?
10:37 am on Sep 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



ip: 61.218.208.114
ua: Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 4.0 Robot) Microsoft

all of em, zipped through the entire site.

Could it be some spider software that's downloadable?

SN

10:51 am on Sep 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's just a cover-up, it's not the real MS spider from Microsoft. Here's who it is:

inetnum: 61.216.0.0 - 61.219.255.255
netname: HINET-TW
descr: CHTD, Chunghwa Telecom Co.,Ltd.
descr: Data-Bldg.6F, No.21, Sec.21, Hsin-Yi Rd.
descr: Taipei Taiwan 100
country: TW
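As a quick cross-check of the whois record above, here is a minimal sketch that turns the `inetnum` range into CIDR notation and tests whether the reported address 61.218.208.114 falls inside it. The function names are illustrative only and not part of any whois tool; the range and the address are taken from this thread.

```python
import ipaddress

def inetnum_to_cidrs(first, last):
    """Convert a whois-style 'first - last' range into a list of CIDR networks."""
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    return list(ipaddress.summarize_address_range(start, end))

def in_inetnum(ip, first, last):
    """Return True if ip lies inside the inclusive range first..last."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in inetnum_to_cidrs(first, last))

# Range copied from the HINET-TW whois record quoted in this post.
HINET = ("61.216.0.0", "61.219.255.255")

print([str(net) for net in inetnum_to_cidrs(*HINET)])
print(in_inetnum("61.218.208.114", *HINET))
```

Whois allocations change over time, so a range copied from a post like this one should be looked up again before it is used for blocking.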

The User-Agent string is just something they made up. You can't tell from this string exactly what spider software they use: it could be anything, including homebrew, or even the real MS spider software on a licence from Microsoft (probable? I'd say no). And their spidering could have any purpose, from legit SE activity to site-ripping. I don't really know much about the SEs in Taiwan.

/claus


edit: clarified a bit
11:55 am on Sep 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>CHTD, Chunghwa Telecom Co.,Ltd.

Email harvester - who wants to bet?!

I had a lot of them on my pages recently (211.x.x.x and other UA's though).

4:02 am on Sep 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



But claus, on this page:
[webmasterworld.com...]

it says:


Sample log entries, slightly edited:

example.com 207.46.137.9 - - [20/Aug/2002:03:28:44 +1000] "GET /foo/bar/1 HTTP/1.0" 200 38538 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"
example.com 131.107.3.85 - - [20/Aug/2002:03:28:50 +1000] "GET /foo/bar/2 HTTP/1.0" 200 50701 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"
example.com 131.107.3.83 - - [20/Aug/2002:03:29:05 +1000] "GET /foo/bar/3 HTTP/1.0" 200 38053 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"

and I realised that the third IP (131.107.3.83) is from Microsoft; if you resolve its hostname, it comes from here:

tide83.microsoft.com

but the other IPs (first and second) can't resolve...

Maybe something to look at!?

Sid
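For anyone who wants to repeat this kind of check on their own logs, here is a rough sketch that pulls the IP and User-Agent out of lines shaped like the samples above and does a reverse lookup. The regular expression assumes exactly that layout (virtual host first, then a combined-style entry); `parse_line` and `reverse_lookup` are made-up names for illustration, and the reverse lookup needs working DNS, so it is defined here but not called.

```python
import re
import socket

# Assumed layout: vhost ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<vhost>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"$'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_RE.match(line.strip())
    return match.groupdict() if match else None

def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if it does not resolve."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None

SAMPLE = (
    'example.com 131.107.3.83 - - [20/Aug/2002:03:29:05 +1000] '
    '"GET /foo/bar/3 HTTP/1.0" 200 38053 "-" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"'
)

entry = parse_line(SAMPLE)
print(entry["ip"], "|", entry["ua"])
```

Calling `reverse_lookup(entry["ip"])` would attempt the hostname resolution described above. An address with no PTR record returns `None`, which by itself says nothing about who owns the address.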

9:51 am on Sep 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sidyadav, the User-Agent killroy posted is (a); the one posted by zem in the related thread is (b):

(a) "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 4.0 Robot) Microsoft"
(b) "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"

There are some differences here. However, as User-Agent strings can always be forged (basically you can call a spider anything you want when you build it), the most important thing is always the IP number. Zem and you reported real Microsoft IPs, while killroy reported an IP number that belongs to a company in Taiwan.

Even when the User-Agent is the same, the IP always wins. It is much harder to fake an IP number than a User-Agent string. For bots/spiders it is practically impossible to fake the IP, because the IP is the address the page is returned to when they request it: if they give a wrong IP, they never get the page back.

It is true that a real MS spider has been out, and other threads confirm it. That spider is just not the one killroy saw. Rather, I think the Taiwanese company would like people to think it's the same, and that's why they chose a similar name.

/claus


Added:

I use quite a powerful Whois tool, so I was able to look up the three IPs from your post. There's a web-based version of the tool here: [geektools.com...]

207.46.137.9: Microsoft Corp, MSFT, One Microsoft Way, Redmond, WA
NetRange: 207.46.0.0 - 207.46.255.255

131.107.3.85: Microsoft Corp, MSFT, One Microsoft Way, Redmond, WA
NetRange: 131.107.0.0 - 131.107.255.255

131.107.3.83: Microsoft Corp, MSFT, One Microsoft Way, Redmond, WA
NetRange: 131.107.0.0 - 131.107.255.255

Anything coming from these IP-addresses is valid Microsoft traffic.
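The "IP always wins" rule can be sketched in a few lines. This is only an illustration under stated assumptions: the two /16 networks are the NetRanges listed above as they stood when this was posted, and `classify` and its labels are hypothetical names, not part of any log analyser.

```python
import ipaddress

# NetRanges quoted in this post (207.46.0.0 - 207.46.255.255 and
# 131.107.0.0 - 131.107.255.255), written as CIDR. Assumed current only
# as of the whois lookups in this thread.
MS_RANGES = [
    ipaddress.ip_network("207.46.0.0/16"),
    ipaddress.ip_network("131.107.0.0/16"),
]
UA_MARKER = "MS Search 4.0 Robot"

def classify(ip, user_agent):
    """Compare what the User-Agent claims with where the request came from."""
    claims_ms = UA_MARKER in user_agent
    addr = ipaddress.ip_address(ip)
    from_ms = any(addr in net for net in MS_RANGES)
    if claims_ms and from_ms:
        return "claimed-and-in-range"
    if claims_ms:
        return "claimed-but-outside-range"
    return "not-claimed"

UA_A = "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 4.0 Robot) Microsoft"
UA_B = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"

print(classify("61.218.208.114", UA_A))
print(classify("131.107.3.83", UA_B))
```

With the addresses from this thread, killroy's visitor comes out as claimed but outside the range, while the entries zem logged fall inside it.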

 
