
Forum Moderators: mack


MS Search 4.0 Robot busy again

What's the latest news on this chap?

     
10:40 am on Sep 6, 2003 (gmt 0)

Senior Member from MT 

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 1, 2003
posts:1843
votes: 0



Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 4.0 Robot) Microsoft

Has been all over my site pulling every last page.

What is the latest news on this one? Is it in any way affiliated with MS?

SN

3:40 pm on Sept 6, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 8, 2002
posts:2012
votes: 0


Hmm... still unclear what the MS Search Robot [google.com] is ...!?

In general I don't like standard MS user agents combined with strings like "crawler" or "robot". I suppose it's something like the MSIECrawler [google.com], and worth banning.

4:26 am on Sept 7, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 11, 2003
posts:955
votes: 0


Is it a new alias for the new MSNBot?
8:52 am on Sept 7, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 31, 2003
posts:194
votes: 0


What IP(s) did it come in on?
10:37 am on Sept 7, 2003 (gmt 0)

Senior Member from MT 

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 1, 2003
posts:1843
votes: 0


ip: 61.218.208.114
ua: Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 4.0 Robot) Microsoft

all of em, zipped through the entire site.

Could it be some spider software that's downloadable?

SN

10:51 am on Sept 7, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 15, 2003
posts:2395
votes: 0


That's just a cover-up, it's not the real MS spider from Microsoft. Here's who it is:

inetnum: 61.216.0.0 - 61.219.255.255
netname: HINET-TW
descr: CHTD, Chunghwa Telecom Co.,Ltd.
descr: Data-Bldg.6F, No.21, Sec.21, Hsin-Yi Rd.
descr: Taipei Taiwan 100
country: TW

The User-Agent string is just something they make up. It's not possible to say exactly what spider software they use from this string; it could be anything, including homebrew or even the real MS spider software under licence from Microsoft (probable? I'd say no). And their spidering could have any purpose, from legitimate search engine activity to site-ripping. I don't really know much about the search engines in Taiwan.

/claus


edit: clarified a bit
11:55 am on Sept 7, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 8, 2002
posts:2012
votes: 0


>CHTD, Chunghwa Telecom Co.,Ltd.

Email harvester - who wants to bet?!

I had a lot of them on my pages recently (211.x.x.x and other UAs, though).

4:02 am on Sept 8, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 11, 2003
posts:955
votes: 0


But claus, on this page:
[webmasterworld.com...]

it says:


Sample log entries, slightly edited:

example.com 207.46.137.9 - - [20/Aug/2002:03:28:44 +1000] "GET /foo/bar/1 HTTP/1.0" 200 38538 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"
example.com 131.107.3.85 - - [20/Aug/2002:03:28:50 +1000] "GET /foo/bar/2 HTTP/1.0" 200 50701 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"
example.com 131.107.3.83 - - [20/Aug/2002:03:29:05 +1000] "GET /foo/bar/3 HTTP/1.0" 200 38053 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"

and I realised that the third IP (131.107.3.83) is from Microsoft. If you resolve its hostname, it comes back as:

tide83.microsoft.com

but the other IPs (first and second) can't resolve...

Maybe something to look at!?
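
A hostname lookup like the one described above can be sketched in Python. This is only an illustration: `reverse_lookup` and `looks_like_domain` are hypothetical helper names, and the `resolver` parameter is an assumption added so the lookup can be swapped out when there is no network. Keep in mind that a reverse (PTR) record is set by whoever controls the IP block, so a matching hostname is a hint rather than proof, and a missing one (as with the first two IPs) proves nothing either way.

```python
import socket

def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the reverse-DNS hostname for ip, or None if it does not resolve."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except (socket.herror, socket.gaierror):
        # No PTR record, or the lookup failed.
        return None
    return hostname

def looks_like_domain(hostname, domain):
    """True if hostname is the domain itself or one of its subdomains."""
    if hostname is None:
        return False
    hostname = hostname.lower().rstrip(".")
    return hostname == domain or hostname.endswith("." + domain)

if __name__ == "__main__":
    # IPs taken from the sample log entries quoted in this post.
    for ip in ("207.46.137.9", "131.107.3.85", "131.107.3.83"):
        name = reverse_lookup(ip)
        print(ip, name, looks_like_domain(name, "microsoft.com"))
```

What this prints depends on the DNS records that exist when it is run, so results today may differ from what was seen in 2003.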

Sid

9:51 am on Sept 8, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 15, 2003
posts:2395
votes: 0


Sidyadav, the User-Agent killroy posted is (a), and the one that was posted by zem in the related thread is (b):

(a) "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 4.0 Robot) Microsoft"
(b) "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"

There are some differences here. However, as User-Agent strings can always be forged (basically you can call it anything you want when you build a spider), the most important thing is always the IP-number. Zem and you reported real Microsoft IPs, while killroy reported an IP-number that belongs to a company in Taiwan.

If the User-Agent is the same, the IP always wins. It is much harder to fake an IP-number than a User-Agent string. For bots/spiders it is effectively impossible to fake the IP, as the IP is the address the page is returned to when they request a page, so if they give a wrong IP, they will never get the page back.
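
To compare the two pieces of information separately, the IP and the User-Agent first have to be pulled out of each log line. Here is a rough sketch, assuming the field layout of the sample entries quoted earlier in the thread (virtual host first, then a combined-style line); `LOG_RE`, `parse_line` and `claims_ms_robot` are made-up names for illustration, and other servers may log in a different format.

```python
import re

# Pattern assumed from the sample entries in this thread:
# vhost ip ident user [time] "request" status size "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<vhost>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_RE.match(line.strip())
    return match.groupdict() if match else None

def claims_ms_robot(agent):
    """True if the User-Agent merely *claims* to be the MS Search robot."""
    return "MS Search 4.0 Robot" in agent

if __name__ == "__main__":
    sample = (
        'example.com 207.46.137.9 - - [20/Aug/2002:03:28:44 +1000] '
        '"GET /foo/bar/1 HTTP/1.0" 200 38538 "-" '
        '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"'
    )
    entry = parse_line(sample)
    print(entry["ip"], claims_ms_robot(entry["agent"]))
```

The User-Agent check only tells you what the client says about itself; the `ip` field is the part worth verifying.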

It is true that a real MS spider has been out, and other threads confirm it. This spider is just not the same one that killroy saw. Rather, I think the Taiwanese company would like people to think it's the same, and that's why they chose a similar name.

/claus


Added:

I use a quite powerful Whois tool, so I was able to look up the three IPs from your post. There's a web-based version of the tool here: [geektools.com...]

207.46.137.9: Microsoft Corp, MSFT, One Microsoft Way, Redmond, WA
NetRange: 207.46.0.0 - 207.46.255.255

131.107.3.85: Microsoft Corp, MSFT, One Microsoft Way, Redmond, WA
NetRange: 131.107.0.0 - 131.107.255.255

131.107.3.83: Microsoft Corp, MSFT, One Microsoft Way, Redmond, WA
NetRange: 131.107.0.0 - 131.107.255.255

Anything coming from these IP-addresses is valid Microsoft traffic.
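
For anyone who wants to automate the check, here is a minimal sketch using Python's standard `ipaddress` module. The ranges are copied from the Whois output above as of this thread, so treat them as an assumption that may go out of date. `MICROSOFT_RANGES`, `to_networks` and `is_in_quoted_ms_range` are hypothetical names made up for this example.

```python
import ipaddress

# Ranges as quoted in the Whois output in this thread (an assumption:
# allocations change, so re-check Whois before relying on them).
MICROSOFT_RANGES = [
    ("207.46.0.0", "207.46.255.255"),
    ("131.107.0.0", "131.107.255.255"),
]

def to_networks(ranges):
    """Convert (first, last) address pairs into a list of network objects."""
    networks = []
    for first, last in ranges:
        networks.extend(
            ipaddress.summarize_address_range(
                ipaddress.ip_address(first), ipaddress.ip_address(last)
            )
        )
    return networks

MICROSOFT_NETS = to_networks(MICROSOFT_RANGES)

def is_in_quoted_ms_range(ip):
    """True if ip falls inside one of the ranges listed above."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in MICROSOFT_NETS)

if __name__ == "__main__":
    # The three IPs from the related thread, plus the one killroy reported.
    for ip in ("207.46.137.9", "131.107.3.85", "131.107.3.83", "61.218.208.114"):
        print(ip, is_in_quoted_ms_range(ip))
```

The last address is the Taiwanese one from killroy's log, which falls outside both ranges regardless of what its User-Agent claims.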