Home / Forums Index / Microsoft / Bing Search Engine News
Forum Library, Charter, Moderators: mack

Bing Search Engine News Forum

    
MS Search 4.0 Robot busy again
What's the latest news on this chap?
killroy




msg:1540667
 10:40 am on Sep 6, 2003 (gmt 0)


Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 4.0 Robot) Microsoft

Has been all over my site pulling every last page.

What is the latest news on this one? Is it in any way affiliated with MS?

SN

 

Yidaki




msg:1540668
 3:40 pm on Sep 6, 2003 (gmt 0)

Hmm... still unclear what the MS Search Robot [google.com] is ...!?

In general I don't like standard MS user agents combined with strings like crawler or robot. I suppose it's something like the MSIECrawler [google.com] - worth banning.

sidyadav




msg:1540669
 4:26 am on Sep 7, 2003 (gmt 0)

Is it a new alias for the new MSNBot?

papamaku




msg:1540670
 8:52 am on Sep 7, 2003 (gmt 0)

What IP/IPs did it come in on?

killroy




msg:1540671
 10:37 am on Sep 7, 2003 (gmt 0)

ip: 61.218.208.114
ua: Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 4.0 Robot) Microsoft

all of em, zipped through the entire site.

Could it be some spider s/w that's downloadable?

SN

claus




msg:1540672
 10:51 am on Sep 7, 2003 (gmt 0)

That's just a cover-up, it's not the real MS spider from Microsoft. Here's who it is:

inetnum: 61.216.0.0 - 61.219.255.255
netname: HINET-TW
descr: CHTD, Chunghwa Telecom Co.,Ltd.
descr: Data-Bldg.6F, No.21, Sec.21, Hsin-Yi Rd.
descr: Taipei Taiwan 100
country: TW

The User-Agent string is just something they make up - it's not possible to say exactly what spider software they use from this string, it could be anything including homebrew and even the real MS spider software on a licence from Microsoft (probable? i'd say no). And their spidering could have any purpose from legit SE activity to site-ripping - i don't really know much about the SE's in Taiwan.

/claus


edit: clarified a bit
Yidaki




msg:1540673
 11:55 am on Sep 7, 2003 (gmt 0)

>CHTD, Chunghwa Telecom Co.,Ltd.

eMail harvester - who bets with me?!

I had a lot of them on my pages recently (211.x.x.x and other UA's though).

sidyadav




msg:1540674
 4:02 am on Sep 8, 2003 (gmt 0)

But claus, on this page:
[webmasterworld.com...]

it says:


Sample log entries, slightly edited:

example.com 207.46.137.9 - - [20/Aug/2002:03:28:44 +1000] "GET /foo/bar/1 HTTP/1.0" 200 38538 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"
example.com 131.107.3.85 - - [20/Aug/2002:03:28:50 +1000] "GET /foo/bar/2 HTTP/1.0" 200 50701 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"
example.com 131.107.3.83 - - [20/Aug/2002:03:29:05 +1000] "GET /foo/bar/3 HTTP/1.0" 200 38053 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"

and I realised that the third IP (131.107.3.83) is from Microsoft, and if you resolve its hostname, it's from here:

tide83.microsoft.com

but the other IPs (first and second) can't resolve...

Maybe something to look at!?

Sid
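
The hostname check described above can be scripted. The following is only a minimal sketch: `reverse_lookup` and `hostname_in_domain` are hypothetical helper names, and the `microsoft.com` suffix is taken from the `tide83.microsoft.com` result in this post. An address with no PTR record simply returns `None`, which by itself says nothing about who owns the address.

```python
import socket


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the reverse-DNS (PTR) hostname for ip, or None if it does not resolve.

    resolver is injectable so the function can be exercised without network access.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError
        return None


def hostname_in_domain(hostname, domain="microsoft.com"):
    """True if hostname is the domain itself or a subdomain of it."""
    if hostname is None:
        return False
    hostname = hostname.rstrip(".").lower()
    return hostname == domain or hostname.endswith("." + domain)


if __name__ == "__main__":
    # The three IPs from the sample log entries quoted above
    for ip in ("207.46.137.9", "131.107.3.85", "131.107.3.83"):
        name = reverse_lookup(ip)
        print(ip, name, hostname_in_domain(name))
```

PTR records are set by whoever controls the address block, so a stricter check would also resolve the returned hostname forward and confirm it maps back to the same IP.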

claus




msg:1540675
 9:51 am on Sep 8, 2003 (gmt 0)

Sidyadav, the User-Agent killroy posted is (a); the one that was posted by zem in the related thread is (b):

(a) "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 4.0 Robot) Microsoft"
(b) "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot)"

There are some differences here. However, as User-Agent strings can always be forged (basically you can call it anything you want when you build a spider), the most important thing is always the IP-number. Zem and you reported real Microsoft IP's, while killroy reported an IP-number that belongs to a company in Taiwan.

If the User-Agent is the same, the IP always wins. It is much harder to fake an IP-number than a User-Agent string. For bots/spiders it is effectively impossible to fake the IP, because the IP is the address the page is returned to when they request a page; if they give away the wrong IP, they will never get the page back.

It is true that a real MS spider has been out, and other threads confirm it. This spider is just not the same one that killroy saw. Rather, i think the Taiwanese company would like people to think it's the same, and that's why they chose a similar name.

/claus


Added:

I use a quite powerful Whois tool so i was able to look up the three IP's from your post. There's a web-based version of the tool here: [geektools.com...]

207.46.137.9: Microsoft Corp, MSFT, One Microsoft Way, Redmond, WA
NetRange: 207.46.0.0 - 207.46.255.255

131.107.3.85: Microsoft Corp, MSFT, One Microsoft Way, Redmond, WA
NetRange: 131.107.0.0 - 131.107.255.255

131.107.3.83: Microsoft Corp, MSFT, One Microsoft Way, Redmond, WA
NetRange: 131.107.0.0 - 131.107.255.255

Anything coming from these IP-addresses is valid Microsoft traffic.
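
For anyone who wants to run the same comparison against their own logs, here is a rough sketch using Python's standard `ipaddress` module. The ranges are copied from the whois output quoted in this thread (as of 2003; allocations may have changed since), and the names `KNOWN_RANGES`, `build_networks` and `owner_of` are made up for illustration.

```python
import ipaddress

# Whois ranges as quoted in this thread (assumption: still accurate for your purposes)
KNOWN_RANGES = {
    "Microsoft": [
        ("207.46.0.0", "207.46.255.255"),
        ("131.107.0.0", "131.107.255.255"),
    ],
    "HINET-TW": [
        ("61.216.0.0", "61.219.255.255"),
    ],
}


def build_networks(ranges):
    """Convert (first, last) address pairs into a list of CIDR networks."""
    networks = []
    for first, last in ranges:
        networks.extend(
            ipaddress.summarize_address_range(
                ipaddress.ip_address(first), ipaddress.ip_address(last)
            )
        )
    return networks


NETWORK_TABLE = {owner: build_networks(r) for owner, r in KNOWN_RANGES.items()}


def owner_of(ip, table=NETWORK_TABLE):
    """Return the owner label whose range contains ip, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for owner, networks in table.items():
        if any(addr in net for net in networks):
            return owner
    return None


if __name__ == "__main__":
    for ip in ("207.46.137.9", "131.107.3.85", "131.107.3.83", "61.218.208.114"):
        print(ip, owner_of(ip))
```

The lookup ignores the User-Agent entirely, which matches the point made above: the claimed name can be anything, but the source address has to be real for the page to come back.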


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved