Forum Library : Charter : Moderators: Receptional

Microsoft Search Live

This 111-message thread spans 4 pages; this is page 2.
Someone at MS just got banned!
Was Bill Gates Surfing My site?
pendanticist


#:1536734
 7:18 pm on April 24, 2003 (utc 0)

I've not heard any more from him at all, bobmark.

There were a couple of posts where someone alluded to 'the next big thing' (or similar) as though they perhaps knew something we don't.

Mr. Birney apparently did not see fit to post here, even though I sent him the thread and suggested he do so. We have communicated twice to date.

There are other mentions on the boards about MS going after Google, etc.

Logic suggests a certain amount of legitimacy, especially when one asks how an employee of MS could use that IP number and not get caught: running crawls draws on the servers, so surely someone at MS would track him down.

Then again, the fact that Mr. Birney hasn't added 'personal legitimacy' by posting here tends to sway me the other way.

Having said that, since my domain hasn't been 'pummeled' too badly, I'm going to wait and see using cautious optimism.

Pendanticist.

jim_w


#:1536735
 8:55 pm on April 24, 2003 (utc 0)

pendanticist

There is a possibility that he disguised his browser type and changed his IP. Like I said, they may be competing with me soon. Just my luck.

131.107.65.225 - - [19/Apr/2003:17:10:03 -0500] "GET /links.html HTTP/1.1" 200 33341 "-" "Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.2;+.NET+CLR+1.1.4322)"

Notice the ‘+’s in the user agent. So I now have deny from 131.107.
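For anyone who wants to check their own logs for the same oddity, here is a minimal Python sketch, assuming a stock Apache "combined" log format like the line above. It pulls out the client IP and user agent and flags a UA containing literal '+' characters; as far as I know Apache writes the User-Agent header as received, so a '+' there came from the client. The names `parse_combined` and `agent_has_literal_plus` are made up for illustration.

```python
import re

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_combined(line):
    """Return the log fields as a dict, or None if the line is not combined format."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None

def agent_has_literal_plus(entry):
    """True if the logged UA contains '+'. On Apache the UA is logged as
    received (assumption stated above), so the client sent it that way."""
    return "+" in entry["agent"]

line = ('131.107.65.225 - - [19/Apr/2003:17:10:03 -0500] "GET /links.html HTTP/1.1" '
        '200 33341 "-" "Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.2;+.NET+CLR+1.1.4322)"')
entry = parse_combined(line)
print(entry["host"], agent_has_literal_plus(entry))  # 131.107.65.225 True
```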

wilderness


#:1536736
 9:06 pm on April 24, 2003 (utc 0)

Just the "surf nazi" checking here ;)

I've had them denied since their first visit, and each time the IP expands, I'll expand my deny range.

It is NOT logical for a legitimate company like MS to disguise and even misrepresent themselves in such a manner.
It's just NOT good business.

So the "surf nazi" suggests letting them "eat 403's"

pendanticist


#:1536737
 9:38 pm on April 24, 2003 (utc 0)

Well then, he/she changed IP Numbers again?!?

Just ran 131.107.163.49 thru SpamCop and it renders this: postmaster@[131.107.163.49].

131.107.137.47 ditto postmaster@[131.107.137.47].
131.107.65.225 ditto postmaster@[131.107.65.225].

131.107.163.49 - - [23/Apr/2003:11:09:17 -0700] "GET /robots.txt HTTP/1.1" 200 220 "-" "MicrosoftPrototypeCrawler (How's my crawling? mailto:newbiecrawler@hotmail.com)"
131.107.163.49 - - [23/Apr/2003:11:09:17 -0700] "GET /blahblah.html HTTP/1.1" 200 8620 "-" "MicrosoftPrototypeCrawler (How's my crawling? mailto:newbiecrawler@hotmail.com)"
131.107.163.49 - - [23/Apr/2003:11:09:17 -0700] "GET /blahblah.html HTTP/1.1" 200 13642 "-" "MicrosoftPrototypeCrawler (How's my crawling? mailto:newbiecrawler@hotmail.com)"

It looks like this is the third IP Number and the second ' message '.
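Keeping track of which IPs and which UA wordings the bot has used gets tedious by hand. A minimal Python sketch, assuming Apache combined-format lines; `crawler_footprint` is a hypothetical helper name, and the two sample lines are copied from log excerpts posted in this thread (from different sites).

```python
import re
from collections import defaultdict

# Client IP is the first field; the user agent is the last quoted field.
LINE = re.compile(r'^(\S+) .*"([^"]*)"\s*$')

def crawler_footprint(lines, marker):
    """Map each user-agent string containing `marker` to the set of IPs that sent it."""
    seen = defaultdict(set)
    for line in lines:
        m = LINE.match(line)
        if m and marker in m.group(2):
            seen[m.group(2)].add(m.group(1))
    return dict(seen)

lines = [
    '131.107.163.49 - - [23/Apr/2003:11:09:17 -0700] "GET /robots.txt HTTP/1.1" 200 220 "-" '
    '"MicrosoftPrototypeCrawler (How\'s my crawling? mailto:newbiecrawler@hotmail.com)"',
    '131.107.163.47 - - [18/Apr/2003:11:33:33 +0200] "GET /index.html HTTP/1.1" 404 6124 "-" '
    '"MicrosoftPrototypeCrawler (please report obnoxious behavior to newbiecrawler@hotmail.com)"',
]
for agent, ips in crawler_footprint(lines, "MicrosoftPrototypeCrawler").items():
    print(sorted(ips), agent)
```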

Hmmmmmm. Beginning to look more bogus all the way around.

Ok, then how do we go about shutting this thing down?

Fire off an abuse@msn/hotmail.com message?

If they're spoofing IP Numbers, (and I'm ignorant here) can't that be tracked down and reported? Or, are we simply looking at the .htaccess ban?

I've searched MS's MSDN and found nothing there, and then Googled [url=http://www.google.com/search?sourceid=navclient&ie=UTF-8&oe=UTF-8&q=MicrosoftPrototypeCrawler]"MicrosoftPrototypeCrawler"[/url]: it shows only one more result this week than it did last week, and that's this thread.

Too bad no one from here works for MS. <Hint! Hint!>

Pendanticist.

wilderness


#:1536738
 10:20 pm on April 24, 2003 (utc 0)

Pendanticist
My suggestion is to stop toying with the deceiver and deny 131.107.
If MS doesn't have the proper regard for protecting their subscribers, why should I?

Don

pendanticist


#:1536739
 10:46 pm on April 24, 2003 (utc 0)

If they're spoofing IP Numbers, (and I'm ignorant here) can't that be tracked down and reported? Or, are we simply looking at the .htaccess ban?

Even though I've banned ' deny from 131.107.', I'm still interested in learning more about tracking down spoofed IP numbers.
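For anyone wondering what `deny from 131.107.` actually covers: a partial address of two octets is read as the whole 131.107.0.0/16 block. Here is a minimal Python sketch of that reading using the standard `ipaddress` module; `partial_ip_to_network` and `is_denied` are hypothetical names, not Apache's own code. It also shows why a trailing dot is a good habit if anything ever compares plain string prefixes: "10.10.0.1" starts with "10.1".

```python
import ipaddress

def partial_ip_to_network(partial):
    """Turn a partial dotted address such as '131.107.' into the network
    it denotes, e.g. 131.107.0.0/16."""
    octets = [o for o in partial.strip(".").split(".") if o]
    if not 1 <= len(octets) <= 3:
        raise ValueError(f"expected 1 to 3 octets, got {partial!r}")
    padded = octets + ["0"] * (4 - len(octets))
    return ipaddress.ip_network(".".join(padded) + f"/{8 * len(octets)}")

def is_denied(ip, rules):
    """True if `ip` falls inside any of the partial-address deny rules."""
    addr = ipaddress.ip_address(ip)
    return any(addr in partial_ip_to_network(r) for r in rules)

rules = ["131.107."]
for ip in ["131.107.163.49", "131.107.65.225", "207.46.104.80"]:
    print(ip, is_denied(ip, rules))
# 131.107.163.49 True
# 131.107.65.225 True
# 207.46.104.80 False

# String prefix vs. network membership:
print("10.10.0.1".startswith("10.1"), is_denied("10.10.0.1", ["10.1"]))  # True False
```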

Must be time for another thread....

Pendanticist.

bnc929


#:1536740
 1:46 am on April 25, 2003 (utc 0)

I make it a habit to ban any robot that does not put down a valid contact method in the user agent (unless I know who they are). I don't consider a hotmail account to be valid. It managed to crawl 1700 of my pages before I got it though.
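That rule is easy to automate. A minimal Python sketch, assuming the rule is "the UA must carry either a web address or an e-mail address on a non-free-mail domain"; the `FREEMAIL` set and the `has_acceptable_contact` name are illustrative assumptions, and whitelisting the bots you already know is left to you.

```python
import re

# Illustrative list of free webmail domains not accepted as a bot contact.
FREEMAIL = {"hotmail.com", "yahoo.com", "msn.com"}

EMAIL = re.compile(r"[\w.+-]+@([\w-]+(?:\.[\w-]+)+)")
URL = re.compile(r"https?://", re.IGNORECASE)

def has_acceptable_contact(agent):
    """True if the UA names a web address, or an e-mail address whose
    domain is not on the free-mail list."""
    if URL.search(agent):
        return True
    return any(m.group(1).lower() not in FREEMAIL for m in EMAIL.finditer(agent))

for ua in ["MicrosoftPrototypeCrawler (How's my crawling? mailto:newbiecrawler@hotmail.com)",
           "ExampleBot/1.0 (+http://www.example.com/bot.html)"]:
    print(has_acceptable_contact(ua), ua)
```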

jim_w


#:1536741
 5:14 am on April 25, 2003 (utc 0)

Have you guys seen this mention of newbiecrawler on microdoc news?

Maybe I’m cynical, and no doubt I’m paranoid (I grew up in the 70’s), and while it could be a new bot, that does not necessarily mean that it is a SE bot. It could just as well be a spy bot, or doing both: spying while acting as an SE bot, or vice versa.

Don’t get me wrong. I sell software written in an MS language, and have since 1990, and I have always been a pro-MS person, but it looks funny and unethical to me. And let's face it, MS has been sued in the past over several questionable business practices.

I’m not even convinced that it is a bot all the time. Somewhere in my log, the original IP that was posted came via Google and subscribed to my newsletter, just as many of my competitors have in the past. Now I publish all the graphics for my newsletter on the server where my competitors are banned, so all they get is the text until they get home. And they all have a REFERER of hotmail, yahoo, etc. So I know it goes on.

can't that be tracked down and reported.

It can be, just not very easily, not to mention not economically. You need a sniffer and you need to sit on it 24/7. Banning is the most economical way, I think.

Spoofing just doesn’t make sense. No reason to spoof to sign up for my newsletter. They could just do it via an ISP instead of going to all that trouble. Could it be a firewall thing adding to this confusing issue? I’ll bet it’s a new hire or something at MS, and they don’t realize that when they go out on the web with a MS IP, they are representing MS for better or for worse. i.e. a fresh-out or intern.

Chalupee


#:1536742
 10:45 am on April 25, 2003 (utc 0)

What, MS sued... when did that happen? I'm from the 60's and 70's, spaced out and paranoid.
From the microdot message link above...
Assuming this new platform runs on Microsoft technology, there is going to be an interesting comparison between a Microsoft search engine and a Linux Search Engine (Google). Since we know Google has about 54,000 computers in what is a mammoth supercomputer made out of PC parts, it will be interesting to see how many NT Servers it takes to make a comparable search engine, or a better one than Google. ++

I think the name of the bot/crawler should be...
SwissCheese/madeinfrance...."Hack me, hack me.."

Chalupee
not from Gaudahlupee

wilderness


#:1536743
 12:00 pm on April 25, 2003 (utc 0)

I was going through some old IP and other information I had saved for reference on IP identification and stumbled across the following (which I got from http ://www.clearwaterbeachcam.com/d--skinner/spiders.html; although the page is still there, the references below are not. My saved file is dated 03/25/02):

#MSN
#tide01.microsoft.com
#131.107.3.11
#tide02.microsoft.com
#131.107.3.12
#tide03.microsoft.com
#131.107.3.13
#tide04.microsoft.com
#131.107.3.14
#tide05.microsoft.com
#131.107.3.15
#tide06.microsoft.com
#131.107.3.16
#tide07.microsoft.com
#131.107.3.17
#tide08.microsoft.com
#131.107.3.18
#tide09.microsoft.com
#131.107.3.19
#tide10.microsoft.com
#131.107.3.20
#tide11.microsoft.com
#131.107.3.21
#tide12.microsoft.com
#131.107.3.22
#tide14.microsoft.com
#131.107.3.24
#tide15.microsoft.com
#131.107.3.25
#tide16.microsoft.com
#131.107.3.26
#tide17.microsoft.com
#131.107.3.27
#tide18.microsoft.com
#131.107.3.28
#tide19.microsoft.com
#131.107.3.29
#tide20.microsoft.com
#131.107.3.30
#tide21.microsoft.com
#131.107.3.31
#tide22.microsoft.com
#131.107.3.32
#tide23.microsoft.com
#131.107.3.33
#tide24.microsoft.com
#131.107.3.34
#tide25.microsoft.com
#131.107.3.35
#tide26.microsoft.com
#131.107.3.36
#tide27.microsoft.com
#131.107.3.37
#tide28.microsoft.com
#131.107.3.38
#tide29.microsoft.com
#131.107.3.39
#tide30.microsoft.com
#131.107.3.40
#tide33.microsoft.com
#131.107.39.12
#tide34.microsoft.com
#131.107.3.44
#tide35.microsoft.com
#131.107.3.45
#tide36.microsoft.com
#131.107.3.46
#tide70.microsoft.com
#131.107.3.70
#tide71.microsoft.com
#131.107.3.71
#tide72.microsoft.com
#131.107.3.72
#tide73.microsoft.com
#131.107.3.73
#tide74.microsoft.com
#131.107.3.74
#tide75.microsoft.com
#131.107.3.75
#tide76.microsoft.com
#131.107.3.76
#tide77.microsoft.com
#131.107.3.77
#tide78.microsoft.com
#131.107.3.78
#tide79.microsoft.com
#131.107.3.79
#tide82.microsoft.com
#131.107.3.82
#tide83.microsoft.com
#131.107.3.83
#tide84.microsoft.com
#131.107.3.84
#tide85.microsoft.com
#131.107.3.85
#tide86.microsoft.com
#131.107.3.86
#tide87.microsoft.com
#131.107.3.87
#tide93.microsoft.com
#131.107.3.93
#tide94.microsoft.com
#131.107.3.94
#tide110.microsoft.com
#63.64.43.138
#tide111.microsoft.com
#63.64.43.137
#tide112.microsoft.com
#208.249.151.138
#tide113.microsoft.com
#208.249.151.139
#tide114.microsoft.com
#192.237.67.205
#tide115.microsoft.com
#192.237.67.206
#tide116.microsoft.com
#207.46.104.80
#tide117.microsoft.com
#207.46.125.16
#tide118.microsoft.com
#208.147.66.138
#tide119.microsoft.com
#208.147.66.139
#tide120.microsoft.com
#207.46.71.10
#tide121.microsoft.com
#207.46.71.11
#tide122.microsoft.com
#203.127.3.12
#tide123.microsoft.com
#203.127.3.14
#tide124.microsoft.com
#203.41.151.8
#tide125.microsoft.com
#203.41.151.9
#tide130.microsoft.com
#207.46.36.9
#tide131.microsoft.com
#207.46.36.10
#tide132.microsoft.com
#207.46.36.11
#tide133.microsoft.com
#207.46.38.9
#tide134.microsoft.com
#207.46.38.10
#tide135.microsoft.com
#207.46.11.19
#tide136.microsoft.com
#207.46.11.20
#tide137.microsoft.com
#207.46.11.21
#tide138.microsoft.com
#207.46.44.9
#tide139.microsoft.com
#207.46.44.10
#tide140.microsoft.com
#207.46.46.9
#tide141.microsoft.com
#207.46.46.10
#tide142.microsoft.com
#207.46.40.9
#tide143.microsoft.com
#207.46.40.10
#tide144.microsoft.com
#207.46.48.9
#tide145.microsoft.com
#207.46.48.10
#tide146.microsoft.com
#207.46.42.9
#tide147.microsoft.com
#207.46.42.10

jim_w


#:1536744
 2:22 pm on April 25, 2003 (utc 0)

http ://www.clearwaterbeachcam.com/d--skinner/spiders.html

This is ironic, when I go there I end up at…
http://search.msn.com/results.aspx?srch=105&FORM=AS5&q=http+%3a%2f%2fwww.clearwaterbeachcam.com%2fd--skinner%2fspiders.html

(hehehehehehe)

#MSN

Does that mean the IP’s belong to msn.com and not microsoft.com? If that’s the case, I just banned all msn users. Oops. If some of that block is msn.com and not microsoft.com, that would explain a lot of this.

Does anyone know what msn.com’s IPs are? Is there any way of getting the IP block for anydomain.com?

#tide119.microsoft.com
#208.147.66.139

I get - Cable & Wireless
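On the question of finding the IP block behind an address: ARIN's whois service answers plain-text queries over TCP port 43 (the WHOIS protocol, RFC 3912). A minimal Python sketch, assuming `whois.arin.net` accepts a bare IP as the query and labels its lines `NetRange:`, `CIDR:`, `NetName:` and `OrgName:`; those key names are an assumption about the output format, and `whois`/`fields` are hypothetical helper names. The lookup itself needs network access, so it is shown commented out.

```python
import socket

def whois(query, server="whois.arin.net", port=43, timeout=10.0):
    """Send one WHOIS query (RFC 3912) and return the server's full reply as text."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def fields(reply, wanted=("NetRange", "CIDR", "NetName", "OrgName")):
    """Pick 'Key: value' lines out of a reply; the key names are assumed."""
    found = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in wanted:
            found.setdefault(key.strip(), value.strip())
    return found

# Usage (needs network access):
# print(fields(whois("208.147.66.139")))
```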

while it could be a new bot, that does not necessarily mean that it is a SE bot.

I should have said ‘THE SE BOT’.

wilderness


#:1536745
 3:11 pm on April 25, 2003 (utc 0)

Jim
If you go to the " georgegg " page and use any one of those
(not sure what they're called?), e.g. tide28.microsoft.com,

you will see that they are still active (at least registered with MS). Whether that is Microsoft or MSN is, IMO, really irrelevant.

I've removed the denies for 131.107. with "egg on my face".
Having gone through ARIN whois on all those ranges, I'm in the process of allowing some of those MS IP ranges back in (from denied) in my Far East blocks.

Most everybody realizes how over-bearing I am in these matters and I believe this should resolve this issue.
Although, as Pendanticist points out, there still exists the possibility of it being a spoofed range in our logs.
Due to the recent PERSISTENT activity (131.107) and the related ranges, I'm going to accept that chance.
Hopefully in the process I won't end up with even more egg on my face ;)

Don

<BTW Jim, that page comes right up for me as soon as I omit the blank space I purposely left in the URL so the link would be broken>
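Since the worry is whether a 131.107 visitor really is Microsoft, one check not mentioned so far is forward-confirmed reverse DNS: look up the IP's hostname, require it to be under the expected domain, then resolve that hostname and require it to map back to the same IP. A minimal Python sketch using the standard `socket` resolver calls; the function name and the `.microsoft.com` suffix are assumptions for illustration, and the resolvers are parameters so it can be tried without network access. It only proves what DNS says today, not who wrote the crawler.

```python
import socket

def forward_confirmed(ip, allowed_suffixes,
                      reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Return the verified hostname for `ip`, or None if the check fails."""
    try:
        host, _aliases, _addrs = reverse(ip)            # PTR lookup
        name = host.lower().rstrip(".")
        if not name.endswith(tuple(allowed_suffixes)):  # e.g. ".microsoft.com"
            return None
        _canon, _aliases, addresses = forward(host)     # forward lookup on that name
    except OSError:                                     # herror, gaierror, timeouts
        return None
    return name if ip in addresses else None

# Usage (needs network access):
# print(forward_confirmed("131.107.3.38", (".microsoft.com",)))
```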

jim_w


#:1536746
 3:35 pm on April 25, 2003 (utc 0)

Most everybody realizes how over-bearing I am in these matters

Yea, so am I. When you do a search for my keywords you see Motorola, GE, Honeywell, and I have had so many spy bots, that I get a little trigger happy sometimes. Banned a customer once even. (GRIN)

Pendanticist was right then. Complain to abuse@microsoft and abuse@msn.com and let them figure it out?

<BTW Jim, that page comes right up for me as soon as I omit the blank space I purposely left in the URL so the link would be broken>

Yea, it was an attempt @ humor. Obviously a poor attempt! But once I hit Mr. Button, it was toooooooo late. :-))

Doesn’t msn do dynamic IP’s? If they do, then wouldn’t that make 131.107.137.47 microsoft.com because it was consistent? And I still have a problem with the ‘+’ thing that showed up.

wilderness


#:1536747
 3:57 pm on April 25, 2003 (utc 0)

<snip>Complain to abuse@microsoft</snip>

I stopped emailing about IPs and backbones some time ago. Generally your only response is automated. In the event you're lucky enough to find somebody to email with, they are not aware of any web log pattern, nor do they have the ability to compare those patterns to their User Agreements.
Their only concern is bandwidth.

I'm not all that keen on the variations in UAs either. However, just denying a visitor access because of the UA without comparing that to the IP is TOOOO overbearing. IMO anyway.

These logs, like the internet, are an always-changing thing, and though we are required to be perceptive, we should also remain open-minded, hopefully striking a worthwhile balance that benefits both our websites and our visitors.
<off the soap box> ;)

Don

pendanticist


#:1536748
 4:26 pm on April 25, 2003 (utc 0)

I'm on the phone with MS at this very moment and it appears as though Mr. Birney is indeed an employee of theirs.

Be right back....

Pendanticist.

pendanticist


#:1536749
 4:50 pm on April 25, 2003 (utc 0)

Ok. I've spoken to a receptionist at tech and she is going to determine the legitimacy of this bot, once and for all.

She asked for my phone number and I gave her my ISP addy, so I do expect to hear from her.

I will let you know as soon as I hear anything at all.

(Thanks! to NeoTrace)

Pendanticist.

GoogleGuy


#:1536750
 5:08 pm on April 25, 2003 (utc 0)

Interesting saga--the hotmail address is a little strange. Keep us posted, pendanticist. :)

jns594


#:1536751
 6:21 pm on April 25, 2003 (utc 0)

All our sites got spidered yesterday by the same bot. Here is a sample of the IIS log file:

2003-04-24 20:08:56 131.107.163.50 - myserverip 80 GET /robots.txt - 404 MicrosoftPrototypeCrawler+(How's+my+crawling?+mailto:newbiecrawler@hotmail.com) - -

2003-04-24 20:08:57 131.107.163.50 - myserverip 80 GET /Default.asp - 200 MicrosoftPrototypeCrawler+(How's+my+crawling?+mailto:newbiecrawler@hotmail.com) - -

2003-04-24 20:09:24 131.107.163.50 - myserverip 80 GET /robots.txt - 404 MicrosoftPrototypeCrawler+(How's+my+crawling?+mailto:newbiecrawler@hotmail.com) - -

2003-04-24 20:09:24 131.107.163.50 - myserverip 80 GET /whatsnew.asp - 200 MicrosoftPrototypeCrawler+(How's+my+crawling?+mailto:newbiecrawler@hotmail.com) - -
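One thing visible in these IIS lines: IIS's W3C extended log format writes spaces inside a field as '+', which is why the UA reads `MicrosoftPrototypeCrawler+(How's+my+crawling?...`. (jim_w's line earlier came from Apache, which as far as I know doesn't do that conversion, so his '+'s are a different matter.) A minimal Python sketch for splitting such lines; the real column order comes from the log's own `#Fields:` header, and the header below, including the two `x-unknown` placeholder names for the trailing `-` columns, is an assumption fitted to this excerpt. The helper names are hypothetical.

```python
def fields_from_header(header):
    """Read the column names from a W3C '#Fields:' directive line."""
    prefix = "#Fields:"
    if not header.startswith(prefix):
        raise ValueError("not a #Fields directive")
    return header[len(prefix):].split()

def parse_w3c(line, fields):
    """Split one log line into a dict. Spaces inside fields were logged as '+',
    so plain whitespace splitting is safe."""
    values = line.split()
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
    entry = dict(zip(fields, values))
    if "cs(User-Agent)" in entry:
        # Undo the space encoding for display; as far as I know a '+' the
        # client really sent can't be told apart from an encoded space.
        entry["agent"] = entry["cs(User-Agent)"].replace("+", " ")
    return entry

# Assumed header fitted to the excerpt above; x-unknown-1/2 are placeholders.
HEADER = ("#Fields: date time c-ip cs-username s-ip s-port cs-method cs-uri-stem "
          "cs-uri-query sc-status cs(User-Agent) x-unknown-1 x-unknown-2")
line = ("2003-04-24 20:08:56 131.107.163.50 - myserverip 80 GET /robots.txt - 404 "
        "MicrosoftPrototypeCrawler+(How's+my+crawling?+mailto:newbiecrawler@hotmail.com) - -")
entry = parse_w3c(line, fields_from_header(HEADER))
print(entry["c-ip"], entry["sc-status"], entry["agent"])
```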

kwngian


#:1536752
 6:22 pm on April 25, 2003 (utc 0)


I got hit too by this spider.

Is it a Microsoft loyalty probe?

Anyone running anything other than IIS?

jim_w


#:1536753
 6:47 pm on April 25, 2003 (utc 0)

Yea, I have a sun unix box.

GaryK


#:1536754
 7:07 pm on April 25, 2003 (utc 0)

It read my robots.txt file and then went right to a file where I have some not so nice things to say about various Microsoft-related user agents. Those are the only two files it read.

2003-04-24 10:40:05 131.107.163.47 - GET /robots.txt 200 11744 355 MicrosoftPrototypeCrawler+(How's+my+crawling?+mailto:newbiecrawler@hotmail.com) -
2003-04-24 10:40:05 131.107.163.47 - GET /browsers/notes.asp 200 0 363 MicrosoftPrototypeCrawler+(How's+my+crawling?+mailto:newbiecrawler@hotmail.com) -
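A quick way to see whether a bot asks for robots.txt before anything else, as it did in GaryK's and jns594's excerpts. A minimal Python sketch, assuming you've already pulled (client IP, path) pairs out of the log in order; `skipped_robots` is a hypothetical name, and 192.0.2.1 is a documentation-only address used as a made-up counterexample.

```python
def skipped_robots(requests):
    """Given (client_ip, path) pairs in log order, return the IPs whose
    first request was not /robots.txt."""
    first = {}
    for ip, path in requests:
        first.setdefault(ip, path)  # remember only each IP's first path
    return sorted(ip for ip, path in first.items() if path != "/robots.txt")

requests = [
    ("131.107.163.47", "/robots.txt"),          # from GaryK's excerpt
    ("131.107.163.47", "/browsers/notes.asp"),
    ("192.0.2.1", "/index.html"),               # hypothetical visitor
]
print(skipped_robots(requests))  # ['192.0.2.1']
```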

pixel_juice


#:1536755
 7:34 pm on April 25, 2003 (utc 0)

The 2nd page it hit doesn't have too much good to say about a lot of user-agents, Gary ;)

I'm so curious about this bot. Haven't seen it on a site yet.

carfac


#:1536756
 7:58 pm on April 25, 2003 (utc 0)

Second the Unix Box thing.... FreeBSD and Apache.

dave

nafmo


#:1536757
 8:22 pm on April 25, 2003 (utc 0)
I've seen this too. It seems to be very interested in my mailing list archives, my guest book, and in my pages where I say some not-so-nice things about Bill Gates. First it came in without User-Agent or referer, which I then blocked. Then it tried to retrieve pages on URLs that have never ever existed, and are not linked from anywhere:

131.107.163.47 - - [18/Apr/2003:11:33:33 +0200] "GET /index.html HTTP/1.1" 404 6124 "-" "MicrosoftPrototypeCrawler (please report obnoxious behavior to newbiecrawler@hotmail.com)"
131.107.163.47 - - [18/Apr/2003:11:36:13 +0200] "GET /index.en.html HTTP/1.1" 404 6163 "-" "MicrosoftPrototypeCrawler (please report obnoxious behavior to newbiecrawler@hotmail.com)"
131.107.163.47 - - [18/Apr/2003:11:37:53 +0200] "GET /index.html HTTP/1.1" 404 6203 "-" "MicrosoftPrototypeCrawler (please report obnoxious behavior to newbiecrawler@hotmail.com)"
131.107.163.47 - - [18/Apr/2003:11:43:50 +0200] "GET /index.html HTTP/1.1" 404 6203 "-" "MicrosoftPrototypeCrawler (please report obnoxious behavior to newbiecrawler@hotmail.com)"

And then it falls into my e-mail harvester trap (mailto links written as m&#65;ilto):

131.107.163.47 - - [24/Apr/2003:00:06:41 +0200] "GET /guestbook/old/m& HTTP/1.1" 404 6277 "-" "MicrosoftPrototypeCrawler (How's my crawling? mailto:newbiecrawler@hotmail.com)"

If it wasn't for its interest in my Bill Gates page, I would have just said this was one of all the e-mail harvesting bots, but now I'm not so sure...
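For anyone wanting to set a similar trap: a browser decodes the character reference `&#65;` to "A" before following the link, and URL schemes are case-insensitive, so `m&#65;ilto:` still works as a mail link for people. A harvester that doesn't decode entities is likely to request a bogus relative URL such as the `/guestbook/old/m&` in nafmo's log. A minimal Python sketch of building the link and spotting hits; the function names, the link text and the trap pattern are assumptions modeled on that one log line.

```python
import html
import re

def trap_link(address, text="e-mail me"):
    """Build a mailto link whose scheme hides one letter as '&#65;'."""
    return f'<a href="m&#65;ilto:{html.escape(address)}">{html.escape(text)}</a>'

def browser_view(href):
    """What a browser ends up with after decoding character references."""
    return html.unescape(href)

# Requests ending in "m&" (or carrying the raw "m&#65;ilto") came from a client
# that read the href without decoding it.
TRAP = re.compile(r"/m&(?:#65;ilto.*)?$")

def is_trap_hit(path):
    return bool(TRAP.search(path))

print(browser_view("m&#65;ilto:someone@example.org"))  # mAilto:someone@example.org
print(is_trap_hit("/guestbook/old/m&"))                # True
```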

jim_w


#:1536758
 9:21 pm on April 25, 2003 (utc 0)

Hummm, are we starting to see a correlation with the sites it hits? I think I may be about ½ as paranoid as I think I am, but then again, I have been doing way too much thinking lately.

Let's see.

It hits a site that talks about GOOGLE and MS SE.
It hits my site, which could be a competitor.
It hits a site that talks about MS UA’s.
It hits an anti-Uncle Bill site.

Now, I don’t work in R&D at Morton Thiokol, but I do do statistics. And I think I’m starting to see a trend here. However, I must admit that the sample size is much too small to have a real confidence level in any theories.

Does anyone have a URL that it hasn’t hit, where they could set up a page about how MS does/has done this, that, or the other? Or are there as many people without any MS references that it has come to on more than one occasion?

I may go back to the deny mode until it gets sorted out. Heck I haven’t sold anything to anyone on msn anyway.

[edited by: jim_w at 9:45 pm (utc) on April 25, 2003]

pixel_juice


#:1536759
 9:24 pm on April 25, 2003 (utc 0)

Jim_w I'm assuming that you've looked at the pages in Google's index containing the U/A string?

jim_w


#:1536760
 9:35 pm on April 25, 2003 (utc 0)

Jim_w I'm assuming that you've looked at the pages in Google's index containing the U/A string?

OK, if I understand the question, you mean IE UA’s? I was talking about pages with content.

If that wasn’t it: huh? Remember it’s Friday, and there is a higher probability for a human to make a mistake on Mondays and Fridays. At least that’s the theory I’m sticking to.

pendanticist


#:1536761
 9:37 pm on April 25, 2003 (utc 0)

Or are there as many people without any MS references that it has come to on more than one occasion?

That'd be my site... albeit nomothetically.

Btw - still awaiting that call/e-mail. Being late afternoon on the East Coast, I don't think I'll hear anything until perhaps Monday.

Pendanticist.

bobmark


#:1536762
 11:18 pm on April 25, 2003 (utc 0)

Thanks, Pendanticist
parenthetically

pixel_juice


#:1536763
 1:51 am on April 26, 2003 (utc 0)

OK, if I understand the question, you mean IE UA’s? I was talking about pages with content.

If that wasn’t it, Huh?

Sorry, I wasn't trying to be cryptic. I meant if you [url=http://www.google.com/search?q=newbiecrawler%40hotmail.com]search google for newbiecrawler@hotmail.com[/url] you can see a number of pages hit by the spider unrelated to MS queries.

pendanticist


#:1536764
 2:08 am on April 26, 2003 (utc 0)

You know, pixel_juice, I did that same search a few days ago, yet despite what 'Phoenix' (?) posts, I'm not altogether sure that what he/she stated is based on any kind of fact.

I kinda think they just assumed its validity based solely on the bot's appearance in their access_log files.

<shrug>

Pendanticist.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld ® and PubCon ® are registered trademarks of WebmasterWorld Inc.
© WebmasterWorld Inc. / SearchEngineWorld 1996-2008 all rights reserved