Someone at MS just got banned!

Forum Moderators: mack

Was Bill Gates Surfing My site?

carfac

5:21 pm on Apr 11, 2003 (gmt 0)

Hi:

Just saw this guy, fell into a spider trap:

131.107.137.47 - - [11/Apr/2003:01:31:08 -0600] "GET /a/deep/link.html HTTP/1.1" 200 12589 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)"

No referer, came in on a deep link (like from a SE), and d/l pages but no images. After about 5 hits, he tried to grab a trap, and got banned. Grabbed a page every 5 secs or so...
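For anyone who wants to spot the same pattern in their own logs, here is a minimal sketch, not the actual trap described above. It assumes Apache combined-format lines like the one quoted, and a made-up trap location `/trap/`; the names `parse_line` and `find_suspects` are hypothetical. It reports each IP that requested a trap URL, along with how many pages and images that IP fetched.

```python
import re
from collections import defaultdict

# Apache "combined" log format, matching the line quoted above.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

TRAP_PREFIX = "/trap/"  # assumption: hypothetical trap location
IMAGE_EXTS = (".gif", ".jpg", ".jpeg", ".png")


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None


def find_suspects(lines, trap_prefix=TRAP_PREFIX):
    """Map each IP that requested a trap URL to its page and image counts."""
    pages = defaultdict(int)
    images = defaultdict(int)
    trapped = set()
    for line in lines:
        hit = parse_line(line)
        if hit is None:
            continue
        ip = hit["ip"]
        path = hit["path"].lower()
        if path.startswith(trap_prefix):
            trapped.add(ip)
        elif path.endswith(IMAGE_EXTS):  # query strings are not handled here
            images[ip] += 1
        else:
            pages[ip] += 1
    return {ip: {"pages": pages[ip], "images": images[ip]} for ip in sorted(trapped)}
```

A client that shows several pages, zero images, and a trap hit looks like the visitor described here. Whether to ban on that alone is a judgment call.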

IP resolves to Redmond.... did Bill just get himself banned?

dave

GaryK

7:07 pm on Apr 25, 2003 (gmt 0)

It read my robots.txt file and then went right to a file where I have some not so nice things to say about various Microsoft-related user agents. Those are the only two files it read.

2003-04-24 10:40:05 131.107.163.47 - GET /robots.txt 200 11744 355 MicrosoftPrototypeCrawler+(How's+my+crawling?+mailto:newbiecrawler@hotmail.com) -
2003-04-24 10:40:05 131.107.163.47 - GET /browsers/notes.asp 200 0 363 MicrosoftPrototypeCrawler+(How's+my+crawling?+mailto:newbiecrawler@hotmail.com) -

pixel_juice

7:34 pm on Apr 25, 2003 (gmt 0)

The 2nd page it hit doesn't have too much good to say about a lot of user-agents, Gary ;)

I'm so curious about this bot. Haven't seen it on a site yet.

carfac

7:58 pm on Apr 25, 2003 (gmt 0)

Second the Unix Box thing.... FreeBSD and Apache.

dave

nafmo

8:22 pm on Apr 25, 2003 (gmt 0)

I've seen this too. It seems to be very interested in my mailing list archives, my guest book, and in my pages where I say some not-so-nice things about Bill Gates. First it came in without User-Agent or referer, which I then blocked. Then it tried to retrieve pages on URLs that have never ever existed, and are not linked from anywhere:

131.107.163.47 - - [18/Apr/2003:11:33:33 +0200] "GET /index.html HTTP/1.1" 404 6124 "-" "MicrosoftPrototypeCrawler (please report obnoxious behavior to newbiecrawler@hotmail.com)"
131.107.163.47 - - [18/Apr/2003:11:36:13 +0200] "GET /index.en.html HTTP/1.1" 404 6163 "-" "MicrosoftPrototypeCrawler (please report obnoxious behavior to newbiecrawler@hotmail.com)"
131.107.163.47 - - [18/Apr/2003:11:37:53 +0200] "GET /index.html HTTP/1.1" 404 6203 "-" "MicrosoftPrototypeCrawler (please report obnoxious behavior to newbiecrawler@hotmail.com)"
131.107.163.47 - - [18/Apr/2003:11:43:50 +0200] "GET /index.html HTTP/1.1" 404 6203 "-" "MicrosoftPrototypeCrawler (please report obnoxious behavior to newbiecrawler@hotmail.com)"

And then it falls into my e-mail harvester trap (mailto links written mAilto):

131.107.163.47 - - [24/Apr/2003:00:06:41 +0200] "GET /guestbook/old/m& HTTP/1.1" 404 6277 "-" "MicrosoftPrototypeCrawler (How's my crawling? mailto:newbiecrawler@hotmail.com)"

If it wasn't for its interest in my Bill Gates page, I would have just said this was one of all the e-mail harvesting bots, but now I'm not so sure...
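A rough sketch for checking your own logs for the same visitor: it tallies status codes and collects the requested paths for any user agent containing a given substring. It assumes Apache combined-format lines like the ones above, with each entry on a single line; the function name `tally_agent` is made up for illustration.

```python
import re
from collections import Counter

# In a combined-format line the quoted request is followed by status and size,
# then the quoted referer, and finally the quoted user agent.
LINE_RE = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"\s*$'
)


def tally_agent(lines, needle="MicrosoftPrototypeCrawler"):
    """Return (status counts, requested paths) for agents containing `needle`."""
    statuses = Counter()
    paths = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m or needle not in m.group("agent"):
            continue
        statuses[m.group("status")] += 1
        parts = m.group("request").split()
        if len(parts) >= 2:  # "METHOD path PROTOCOL"
            paths.append(parts[1])
    return statuses, paths
```

A long run of 404s on URLs that were never linked anywhere, like the ones above, stands out quickly in the result.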

jim_w

9:21 pm on Apr 25, 2003 (gmt 0)

Hummm, are we starting to see a correlation with the sites it hits? I think I may be about ½ as paranoid as I think I am, but then again, I have been doing way too much thinking lately.

Let's see.

It hits a site that talks about GOOGLE and MS SE.
It hits my site, which could be a competitor.
It hits a site that talks about MS UA's.
It hits an anti-Uncle Bill site.

Now, I don't work in R&D at Morton Thiokol, but I do do statistics. And I think I'm starting to see a trend here. However, I must admit that the sample size is much too small to have a real confidence level in any theories.

Anyone have a URL that it hasn't hit where they could set up a page about how MS does/has done this, that, or the other? Or are there as many people without any MS references that it has come to on more than one occasion?

I may go back to the deny mode until it gets sorted out. Heck, I haven't sold anything to anyone on msn anyway.

[edited by: jim_w at 9:45 pm (utc) on April 25, 2003]

pixel_juice

9:24 pm on Apr 25, 2003 (gmt 0)

Jim_w I'm assuming that you've looked at the pages in Google's index containing the U/A string?

jim_w

9:35 pm on Apr 25, 2003 (gmt 0)

Jim_w I'm assuming that you've looked at the pages in Google's index containing the U/A string?

OK, if I understand the question, you mean IE UA's? I was talking about pages with content.

If that wasn't it, Huh? Remember it's Friday, and there is a higher probability for a human to make a mistake on Mondays and Fridays. At least that's the theory I'm sticking to.

pendanticist

9:37 pm on Apr 25, 2003 (gmt 0)

Or are there as many people without any MS references that it has come to on more than one occasion?

That'd be my site...albeit nomothetically.

Btw - still awaiting that call/e-mail. Being late afternoon on the East Coast, I don't think I'll hear anything until perhaps Monday.

Pendanticist.

bobmark

11:18 pm on Apr 25, 2003 (gmt 0)

Thanks, Pendanticist
parenthetically

pixel_juice

1:51 am on Apr 26, 2003 (gmt 0)

OK, if I understand the question, you mean IE UA's? I was talking about pages with content.
If that wasn't it, Huh?

Sorry, I wasn't trying to be cryptic. I meant that if you search Google for newbiecrawler@hotmail.com [google.com], you can see a number of pages hit by the spider that are unrelated to MS queries.
