

Someone at MS just got banned!

Was Bill Gates Surfing My site?

         

carfac

5:21 pm on Apr 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi:

Just saw this guy, fell into a spider trap:

131.107.137.47 - - [11/Apr/2003:01:31:08 -0600] "GET /a/deep/link.html HTTP/1.1" 200 12589 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)"

No referer, came in on a deep link (like from a SE), and d/l pages but no images. After about 5 hits, he tried to grab a trap, and got banned. Grabbed a page every 5 secs or so...
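The pattern described above (no referer, pages but no images, several hits from one address) can be checked mechanically against an access log. What follows is only a rough sketch in Python, assuming the Apache "combined" log format of the line quoted above; the function names, the image-extension list, and the five-hit threshold are illustrative choices, not anything taken from the actual trap.

```python
import re
from collections import defaultdict
from datetime import datetime

# Apache "combined" log format, matching the line quoted in the post.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumed list of image extensions; adjust for the site in question.
IMAGE_EXT = (".gif", ".jpg", ".jpeg", ".png")


def parse(line):
    """Return a dict of log fields, or None if the line does not match."""
    m = LOG_RE.match(line)
    if not m:
        return None
    hit = m.groupdict()
    hit["ts"] = datetime.strptime(hit["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return hit


def suspicious_ips(lines, min_hits=5):
    """Flag IPs with at least min_hits requests, no referer, and no images."""
    by_ip = defaultdict(list)
    for line in lines:
        hit = parse(line)
        if hit:
            by_ip[hit["ip"]].append(hit)

    flagged = []
    for ip, hits in by_ip.items():
        no_referer = all(h["referer"] in ("", "-") for h in hits)
        no_images = not any(
            h["path"].lower().endswith(IMAGE_EXT) for h in hits
        )
        if len(hits) >= min_hits and no_referer and no_images:
            flagged.append(ip)
    return flagged
```

Calling `suspicious_ips(open("access.log"))` on a log file (the filename is a placeholder) would return the addresses worth a closer look. A heuristic like this will also catch text-only browsers and some proxies, so it is better used to pick candidates for review than to ban automatically.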

IP resolves to Redmond.... did Bill just get himself banned?

dave

GaryK

7:07 pm on Apr 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It read my robots.txt file and then went right to a file where I have some not so nice things to say about various Microsoft-related user agents. Those are the only two files it read.

2003-04-24 10:40:05 131.107.163.47 - GET /robots.txt 200 11744 355 MicrosoftPrototypeCrawler+(How's+my+crawling?+mailto:newbiecrawler@hotmail.com) -
2003-04-24 10:40:05 131.107.163.47 - GET /browsers/notes.asp 200 0 363 MicrosoftPrototypeCrawler+(How's+my+crawling?+mailto:newbiecrawler@hotmail.com) -

pixel_juice

7:34 pm on Apr 25, 2003 (gmt 0)

10+ Year Member



The 2nd page it hit doesn't have too much good to say about a lot of user-agents, Gary ;)

I'm so curious about this bot. Haven't seen it on a site yet.

carfac

7:58 pm on Apr 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Second the Unix Box thing.... FreeBSD and Apache.

dave

nafmo

8:22 pm on Apr 25, 2003 (gmt 0)

10+ Year Member


I've seen this too. It seems to be very interested in my mailing list archives, my guest book, and in my pages where I say some not-so-nice things about Bill Gates. First it came in without User-Agent or referer, which I then blocked. Then it tried to retrieve pages on URLs that have never ever existed, and are not linked from anywhere:

131.107.163.47 - - [18/Apr/2003:11:33:33 +0200] "GET /index.html HTTP/1.1" 404 6124 "-" "MicrosoftPrototypeCrawler (please report obnoxious behavior to newbiecrawler@hotmail.com)"
131.107.163.47 - - [18/Apr/2003:11:36:13 +0200] "GET /index.en.html HTTP/1.1" 404 6163 "-" "MicrosoftPrototypeCrawler (please report obnoxious behavior to newbiecrawler@hotmail.com)"
131.107.163.47 - - [18/Apr/2003:11:37:53 +0200] "GET /index.html HTTP/1.1" 404 6203 "-" "MicrosoftPrototypeCrawler (please report obnoxious behavior to newbiecrawler@hotmail.com)"
131.107.163.47 - - [18/Apr/2003:11:43:50 +0200] "GET /index.html HTTP/1.1" 404 6203 "-" "MicrosoftPrototypeCrawler (please report obnoxious behavior to newbiecrawler@hotmail.com)"

And then it falls into my e-mail harvester trap (mailto links written mAilto):

131.107.163.47 - - [24/Apr/2003:00:06:41 +0200] "GET /guestbook/old/m& HTTP/1.1" 404 6277 "-" "MicrosoftPrototypeCrawler (How's my crawling? mailto:newbiecrawler@hotmail.com)"

If it wasn't for its interest in my Bill Gates page, I would have just said this was one of all the e-mail harvesting bots, but now I'm not so sure...
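The post does not spell out how the mAilto trap works, so the following is only one plausible reading: URL schemes are case-insensitive, so a browser treats mAilto: as an ordinary e-mail link, while a crawler that checks for the exact lowercase prefix mistakes it for a relative URL and requests it. A small Python sketch of that difference, with hypothetical function names and a made-up address:

```python
from urllib.parse import urlsplit


def naive_resolve(base, href):
    """Mimic a crude crawler: only the exact lowercase prefix is a mailto link."""
    if href.startswith("mailto:"):
        return None  # recognised as e-mail, nothing to fetch
    return base + href  # everything else is treated as a relative URL


def careful_resolve(base, href):
    """Compare the scheme case-insensitively (urlsplit lowercases it)."""
    if urlsplit(href).scheme == "mailto":
        return None
    return base + href


# Hypothetical trap link inside /guestbook/old/
trap = "mAilto:someone@example.com"
print(naive_resolve("/guestbook/old/", trap))    # a URL the crawler would request
print(careful_resolve("/guestbook/old/", trap))  # None, so no request is made
```

Any request for a path like the first result can only come from a client that parsed the link incorrectly, which is what makes it usable as a trap. The truncated path in the log line above (`/guestbook/old/m&`) suggests the real link was also obfuscated in some further way that this sketch does not try to reproduce.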

jim_w

9:21 pm on Apr 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hummm, are we starting to see a correlation with the sites it hits? I think I may be about ½ as paranoid as I think I am, but then again, I have been doing way too much thinking lately.

Let's see.

It hits a site that talks about GOOGLE and MS SE.
It hits my site, which could be a competitor.
It hits a site that talks about MS UA’s.
It hits an anti-Uncle Bill site.

Now, I don’t work in R&D at Morton Thiokol, but I do do statistics. And I think I’m starting to see a trend here. However, I must admit that the sample size is much too small to have a real confidence level in any theories.

Anyone have a URL that it hasn’t hit where they could set up a page about how MS does/has done this, that, or the other? Or are there as many people without any MS references that it has come to on more than one occasion?

I may go back to the deny mode until it gets sorted out. Heck I haven’t sold anything to anyone on msn anyway.

[edited by: jim_w at 9:45 pm (utc) on April 25, 2003]

pixel_juice

9:24 pm on Apr 25, 2003 (gmt 0)

10+ Year Member



Jim_w I'm assuming that you've looked at the pages in Google's index containing the U/A string?

jim_w

9:35 pm on Apr 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"Jim_w I'm assuming that you've looked at the pages in Google's index containing the U/A string?"

OK, if I understand the question, you mean IE UA’s? I was talking about pages with content.

If that wasn’t it, huh? Remember, it’s Friday, and there is a higher probability for a human to make a mistake on Mondays and Fridays. At least that’s the theory I’m sticking to.

pendanticist

9:37 pm on Apr 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"Or are there as many people without any MS references that it has come to on more than one occasion?"

That'd be my site...albeit nomothetically.

Btw - still awaiting that call/e-mail. Being late afternoon on the East Coast, I don't think I'll hear anything until perhaps Monday.

Pendanticist.

bobmark

11:18 pm on Apr 25, 2003 (gmt 0)

10+ Year Member



Thanks, Pendanticist
parenthetically

pixel_juice

1:51 am on Apr 26, 2003 (gmt 0)

10+ Year Member



"OK, if I understand the question, you mean IE UA’s? I was talking about pages with content.

If that wasn’t it, Huh?"

Sorry, I wasn't trying to be cryptic. I meant if you search google for newbiecrawler@hotmail.com [google.com] you can see a number of pages hit by the spider unrelated to MS queries.
