Just saw this guy, fell into a spider trap:
131.107.137.47 - - [11/Apr/2003:01:31:08 -0600] "GET /a/deep/link.html HTTP/1.1" 200 12589 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)"
No referer, came in on a deep link (as if from a search engine), and downloaded pages but no images. After about 5 hits, he tried to grab a trap URL and got banned. He was grabbing a page every 5 seconds or so...
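For anyone who wants to check their own logs for the same pattern, here is a rough Python sketch. It assumes Apache "combined" format like the line above. The names `parse_line`, `looks_like_bot`, and `trap_paths`, the image extension list, and the 3-hit threshold are all made up for illustration; this is not the trap described in this post, just one way the same signs (no referer, no images, a hit on a trap URL) could be tested.

```python
import re
from datetime import datetime

# Apache "combined" log format, matching the line quoted above.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumed list of image extensions; adjust for your own site.
IMAGE_EXT = (".gif", ".jpg", ".jpeg", ".png")


def parse_line(line):
    """Parse one combined-format log line into a dict, or return None."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    hit = m.groupdict()
    # Month names like "Apr" assume an English/C locale.
    hit["time"] = datetime.strptime(hit["time"], "%d/%b/%Y:%H:%M:%S %z")
    return hit


def looks_like_bot(hits, trap_paths, min_hits=3):
    """Heuristic for one IP's hits: a trap hit, or several pages
    fetched with no referer and no images at all."""
    no_referer = all(h["referer"] in ("-", "") for h in hits)
    no_images = not any(h["path"].lower().endswith(IMAGE_EXT) for h in hits)
    hit_trap = any(h["path"] in trap_paths for h in hits)
    return hit_trap or (len(hits) >= min_hits and no_referer and no_images)
```

Group the parsed hits by `ip` first, then call `looks_like_bot` on each group. The page-every-5-seconds timing could be added by comparing the `time` values, but that is left out to keep the sketch short.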
IP resolves to Redmond.... did Bill just get himself banned?
dave
2003-04-24 10:40:05 131.107.163.47 - GET /robots.txt 200 11744 355 MicrosoftPrototypeCrawler+(How's+my+crawling?+mailto:newbiecrawler@hotmail.com) -
2003-04-24 10:40:05 131.107.163.47 - GET /browsers/notes.asp 200 0 363 MicrosoftPrototypeCrawler+(How's+my+crawling?+mailto:newbiecrawler@hotmail.com) -
131.107.163.47 - - [18/Apr/2003:11:33:33 +0200] "GET /index.html HTTP/1.1" 404 6124 "-" "MicrosoftPrototypeCrawler (please report obnoxious behavior to newbiecrawler@hotmail.com)"
131.107.163.47 - - [18/Apr/2003:11:36:13 +0200] "GET /index.en.html HTTP/1.1" 404 6163 "-" "MicrosoftPrototypeCrawler (please report obnoxious behavior to newbiecrawler@hotmail.com)"
131.107.163.47 - - [18/Apr/2003:11:37:53 +0200] "GET /index.html HTTP/1.1" 404 6203 "-" "MicrosoftPrototypeCrawler (please report obnoxious behavior to newbiecrawler@hotmail.com)"
131.107.163.47 - - [18/Apr/2003:11:43:50 +0200] "GET /index.html HTTP/1.1" 404 6203 "-" "MicrosoftPrototypeCrawler (please report obnoxious behavior to newbiecrawler@hotmail.com)"
And then it falls into my e-mail harvester trap (mailto links written mAilto):
131.107.163.47 - - [24/Apr/2003:00:06:41 +0200] "GET /guestbook/old/m& HTTP/1.1" 404 6277 "-" "MicrosoftPrototypeCrawler (How's my crawling? mailto:newbiecrawler@hotmail.com)"
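A trap like this can also be spotted after the fact by scanning the log for requests that only a mis-parsed mailto link would produce. The sketch below is a guess at how that might look in Python, not the actual trap used here. The function name `looks_like_mangled_mailto` is made up, and the `m&` prefix rule is an assumption based only on the request path in the line above.

```python
from urllib.parse import unquote


def looks_like_mangled_mailto(path):
    """Return True if the last path segment looks like a mailto link
    that a crawler treated as a relative URL (heuristic only)."""
    segment = unquote(path).rsplit("/", 1)[-1].lower()
    # "mailto:" catches odd casings such as mAilto:;
    # "m&" is an assumed pattern taken from the logged request above.
    return segment.startswith("mailto:") or segment.startswith("m&")
```

A real browser never requests such a path, so any hit on one is a reasonable sign of a crawler that follows mailto links as if they were pages.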
If it wasn't for its interest in my Bill Gates page, I would have just said this was one of all the e-mail harvesting bots, but now I'm not so sure...
Let's see.
It hits a site that talks about GOOGLE and MS SE.
It hits my site, which could be a competitor.
It hits a site that talks about MS UA’s.
It hits an anti-Uncle Bill site.
Now, I don’t work in R&D at Morton Thiokol, but I do do statistics. And I think I’m starting to see a trend here. However, I must admit that the sample size is much too small to have a real confidence level in any theories.
Anyone have a URL that it hasn’t hit where they could set up a page about how MS does/has done this, that, or the other? Or are there just as many people without any MS references whose sites it has visited more than once?
I may go back to deny mode until it gets sorted out. Heck, I haven’t sold anything to anyone on MSN anyway.
[edited by: jim_w at 9:45 pm (utc) on April 25, 2003]
Jim_w, I'm assuming that you've looked at the pages in Google's index containing the U/A string?
OK, if I understand the question, you mean IE UA’s? I was talking about pages with content.
If that wasn’t it, huh? Remember, it’s Friday, and there is a higher probability for a human to make a mistake on Mondays and Fridays. At least that’s the theory I’m sticking to.
Sorry, I wasn't trying to be cryptic. I meant that if you search Google for newbiecrawler@hotmail.com [google.com], you can see a number of pages hit by the spider that have nothing to do with MS.