Bing Search Engine News Forum

This 111 message thread spans 4 pages.
Someone at MS just got banned!
Was Bill Gates Surfing My site?

 5:21 pm on Apr 11, 2003 (gmt 0)


Just saw this guy, fell into a spider trap: - - [11/Apr/2003:01:31:08 -0600] "GET /a/deep/link.html HTTP/1.1" 200 12589 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)"

No referer, came in on a deep link (like from a SE), and d/l pages but no images. After about 5 hits, he tried to grab a trap, and got banned. Grabbed a page every 5 secs or so...

IP resolves to Redmond.... did Bill just get himself banned?
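
The pattern described above (HTML pages fetched, no images) can be checked against a log file. This is a minimal sketch, assuming Apache "combined" log lines like the one quoted; the function name `pages_without_images`, the `min_pages` threshold, and the extension lists are illustrative choices, not anything the poster describes using.

```python
import re
from collections import defaultdict

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumed extension lists; adjust for the site in question.
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")
PAGE_SUFFIXES = (".html", ".htm", "/")


def pages_without_images(lines, min_pages=5):
    """Return hosts that fetched at least min_pages pages and zero images."""
    pages = defaultdict(int)
    images = defaultdict(int)
    for line in lines:
        match = LOG_RE.match(line)
        if match is None:
            continue  # skip lines that are not in combined format
        host = match.group("host")
        path = match.group("path").lower()
        if path.endswith(IMAGE_EXTENSIONS):
            images[host] += 1
        elif path.endswith(PAGE_SUFFIXES):
            pages[host] += 1
    return sorted(
        host for host, count in pages.items()
        if count >= min_pages and images.get(host, 0) == 0
    )
```

A host that shows up here is only a candidate: a text-mode browser or a visitor with images turned off looks the same, so the logs still need reading before anything gets banned.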




 5:32 pm on Apr 11, 2003 (gmt 0)

This is another number in the lower block of that 131 range. The other day I expanded the deny upwards to cover future expansion.
deny from 131.107.
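
To see how much a partial-address deny like the one above covers, here is a small sketch that treats the leading octets as a network prefix. The helper `prefix_to_network` is hypothetical and only illustrates the arithmetic; it assumes the prefix is matched on whole octets.

```python
import ipaddress


def prefix_to_network(prefix):
    """Convert a partial dotted prefix such as '131.107.' to the network it covers."""
    octets = [part for part in prefix.split(".") if part]
    if not 1 <= len(octets) <= 3:
        raise ValueError("expected one to three leading octets")
    # Pad the missing octets with zeros and use 8 bits per given octet.
    padded = octets + ["0"] * (4 - len(octets))
    return ipaddress.ip_network(f"{'.'.join(padded)}/{8 * len(octets)}")


network = prefix_to_network("131.107.")
print(network, network.num_addresses)
```

Two octets cover 65,536 addresses; going "all the way up to 131." would cover over 16 million.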



 11:56 pm on Apr 11, 2003 (gmt 0)

I've seen an IP try to continually snag a few pages every hour on my main site. The activity pattern was bursty (hits appeared in clusters around the last quarter of each hour, from what I remember). For a while, I wasn't sure if it was a bot or a Microsoftie. Since it did not present any user agent, I just banned it.



 1:11 am on Apr 12, 2003 (gmt 0)


I would say that pretty much clinches it as a bot.... on my site, d/l a bunch of HTML and NO Images is WEIRD (My site is PICTURES of WIDGETS!)


You blocked the whole IP- 131.107.xxx.xxx?

What do you think they are doing?

Not that I am at all surprised that a 'bot out of Redmond is not polite (polite meaning it respects robots.txt), but you would think MS of ALL people would put every possible doohickey and thingamabob into something they write. So, do you think the "Redmond Robot" is about 20 megs in source code size? :)



 1:30 am on Apr 12, 2003 (gmt 0)

<snip>Wilderness! You blocked the whole IP- 131.107.xxx.xxx?</snip>

:) :)
I might better appreciate your despair if I went all the way up to 131. ;)

It doesn't much matter to me what they (whether it's MS or somebody falsely representing themselves as such) are doing.
For me the determining factor is threefold:
1) First they began visits with both referer and UA blank.
2) When the denies began as a result of their actions in item one above, they changed to a UA to get around that, while still not saying whether they are an MS bot or providing a link back to a page about the bot that gives us the answer we desire.
3) Now they change IPs.

I've learned time and again that every time I deny short that it comes back to haunt me. In this instance the 131.107. may even be short :(
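
The first criterion in the list above (referer and UA both blank) is easy to express as a check. A minimal sketch, assuming the values come from a log where a missing header is recorded as "-"; the function name `is_anonymous_request` is made up for illustration.

```python
def is_anonymous_request(referer, user_agent):
    """True when both headers are missing, empty, or logged as '-'."""
    def blank(value):
        return value is None or value.strip() in ("", "-")

    return blank(referer) and blank(user_agent)


# A direct hit with a browser UA is not flagged; a hit with neither header is.
print(is_anonymous_request("-", "Mozilla/4.0 (compatible; MSIE 6.0)"))
print(is_anonymous_request("-", "-"))
```

A blank referer alone is normal (bookmarks, typed URLs), which is why the sketch requires both to be blank.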



 2:20 am on Apr 12, 2003 (gmt 0)

>>> appreciate your despair if I went all the way up to 131


Do you run a spider trap? For me, that has worked really well. I do not want to get too much into it here, but I run a few different versions. The overlap works great, and it catches them in process, not after the fact. I guess that is why I do not go for blocking whole C, B or -GASP- A blocks. I do, on occasion, when traffic is all over an IP range, and I know I do not have to care at all about the range (read Malaysia, Cyveillance, IA, etc)...



 3:04 am on Apr 12, 2003 (gmt 0)

I've considered a trap. Jim has attempted to sway me more than a few times and it would likely save me a bunch of time. I may eventually.
I have sort of a crummy trap which shows in the logs which caught somebody the second day it was in and it's only on two of my pages :)
I stumbled across the thing while browsing for something else.

I'm sure there are some bots I haven't seen and perhaps never will due to both the content of my sites and the narrow market. Most of the other malicious ones are already denied.

Thanks for the hint :-)


 5:32 am on Apr 12, 2003 (gmt 0)


Well, between Jim and me, I think we have perfected the trap! He comes up with a new idea.... then I add something else.... we have it so it is pretty darn foolproof now. And it gets a lot of what gets past the IP and UA blocks. But then there are some that get by that, too.... and those I catch in a bandwidth or CPU throttle. If they get by all that- and I just discovered one that did!- they deserve to get whatever they can! (Kidding)

Jim is a bit more cautious than I am in regards to the trap... I am a bit more, uh, proactive. I am ALWAYS banning Ask Jeeves (which is a very poorly behaved spider), and I know Jim makes allowances for that one.

Anyway, I just see it as another line of defense, and I would recommend you do it!
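
The posters deliberately do not describe their traps, so what follows is only a generic sketch of the idea: a bait URL that no well-behaved visitor should request (for example, one disallowed in robots.txt and linked invisibly), with any client that fetches it banned on the spot. The class name `SpiderTrap`, the bait path, and the in-memory ban set are all assumptions for illustration.

```python
class SpiderTrap:
    """Ban any client that requests a path a well-behaved visitor never would."""

    def __init__(self, trap_paths):
        self.trap_paths = set(trap_paths)
        self.banned = set()

    def handle(self, ip, path):
        """Return the HTTP status this sketch would give the request."""
        if ip in self.banned:
            return 403  # already caught earlier
        if path in self.trap_paths:
            self.banned.add(ip)  # caught in process, not after the fact
            return 403
        return 200


trap = SpiderTrap({"/trap/bait.html"})  # hypothetical bait URL
for path in ("/index.html", "/trap/bait.html", "/index.html"):
    print(path, trap.handle("192.0.2.10", path))
```

A real setup would persist the ban (in a deny list, say) rather than keep it in memory, and would whitelist the spiders the site wants to keep.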



 6:08 am on Apr 12, 2003 (gmt 0)

another line of defense

Against what?

Are you defending your security or three-cents-worth of bandwidth?


 12:47 pm on Apr 12, 2003 (gmt 0)

<snip>your security or three-cents-worth of bandwidth?</snip>

You have a more varied participation than Dave and myself in these forums.
Not sure how you either miss or do not understand the concept or method?

Each webmaster makes a determination as part of the goals for their website on visitors and use of their content. In the end it's the overall scheme of things rather than a solitary portion, whether it's pennies or buttons ;)

"My bandwidth," rather than defining pennies, might better be interpreted as boundaries.


 5:03 pm on Apr 12, 2003 (gmt 0)

Hey martini!

>>>> security or three-cents-worth of bandwidth

I am pretty happy with the security on my site, that is not too much of an issue for me. I have a firewall that blocks ALL but port 80 (and one other port, but ONLY if the request is from my IP). My cgi is secure, and I do NOT solely rely on .htaccess password protection. So I am pretty happy there.

My pages do 2-3 gigs a day in bandwidth. The guy I took down yesterday did 79.60 MB (directly from AWStats)... he was not caught by all the bells and whistles (and bait) I throw out. That number APPROX equals the bandwidth Google has used month to date on my site (It is within 2-3 megs).

My site also has a lot of really good (some very rare!) pictures of widgets on it. My widget pix are ALWAYS turning up in forums (as avatars, or just to illustrate a point). One of my sites- which sells widgets- is always having its widget pix show up on e-Gag, too. I am sure THAT bandwidth, unchecked, would exceed pennies a day.

But my main site- the NPO- is an informational site, and is over 500,000 pages long. This one was a Yahoo site-of-the-day a couple months ago (great PR boost, btw!) It sells NOTHING, and only receives income from donations and banners on the site. One, as its "product" is information, I need to protect that product. If you sell something, you are able to protect yourselves from thieves and such, to the extent and in the manner you see fit. Blocking site d/lers is how one does protect their information product. Also, since it exists and is paid for by banner ads, I want/need people to see those on my site, not to d/l the information and leave me with nothing for my work making the site.

If I had none of this in place, I am sure I would be looking at upwards of a gig a day extra in bandwidth. As it is, I always see some stupid robot chuck down 500-1000 403's before they get wise. I wonder what that bandwidth would be if they were getting real pages (at 12-15k each) rather than a 1.2k 403 page? (Yes, I can do the math!)
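
For anyone who does not want to do the math, here it is as a short sketch using only the figures quoted above (500-1000 requests, a 1.2k 403 page, 12-15k real pages); the function name is illustrative.

```python
def bandwidth_kb(requests, kb_per_response):
    """Total kilobytes served for a number of equally sized responses."""
    return requests * kb_per_response


for requests in (500, 1000):
    blocked = bandwidth_kb(requests, 1.2)   # 1.2k 403 page
    low = bandwidth_kb(requests, 12)        # 12k real page
    high = bandwidth_kb(requests, 15)       # 15k real page
    print(f"{requests} requests: {blocked:.0f} KB as 403s "
          f"vs {low:.0f}-{high:.0f} KB as real pages")
```

So one such robot costs roughly 0.6 to 1.2 MB in 403s, against 6 to 15 MB if it were getting real pages, about a tenfold difference or more.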

I agree- things like requests for default.ida are a minor annoyance.... and I deal with them (in a different way) while I go after this bigger site d/lers and other bots. But I ALSO have another reason for actively blocking Bots!

I have been the victim of NameProtect/Cyveillance. Not just the bandwidth drain... but an actual pending lawsuit based on my "possible" trademark infringement. And it was TOTALLY absurd! COMPLETELY! But, I HAD to hire a trademark attorney to defend myself. I would consider that money directly lost due to NOT blocking a spider!

So, you see, I have my reasons- and some quite good- for doing this. I think, to me SPECIFICALLY- it is more than pennies a day. My main site is a NPO, and does accept tax-deductible donations... you are welcome to become a member for only pennies a day! :)

I think it is hard to know how much these rogue spiders drain from an individual site until you block and log them and see for yourself. I know that I have much more of a problem than, say, Jim Morgan (number-wise), and so I would say it varies GREATLY depending on the website. And I do not think I am exaggerating at all to say my bandwidth would be 25% higher (at least) w/o blocking.

I must say that this forum has made me much more zealous about this topic... I have gone from thinking I am presenting a website to the world, to thinking this is MY private property, and I need to protect it as such. So, I do what I can.

Besides, it is so easy to do!



 5:31 pm on Apr 12, 2003 (gmt 0)

So, you see, I have my reasons- and some quite good- for doing this.

Hi Dave! Good answer, and I agree with your point entirely. I think some people (not anybody who posted in this thread, though!), can get carried away by the spider hunt...

Thanks for your illuminating comments (I'm usually not illuminated before 5 pm, ya' know)!

:) Y


 6:03 pm on Apr 12, 2003 (gmt 0)


>>> I agree with your point entirely

I was thinking you might, once I made my case. :)

But I agree also that each individual webmaster needs to make the decisions for their specific website, too, and to what extent they need to ban (if at all). I am doing what is good for my site. Don and I disagree often with the extent of a ban, but we both learn from each other, too. Same with Jim and I. But I figure if we keep throwing the info out here, it will help some people. I know I have gotten more than a few PM's about how to ban this or that, and I am more than happy to help out.

>>> I think some people (not anybody who posted in this thread, though!), can get carried away by the spider hunt

I disagree... I HAVE gotten overzealous myself! And have had to cut back a bit. Banned myself once (damn!). I also once misunderstood the extent of a ban on "_vti_bin" (or something like that).

If one is going to ban, one must also look carefully at what one is banning! You HAVE to read those logs, and make adjustments. Just because I- or Don, or Jim, or Martinibuster- say we spotted something and WE are going to ban it, doesn't mean everyone should. That is why I always post the IP, the UA and what it did... I do not bother banning bots that only ask for robots.txt and move on... for ME, that is a waste of time. For others, that is the first indication of a pending attack, and they act accordingly for their website.

So it is GREAT advice to temper ALL our advice (from whoever) with a grain of salt, and see how it fits in with the goals and traffic for YOUR website!



 2:12 pm on Apr 15, 2003 (gmt 0)

I had a visit from, but they came via a search engine, then the bot started. As a matter of fact, this person signed up for my newsletter and used their MS email addy before the bot started. Like less than an hour.

Now since MS has a new product that could compete with my product, I silently removed their name from my newsletter list and banned the bot till it stopped. Don't want to ban MS just in case they want to license our stuff.

Also just FYI, I went to a big ISP and read their user policy and found out that at least some give users a static IP for DSL. So I have banned bots coming from DSL on those ISPs by just using the IP they came in on in its entirety.


 2:35 pm on Apr 15, 2003 (gmt 0)


Wow- this guy is a busy little beaver! Could you possibly PM me the e-mail... I want to see if he signed up for MY newsletter, too.

>>> at least some give users a static IP for DSL

I have one, but I had to ask for it. I have telnet/ssh denied at the firewall for all but my ONE IP... is that security or what?



 2:57 pm on Apr 15, 2003 (gmt 0)

Could you possibly PM me the e-mail...

I can't it would violate the policy I have stated on my web site and that could hurt my integrity, but I will say they used @microsoft.com so if they did sign up for your newsletter, it should be easy to find. Also just check your log for that IP and your list script running. That should tell you.


 4:22 am on Apr 16, 2003 (gmt 0)


Fair enough.

OK, let's look.

Nope- no one at all at microsoft.com

(Should have thought to do that myself!)



 1:46 am on Apr 18, 2003 (gmt 0)

They have put a name on the bot now, MicrosoftPrototypeCrawler: ... "MicrosoftPrototypeCrawler (please report obnoxious behavior to newbiecrawler@hotmail.com)"

Anyone can get a hotmail address, but the ip-address is owned by Microsoft.

He is crawling a site here now at about one page every 5-10 minutes always reading /robots.txt and then the page.


 5:39 am on Apr 18, 2003 (gmt 0)

I am still somehow not seeing a URL where I can read anything on the crawler's purpose. The deny on Microsoft will remain.


 8:54 am on Apr 18, 2003 (gmt 0)

... "MicrosoftPrototypeCrawler (please report obnoxious behavior to newbiecrawler@hotmail.com)"

You can go into the Windows system registry and change that to what ever you want.

I think that MSN and Hotmail are both owned by MS?


 6:20 pm on Apr 18, 2003 (gmt 0)

Seeing the new UA now, too.

I sent an e-mail.... let's see what I get back!



 7:58 pm on Apr 18, 2003 (gmt 0)

What do ya know! They responded! And pretty quickly, too. I am going to point them to this thread, and hopefully they will comment on it here!



 8:10 pm on Apr 18, 2003 (gmt 0)

I silently removed their name from my newsletter

Well, so much for that! :-)


 10:26 pm on Apr 18, 2003 (gmt 0)

Does anyone besides me consider how absurd it is that a "legitimate" Microsoft project would use a Hotmail address for feedback? I don't know, but that just seems not quite right.


 10:32 pm on Apr 18, 2003 (gmt 0)


>>>> I silently removed their name from my newsletter

Whoops. Sorry. :(



 11:49 pm on Apr 18, 2003 (gmt 0)

Hey, carfac?

His name is Keith Birney and I sent him the url of this thread last evening explaining how he might like to be here before this discussion gets too far afield. You know, damage control.

While I'm here tonight and thinking about it: If Keith does not come to the boards, that does not necessarily reflect upon the legitimacy of this new bot. Rather, it may only mean that he is busy answering all the others who've communicated with him.

I suggested he slow it down a bit.

He did mention: "(It found your site less than three minutes into the crawl.)"



 3:54 am on Apr 19, 2003 (gmt 0)


That is he- and he seemed very nice in the e-mail. But I still have no clue WHAT it is he is doing!

>>> It found your site less than three minutes into the crawl

My site?




 10:03 am on Apr 19, 2003 (gmt 0)

I suggested he slow it down a bit.
He did mention: "(It found your site less than three minutes into the crawl.)"

I didn't suggest he slow it down out of sympathy for this thread, that's for sure. LOL.

No, Dave. That was about my site. ;)



 7:02 pm on Apr 24, 2003 (gmt 0)

Did anyone ever get an answer as to who this is?

I am still getting hit at a ridiculous rate and would like to know if I should ban it.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved