Forum Library, Charter, Moderators: mack

Bing Search Engine News Forum

    
That Bot has Consumed 1.45 Gig in 16 days
Harry




msg:1538839
 7:33 pm on May 16, 2004 (gmt 0)

This bot is abusive. If Microsoft wants to catch up with the others, it shouldn't do it on the backs of webmasters, who will have to pay up because this sucker clogs their sites and drives up their bandwidth usage.

I'm already near my month's limit just because of this bot. Nothing even comes close to the MSN bot in terms of abuse.

Why does Microsoft always make people hate them?

 

Harry




msg:1538840
 7:39 pm on May 16, 2004 (gmt 0)

OK, last check with today's results in: MSN has a total of 2.28 GB.

pmac




msg:1538841
 7:46 pm on May 16, 2004 (gmt 0)

From [search.msn.com...]

In general, MSNBot should not try to access your site more than once every few seconds. MSNBot will also account for the time it takes to download a page from a site, so that if your site has a slower connection we will not access it as frequently. If you find that we are placing too high a load on your site, please let us know by sending us e-mail at msnbot@microsoft.com.
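A rough sketch of the kind of politeness rule that statement describes: a minimum gap between requests, stretched when the site responds slowly. The function name, the 2-second floor, and the multiplier of 3 are all made-up assumptions for illustration; nothing here is MSNBot's actual logic.

```python
def next_delay(download_seconds, min_delay=2.0, load_factor=3.0):
    """Seconds to wait before the next request to the same site.

    min_delay and load_factor are hypothetical values, not MSNBot's.
    """
    if download_seconds < 0:
        raise ValueError("download_seconds must be non-negative")
    # A slow response stretches the wait; a fast one falls back to the floor.
    return max(min_delay, load_factor * download_seconds)
```

With these assumed values, a page that takes 0.1 s to download is re-requested no sooner than 2 s later, while a page that takes 4 s pushes the wait out to 12 s.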

Receptional




msg:1538842
 5:05 pm on May 17, 2004 (gmt 0)

Rather than ban it, why not restrict it to (say) the home page or a few choice pages, so as to stop it from missing your site altogether.

I would worry about trying to stop MSN. You are going to want their traffic when Ink stops sending it, and even though Microsoft forever makes enemies, they always end up winners...

Word v Wordperfect, IE v Netscape, MSN v Google...?

IITian




msg:1538843
 2:56 am on May 19, 2004 (gmt 0)

>This bot is abusive. If Microsoft wants to catch up with the others, it shouldn't do it on the backs of webmasters, who will have to pay up because this sucker clogs their sites and drives up their bandwidth usage.

What do you mean by abusive? Is it spidering the same pages multiple times? During the last two weeks it spidered my site completely once and is now spidering a few stray pages here and there. Total bandwidth used should be below two times my disk usage, which is hardly a bandwidth concern.

Harry




msg:1538844
 10:47 am on May 19, 2004 (gmt 0)

The site that was hit has less than 140 MB worth of data, half of which is not content accessible to users, such as scripts and databases.

If that's not abusive, then what is?
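To put the figures from this thread side by side, here is a quick back-of-the-envelope calculation. It assumes decimal units (1 GB = 1000 MB), which the posts do not specify, and treats "half" of the 140 MB as exactly 70 MB.

```python
MB_PER_GB = 1000  # assumption: decimal units; the posts do not say which

def crawl_ratio(transferred_gb, site_mb):
    """How many times over the site's size the bot has downloaded."""
    return transferred_gb * MB_PER_GB / site_mb

# 2.28 GB total against the whole 140 MB site, and against the
# roughly 70 MB that is actually reachable by users.
whole_site = crawl_ratio(2.28, 140)
public_half = crawl_ratio(2.28, 70)

# Average daily transfer for the first figure: 1.45 GB over 16 days.
daily_mb = 1.45 * MB_PER_GB / 16

print(f"{whole_site:.1f}x whole site, {public_half:.1f}x public half, "
      f"{daily_mb:.1f} MB/day")
```

Under those assumptions the bot has pulled roughly 16 times the full site, or about 33 times the public half.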

IITian




msg:1538845
 5:02 pm on May 22, 2004 (gmt 0)

I agree that it is a lot. On my site it has been very decent: fairly active now, but not to the point of abuse. Hope it does not turn out to be like the wisenutbot, crawling and crawling endlessly.

sidyadav




msg:1538846
 10:13 am on May 23, 2004 (gmt 0)

[webmasterworld.com...] :)

I'd say ban it for now, then remove the ban once it really launches.

Sid

Staffa




msg:1538847
 7:43 pm on Jun 5, 2004 (gmt 0)

Thanks for the old post, sidyadav. I was just planning to ban this bot altogether, and I certainly will now. It must know my site by heart by now, and certainly better than I do myself.

BTW, the bot does follow robots.txt. A while ago I banned all subdirectories and it has stayed away from them.
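For anyone who wants to check rules like these before relying on them, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content and the directory names (`/archive/`, `/forum/`) are hypothetical, made up for illustration; they are not the rules used on the site described above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the directory names are made up for illustration.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /archive/
Disallow: /forum/
"""

def allowed(path, agent="msnbot", robots_txt=ROBOTS_TXT):
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

print(allowed("/"))                       # home page stays crawlable
print(allowed("/archive/2004/05.html"))   # blocked subdirectory
```

This only tells you what a well-behaved parser makes of the file. Whether a given bot honours it is something you still have to confirm in your logs.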

gf4eva




msg:1538848
 12:15 pm on Jun 8, 2004 (gmt 0)

It's banned! In my .htaccess:

<Limit GET PUT POST>
order deny,allow
deny from 65.54.164.
</Limit>

I've tested it and it works, but if anybody's got suggestions, they'd be much appreciated.

The farking thing hogged over 4 GB of my bandwidth, with a server limit of 7 GB! Phew, nothing left for my customers or for the undisputed Google!
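A partial address like `65.54.164.` in a `deny from` line matches on the first three octets, which is the same as the 65.54.164.0/24 network. A minimal sketch for checking which addresses from a log fall inside that range (the function name is made up, and the thread does not say whether the bot also crawls from other ranges):

```python
import ipaddress

# "deny from 65.54.164." matches the first three octets, i.e. this /24.
BLOCKED = ipaddress.ip_network("65.54.164.0/24")

def is_blocked(client_ip):
    """Return True if the address falls inside the denied range."""
    return ipaddress.ip_address(client_ip) in BLOCKED

print(is_blocked("65.54.164.17"))  # inside the range
print(is_blocked("65.54.165.1"))   # neighbouring range, not covered
```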

isitreal




msg:1538849
 4:02 pm on Jun 9, 2004 (gmt 0)

The bot definitely needs some work. Many of my sites have a page just to detect search engines, which sends an email each time one hits. With most other bots I get one or two emails, then nothing, but msnbot keeps requesting the same page on the same sites over and over. I just put up a new site with only 3 pages, and in the last 7 days msnbot has made 14 page requests.

But on the brighter side, on a 1200-page site, msnbot is the first bot in a long time to have gone through the entire site, and it is now starting to go over it again. Looks like live tests are getting very close. Sympathy for those getting bandwidth problems, but I'm very happy to see another search engine being born, even if it's from MS. Remember how they do when they are actually competing? IE 4 and 5 were far and away the best browsers in the world when they were released. It's possible that MS may go for best search engine: easy pickings if Google doesn't get their @#@# together fast. I miss having bookmark-quality results, and I don't like having directories in the number 1 position more often than not.

dhatz




msg:1538850
 5:19 pm on Jun 9, 2004 (gmt 0)

msnbot has crawled a site of mine 5 times over the last 7 days: 20,000 pageviews from msnbot for a 4,000-page site.

Although the content and timestamps didn't change during that period for 99% of the pages, it downloaded the whole lot again and again and again, getting 200 rather than 304 HTTP status codes.
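For reference, a 304 only happens when the client sends an `If-Modified-Since` header and the server compares it against the page's modification time. Here is a minimal sketch of that server-side decision. The function name is made up, and whether msnbot sends the header at all is not something this thread establishes.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def status_for(if_modified_since, last_modified):
    """Return 304 if the client's copy is still current, else 200.

    if_modified_since: raw header value from the request, or None.
    last_modified: timezone-aware datetime of the page's last change.
    """
    if not if_modified_since:
        return 200  # unconditional request: always send the full page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: treat as unconditional
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    return 304 if last_modified.replace(microsecond=0) <= since else 200

page_changed = datetime(2004, 5, 15, 10, 0, 0, tzinfo=timezone.utc)
print(status_for("Sat, 15 May 2004 10:00:00 GMT", page_changed))
print(status_for(None, page_changed))
```

A crawler that never sends the header will get 200 every time, no matter how stale-proof the pages are.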

isitreal




msg:1538851
 6:54 pm on Jun 9, 2004 (gmt 0)

I just realized: all my sites are dynamic, and I don't return Last-Modified headers, since it's too much of a pain in the @#@ to set up and doesn't do me any particular good. GoogleGuy has asked for that, but that's for their sake, not mine, so unless search engines pay me to do it, I won't.

But that might be the source of the problem: maybe the spider assumes that no Last-Modified date means spider it again? Just a guess, but it would make sense. That seems like the kind of bug to expect in a first working model, and I suspect there's not much they can do to get around that issue while they are building the index.

If your pages are run through a PHP/ASP-type engine, they will have this behavior whether or not they are actually scripted or dynamic. For example, if you make all .htm/.html pages PHP-parsed like I do, it doesn't matter whether they are actually dynamic or not: they don't return a Last-Modified header by default.
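For a script-generated page, the header value has to be built by hand. A minimal sketch in Python rather than PHP or ASP, assuming the page's "last changed" time is available as a Unix timestamp (for example a template file's mtime); the function name is made up:

```python
from email.utils import formatdate

def last_modified_value(epoch_seconds):
    """Format a Unix timestamp as an HTTP date for a Last-Modified header."""
    # usegmt=True produces the "GMT" suffix that HTTP dates require.
    return formatdate(epoch_seconds, usegmt=True)

# Example with the Unix epoch itself; a real page might pass
# os.path.getmtime(path_to_template) instead.
print("Last-Modified:", last_modified_value(0))
```

Whether sending this header would actually change msnbot's re-crawling is the open question from the post above, not something this sketch answers.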


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved