Forum Moderators: mack
I'm already near my month's limit just because of this bot. Nothing even comes close to the MSN bot in terms of abuse.
Why does Microsoft always make people hate them?
In general MSNBot should not try to access your site more than once every few seconds. MSNBot will also account for the time it takes to download a page from a site, so that if your site has a slower connection we will not access it as frequently. If you find that we are placing too high a load on your site, please let us know by sending e-mail to firstname.lastname@example.org.
I would worry about trying to stop MSN - you are going to want their traffic when Ink stops sending it, and even though Microsoft forever makes enemies, they always end up winners...
Word v Wordperfect, IE v Netscape, MSN v Google...?
What do you mean by abusive? Is it spidering the same pages multiple times? During the last two weeks it spidered my site completely once and is now spidering a few stray pages here and there. Total bandwidth used should be below two times my disk usage - hardly a bandwidth concern.
BTW the bot does follow robots.txt. A while ago I banned it from all subdirectories and it has stayed away from them.
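For anyone wanting to do the same, a minimal robots.txt sketch - the Disallow paths here are just placeholders, substitute your own subdirectories ("msnbot" is the agent name MSN documents for its crawler):

```
# robots.txt - keep msnbot out of specific subdirectories (example paths)
User-agent: msnbot
Disallow: /images/
Disallow: /cgi-bin/
```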
<Limit GET PUT POST>
order allow,deny
allow from all
deny from 65.54.164.
</Limit>
I've tested it and it works, but if anybody's got a suggestion - much appreciated.
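One suggestion, since the IP range could shift: match on the user-agent instead. A sketch for Apache with mod_setenvif - adjust the agent string to whatever your logs actually show:

```
# Block by user-agent rather than IP (needs mod_setenvif).
# BrowserMatchNoCase sets the env var when the UA contains "msnbot".
BrowserMatchNoCase msnbot bad_bot
<Limit GET POST>
order allow,deny
allow from all
deny from env=bad_bot
</Limit>
```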
The farking thing hogged over 4 GB of my bandwidth! With a server limit of 7 GB - phew - nothing left for my customers or for the undisputed Google!
But on the brighter side: on a 1200-page site, msnbot is the first bot in a long time to have gone through the entire site, and it is now starting to go over it again, so it looks like live tests are getting very close. Sympathy for those getting bandwidth problems, but I'm very happy to see another search engine being born, even if it's from MS. Remember how they do when they are actually competing? IE 4-5 were by far and away the best browsers in the world when they were released. It's possible that MS may go for best search engine - easy pickings if Google doesn't get their @#@# together fast. I miss having bookmark-quality results, and I don't like having directories in the number 1 position more often than not.
Although the content / timestamps didn't change during that period for 99% of the pages, it downloaded the whole lot again and again and again, returning HTTP 200 rather than 304 status codes.
But that might be the source of the problem: the spider may be assuming that no Last-Modified date means it should spider the page again. Just a guess, but it would make sense - that seems like the kind of bug to expect in a first working model, and I suspect there's not much they can do to get around the issue while they are building the index.
If your pages are being run through the php/asp type engine they will have this behavior whether or not they are actually scripted or dynamic. For example, if you have all htm/html pages parsed as PHP, like I do, it doesn't matter whether they are actually dynamic: by default they don't return a Last-Modified header.
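The workaround, if you want spiders to get 304s on script-parsed pages, is to emit a Last-Modified header yourself and honor the If-Modified-Since request header. A rough sketch of that logic (in Python for illustration - the file path and function name are mine, not anything MSN or Apache provides):

```python
import os
from email.utils import formatdate, parsedate_to_datetime

def conditional_get(path, if_modified_since=None):
    """Return (status, headers) for a conditional GET on a page backed by a file.

    path: file whose mtime stands in for the page's last change;
    if_modified_since: the If-Modified-Since request header value, or None.
    """
    mtime = int(os.path.getmtime(path))
    last_modified = formatdate(mtime, usegmt=True)  # RFC 1123 date string
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            since = -1  # unparsable date: fall through to a full 200
        if mtime <= since:
            # Unchanged since the spider's last visit: send 304, no body.
            return 304, {"Last-Modified": last_modified}
    return 200, {"Last-Modified": last_modified}
```

A well-behaved spider sends back the Last-Modified value it saw as If-Modified-Since on its next visit; the 304 response costs headers only, no page body, which is exactly the bandwidth saving being discussed above.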