Forum Moderators: mack
I'm already near my month's limit just because of this bot. Nothing even comes close to the MSN bot in terms of abuse.
Why does Microsoft always make people hate them?
In general MSNBot should not try to access your site more than once every few seconds. MSNBot will also account for the time it takes to download a page from a site, so that if your site has a slower connection we will not access it as frequently. If you find that we are placing too high a load on your site, please let us know by sending e-mail to firstname.lastname@example.org.
I would worry about trying to stop MSN - you are going to want their traffic when Ink stops sending it, and even though Microsoft forever makes enemies, they always end up winners...
Word v Wordperfect, IE v Netscape, MSN v Google...?
What do you mean by abusive? Is it spidering the same pages multiple times? During the last two weeks it spidered my site completely once and is now spidering a few stray pages here and there. Total bandwidth used should be below two times my disk usage - hardly a bandwidth concern.
BTW the bot does follow robots.txt. A while ago I banned it from all subdirectories and it has stayed away from them.
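For anyone wanting to do the same, a minimal robots.txt sketch - the Disallow paths here are just placeholders, substitute your own subdirectories ("msnbot" is the agent name MSN documents for its crawler):

```
# robots.txt - keep msnbot out of specific subdirectories (example paths)
User-agent: msnbot
Disallow: /images/
Disallow: /cgi-bin/
```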
<Limit GET PUT POST>
order allow,deny
allow from all
deny from 65.54.164.
</Limit>
I've tested it and it works, but if anybody's got a suggestion - much appreciated.
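One suggestion, since the IP range could shift: match on the user-agent instead. A sketch for Apache with mod_setenvif - adjust the agent string to whatever your logs actually show:

```
# Block by user-agent rather than IP (needs mod_setenvif).
# BrowserMatchNoCase sets the env var when the UA contains "msnbot".
BrowserMatchNoCase msnbot bad_bot
<Limit GET POST>
order allow,deny
allow from all
deny from env=bad_bot
</Limit>
```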
The farking thing hogged over 4 GB of my bandwidth! With a server limit of 7 GB - phew - nothing left for my customers or for the undisputed Google!
But on the brighter side: on a 1200-page site, msnbot is the first bot in a long time to have gone through the entire site, and it is now starting to go over it again, so it looks like live tests are getting very close. Sympathy for those getting bandwidth problems, but I'm very happy to see another search engine being born, even if it's from MS. Remember how they do when they are actually competing? IE 4-5 were by far and away the best browsers in the world when they were released. It's possible that MS may go for best search engine - easy pickings if Google doesn't get their @#@# together fast. I miss having bookmark-quality results, and I don't like having directories in the number 1 position more often than not.
Although the content / timestamps didn't change during that period for 99% of the pages, it downloaded the whole lot again and again and again, returning HTTP 200 rather than 304 status codes.
But that might be the source of the problem: the spider may be assuming that no Last-Modified date means it should spider the page again. Just a guess, but it would make sense - that seems like the kind of bug to expect in a first working model, and I suspect there's not much they can do to get around the issue while they are building the index.
If your pages are being run through the php/asp type engine they will have this behavior whether or not they are actually scripted or dynamic. For example, if you have all htm/html pages parsed as PHP, like I do, it doesn't matter whether they are actually dynamic: by default they don't return a Last-Modified header.
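The workaround, if you want spiders to get 304s on script-parsed pages, is to emit a Last-Modified header yourself and honor the If-Modified-Since request header. A rough sketch of that logic (in Python for illustration - the file path and function name are mine, not anything MSN or Apache provides):

```python
import os
from email.utils import formatdate, parsedate_to_datetime

def conditional_get(path, if_modified_since=None):
    """Return (status, headers) for a conditional GET on a page backed by a file.

    path: file whose mtime stands in for the page's last change;
    if_modified_since: the If-Modified-Since request header value, or None.
    """
    mtime = int(os.path.getmtime(path))
    last_modified = formatdate(mtime, usegmt=True)  # RFC 1123 date string
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            since = -1  # unparsable date: fall through to a full 200
        if mtime <= since:
            # Unchanged since the spider's last visit: send 304, no body.
            return 304, {"Last-Modified": last_modified}
    return 200, {"Last-Modified": last_modified}
```

A well-behaved spider sends back the Last-Modified value it saw as If-Modified-Since on its next visit; the 304 response costs headers only, no page body, which is exactly the bandwidth saving being discussed above.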