| 10:12 am on May 25, 2004 (gmt 0)|
Same here, I've got plenty of spare data transfer allowance so I'm letting it do its stuff. But it would have been banned a while ago if I was close to my limit and I still didn't know if the data was going to actually get used publicly.
On one new site, which was only really finished a couple of weeks ago, the stats are:
Users transfer: 182.38 MB
Other bots: 162.97 MB
MSNbot transfer: 1783.70 MB
It looks like it's pulled the whole of that site twice now, once by msnbot64057 and once by msnbot64058. It's hitting other sites a bit less hard. Hopefully they won't keep it up for too long before launching.
| 1:20 pm on May 30, 2004 (gmt 0)|
MSNbot took 9.4 GB from us this month.
| 3:58 pm on May 30, 2004 (gmt 0)|
MSNbot was crawling so fast from the same IPs that my site automatically banned it (more than 25 page views per 60 seconds triggers a warning notice; reaching 30 per 60 seconds triggers an IP ban).
So I wrote to MSN and they said they'd fix it. I haven't had them banned since, so it looks like they're quite receptive to resolving issues.
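For anyone wondering how a threshold like that can work, here is a minimal sketch in Python. It is not the poster's actual setup, which isn't described; the class name `CrawlGuard` and the in-memory storage are assumptions for illustration. Only the 25 and 30 views per 60 seconds figures come from the post.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60   # window length from the post
WARN_THRESHOLD = 25   # more than 25 views in the window -> warning
BAN_THRESHOLD = 30    # reaching 30 views in the window -> ban


class CrawlGuard:
    """Hypothetical per-IP page-view counter (illustrative only)."""

    def __init__(self):
        self.hits = defaultdict(deque)  # ip -> timestamps of recent views
        self.banned = set()

    def record(self, ip, now=None):
        """Record one page view and return 'ok', 'warn', or 'ban'."""
        if ip in self.banned:
            return "ban"
        now = time.time() if now is None else now
        window = self.hits[ip]
        window.append(now)
        # Drop views that have fallen out of the 60-second window.
        while window and now - window[0] >= WINDOW_SECONDS:
            window.popleft()
        if len(window) >= BAN_THRESHOLD:
            self.banned.add(ip)
            return "ban"
        if len(window) > WARN_THRESHOLD:
            return "warn"
        return "ok"
```

A real site would call `record()` once per request and would probably keep the ban list somewhere persistent (a database or the firewall) rather than in memory.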
| 6:06 pm on May 30, 2004 (gmt 0)|
Jason, off topic, but how are you doing that IP ban thing?
| 6:25 pm on Jun 8, 2004 (gmt 0)|
16.6GB from us, last month, before I banned it in robots.txt:
(Since "msnbot" is in the user agent string, this keeps the crawler out.)
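The robots.txt lines themselves did not survive in this post. As a hedged sketch, a typical rule set for keeping one crawler out looks like the `ROBOTS_TXT` string below (an assumption, not the poster's actual file), and Python's standard `urllib.robotparser` can be used to check what such rules allow. The helper name `allowed` is made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules: the poster's actual robots.txt lines are not in the thread.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent, path="/"):
    """Return True if these rules let `user_agent` fetch `path`."""
    return parser.can_fetch(user_agent, path)


print(allowed("msnbot"))     # the crawler named in the rules
print(allowed("Googlebot"))  # a crawler the rules do not mention
```

This only tells you what a well-behaved crawler should do with the file; robots.txt is advisory, so it does nothing against a bot that ignores it.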
I figure they have enough from me.
BTW, my theory is that MSN will push its speedy delivery of cached multimedia files to searchers as its edge over Yahoo/Google when the engine finally rolls out. They are probably collecting and categorizing/keywording as much of that kind of content as they can. We only have 3 multimedia files on our server, but if you hit it 500 times per day, it adds up to a lot of bandwidth.
| 3:50 pm on Jun 9, 2004 (gmt 0)|
I'm welcoming it. I can't wait for Google to stop being 50% of the market. Having to conform my HTML to one primary search provider that can't even supply consistent results over a 6 month period is one of the less productive things I've ever been involved in re web work.
MSN is obviously ramping up seriously now, and I want Yahoo and MSN to get up to speed as quickly as possible. By the extreme activity we're seeing from msnbot, I'd say they are getting pretty close to completing their index. I don't know how long it will take to process it and debug stuff, but it could be pretty fast. They are obviously getting ready for a realtime live test of the entire system or they wouldn't be grabbing the web so aggressively, and I most definitely want all my sites to be in those results as soon as they go live. Keep in mind all IE browsers default to MSN search, so when they deliver decent results Google is going down fast in market share.
If MSN acts fast, they can grab significant market share from Google by simply providing bookmark quality results like Google used to do (remember when you could more or less count on getting consistent results from Google? I didn't even use bookmarks for almost a year; now I always bookmark). It's not like Google has really improved their quality in the last tweaks as far as I can see.
| 4:25 pm on Jun 9, 2004 (gmt 0)|
Some of you said that you received replies from MSN. I'm still waiting. And until then they will stay banned. I'm all for competition, but not when I have to pay extra every month to feed a fat greedy robot.
| 8:09 pm on Jun 9, 2004 (gmt 0)|
I e-mailed them and still have had no reply. At the moment I'm seeing bots from both search.msn.com and from an IP in Microsoft Research's range (204.9*.9*.***). The site being hit is the main Irish web directory of all .ie websites and domains, and these Microsoft muppets are really beginning to irritate me with their toy search engine antics. Microsoft search does not send any serious traffic in any given month, so it is a simple bandwidth versus returns equation, and Microsoft is in debt.
| 11:31 pm on Jun 9, 2004 (gmt 0)|
Some folks have asked whether they should let MSNBot crawl their website. Is it worth it? Well, we are probably biased but we think it is definitely worthwhile :) This data will be used as part of an algorithmic search engine that is being built by Microsoft. Our intention is to allow folks to search on this engine by the end of the year.
With regard to the aggressiveness of the crawl: we are definitely learning and improving. We take politeness very seriously and we work hard to make sure that we are fixing issues as they come up. For specific politeness issues we recommend you e-mail us at email@example.com. Either I or someone from the team will take the time to investigate what is happening and send a reply. We will be posting to this forum to answer general questions about politeness, best practices, and the search engine itself. Look forward to chatting with everyone here.
-msn search dudes (msd)
| 11:42 pm on Jun 9, 2004 (gmt 0)|
Nice to see someone from Microsoft finally taking notice. Any chance of a reply to my e-mail?
| 12:01 am on Jun 10, 2004 (gmt 0)|
Welcome to webmasterworld msndude.
While the folks with lots of low traffic pages, like forums, won't like the aggressive crawling, I sure love it. A complete database is an absolute necessity for MSN to compete with Google, so I hope we will see this sort of aggressive free crawling forever.
| 12:11 am on Jun 10, 2004 (gmt 0)|
msndude, it struck me that the bot may not be handling dynamic pages properly. All my pages are dynamic and have no Last-Modified date in the header, and I suspect most people's don't either. Is it possible that the bot is simply set to grab a page if either there is no Last-Modified header or it hasn't been spidered since the last update?
I just put up a three page website 4 days ago and have had 14 msnbot page requests in that time.
Since there is no way I'm going to reprogram all my stuff to send proper Last-Modified headers (way too much work for zero payoff on my end), is this something that can be fixed on MSN's end?
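For reference, this is roughly what the server side of Last-Modified handling involves. It is a minimal sketch, assuming the application can say when a page's content last changed; the function name `respond` and the example timestamp are hypothetical, and nothing here reflects how msnbot actually decides what to re-fetch.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(last_modified, if_modified_since=None):
    """Return (status, headers) for a page last changed at `last_modified`.

    `last_modified` is a timezone-aware UTC datetime; `if_modified_since`
    is the raw request header value, or None if the client sent none.
    """
    # HTTP dates have one-second resolution, so drop microseconds.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall back to a full response
        if since is not None and since.tzinfo is not None and last_modified <= since:
            return 304, headers  # unchanged: no body needs to be sent
    return 200, headers


page_changed = datetime(2004, 6, 8, 18, 25, tzinfo=timezone.utc)  # hypothetical
first = respond(page_changed)                              # no header: full page
repeat = respond(page_changed, first[1]["Last-Modified"])  # revisit: 304
```

The saving only happens if the crawler sends If-Modified-Since on revisits, which is the crawler's choice, so this is one half of the picture.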
I definitely welcome the aggressive crawl, although if MSN is able to crush Google we'll be singing a different song soon enough....
| 12:18 am on Jun 10, 2004 (gmt 0)|
Send the bot my way, I have plenty of bandwidth :)
If MSN's search DOES end up crushing Google I'll be glad I was indexed. And even if it doesn't, there will certainly be enough hype about it in the beginning that it will be worthwhile.
SEOs are going to have a field day making sites index high in both Google AND MSN though.
| 1:53 am on Jun 10, 2004 (gmt 0)|
Msnbot sure knows how to crawl. That's one butt-kickin' spider. Let the games begin!
| 2:07 am on Jun 10, 2004 (gmt 0)|
I'd far prefer to provide Microsoft with some kind of feed that is less bandwidth intensive. One of the reasons for this is that my main site is approximately 80K of static pages. A new site that I am working on has over 1.5 million static pages of localised UK domain and website information. Having such an aggressive bot as msnbot hitting the bigger site would destroy my bandwidth. The current msnbots are hitting at the rate of approximately 1 page every three seconds. A search partner feed of that kind would probably be very attractive for the webmasters of large sites as it would save them on bandwidth charges each month.
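To put rough numbers on that, here is a back-of-the-envelope sketch. The page count and the one page per three seconds rate are from the figures above; the 20 KB average page size is purely an assumption, so treat the transfer figure as illustrative.

```python
def crawl_days(pages, seconds_per_page):
    """Days needed to fetch every page once at a steady rate."""
    return pages * seconds_per_page / 86_400  # 86,400 seconds per day


def transfer_gb(pages, avg_page_kb):
    """Transfer for one full pass, in GB (taking 1 GB = 1,000,000 KB)."""
    return pages * avg_page_kb / 1_000_000


days = crawl_days(1_500_000, 3)         # 1.5 million pages, 1 page per 3 s
gigabytes = transfer_gb(1_500_000, 20)  # 20 KB per page is an assumption
print(f"{days:.1f} days, {gigabytes:.1f} GB per full pass")
```

At that rate a single pass over the larger site takes about seven and a half weeks of continuous crawling, and every repeat pass costs the same transfer again.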
It would be a good thing if Microsoft would work with webmasters if it wants to make a superior search engine. However most search engines are inherently algorithmic so that claim does not impress people here. :) (Though I still think that the emphasis on personalisation is wrong. The more important thing for most people I talk to about searching is the localisation aspect and the way Google et al are going about it is wrong - people do not search by postcodes.)
| 2:18 am on Jun 10, 2004 (gmt 0)|
MSNDUDE, I sent two messages over three weeks ago and I'm still waiting for an answer from your office. I'm a sceptic, so I'll believe you're committed when someone from your team replies to my concerns.
Until then I consider your intervention PR and damage control and that's not what I require or expect from you.
I don't apologize for being rude because I had to pay for the privilege of feeding your experiments last month.
| 2:28 am on Jun 10, 2004 (gmt 0)|
|Until then I consider your intervention PR and damage control |
That's what I consider all these search engine guys who post here. Why would you consider it anything else? I'm going to have to go to the store to buy more salt, since you have to take everything the employee of a company says about said company's product with large grains of salt. I think I'll go with rock salt this time.
Especially MS, who don't have a very good track record in the PR area. But I still welcome a new search engine. I'd rather deal with 3 fairly equal ones than 1 main one, which is what I have to do now. Google is getting boring with their endless tweaks [ hint: MSN, make a reliable, stable product, with reliable, stable results, don't do huge algo tweaks if at all possible... nobody cares about search, they just care about getting reasonably decent results ].
| 5:03 am on Jun 10, 2004 (gmt 0)|
|I don't apologize for being rude because I had to pay for the privilege of feeding your experiments last month |
Send them the bill :P
| 5:25 am on Jun 10, 2004 (gmt 0)|
"I had to pay for the privilege of feeding your experiments last month."
"I can be rude because I'm too lazy to add two lines to robots.txt"
| 8:23 am on Jun 10, 2004 (gmt 0)|
Remember guys, TOS [webmasterworld.com] 4 and 19 applies to ALL of our members, regardless of who employs them. Let's get back on topic!
Fortunately, bandwidth is cheap for my US based sites, so I just let MSNbot get her fill. Maybe that saved some of you a few bytes. ;)
| 9:34 am on Jun 10, 2004 (gmt 0)|
Welcome to Webmasterworld msndude! :)
It is good to see that another search engine rep has joined the ranks.
| 11:45 am on Jun 10, 2004 (gmt 0)|
Steve, remember that not everybody here has the same level of knowledge of webmastering. Before that episode, I had heard of robots.txt but was completely unfamiliar with it. So by the time I was aware of the problem, the damage was done. The robot consumed my bandwidth in one weekend.
There is a robots.txt now. So don't assume that I was lazy. Don't make up scenarios when you don't know the full extent of the story.
As for the "rude" part, had I not added that word, you guys wouldn't have perceived my message as rude, just very direct. It's all in the packaging.
However, calling someone "lazy" is rude and disrespectful.
| 3:32 pm on Jun 10, 2004 (gmt 0)|
|Remember guys, TOS 4 and 19 applies to ALL of our members, regardless of who employs them. |
What about items 18 and 20? The search company representatives are not standard posters, they occupy a special position on these forums as far as I can tell, and represent clear commercial interests, but do provide a certain insight into those commercial entities, as long as you bring your salt with you.
| 4:26 pm on Jun 10, 2004 (gmt 0)|
Welcome to Webmasterworld msndude!
| 5:03 pm on Jun 10, 2004 (gmt 0)|
Welcome to Webmasterworld msndude!
Bring it on! Most of us will be looking forward to your comments and suggestions as the new MSN Search rolls out!
| 5:04 pm on Jun 10, 2004 (gmt 0)|
Just a thought on this subject, I wonder if MSN is "testing" their search engine with live web pages instead of cached data.
It would be slow as crap, but I can't think of any other reason why it would do what it's doing. Think about this... they have a cached snapshot of your page which is used for text queries just like Google but maybe they're going a step further and actually comparing the results to the "live" page to make sure that it hasn't changed since they cached it, and that it's not a dead link.
I don't think this model would be very efficient in a production environment - in fact I KNOW it wouldn't - but I can see some advantages for them to run this way while testing.
| 5:15 pm on Jun 10, 2004 (gmt 0)|
So it's MSN's fault you don't know how to do your job?
| 5:26 pm on Jun 10, 2004 (gmt 0)|
If you can't handle this bot, maybe it's time to get on a new server. (Unless of course your site is MICKEY MOUSE!)
Because all of you people blocking it are going to be the same ones starting "MSN dumped me for no reason!" threads.
| 5:29 pm on Jun 10, 2004 (gmt 0)|
Welcome to WebmasterWorld, msndude.
Good to have you here!