| 5:29 pm on Jun 10, 2004 (gmt 0)|
|Because all of you people blocking it are going to be the same ones starting "MSN dumped me for no reason!" threads. |
Heh isn't that the truth. Funny how people realize they have the right to block whoever they want from their site, but when someone does it to THEM they get their panties in a twist about it.
| 5:37 pm on Jun 10, 2004 (gmt 0)|
I have to agree, blocking the first draft of a search engine that could very easily control 30% of the market within 12-18 months does not strike me as a very good idea.
Until somebody suggests something more coherent, my first suspicion is that msnbot is repeatedly requesting pages that do not send a Last-Modified header, i.e. dynamic pages.
This would be simple to determine if everyone who is posting were to check this on their sites now, and might give the msndudes something concrete to work on re fixing the problem. I have something like 40+ sites, all dynamic, all being aggressively spidered, none return last modified headers. Unfortunately, I don't have any non dynamic sites any longer to do a counter check on, I doubt most people here do.
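For anyone who wants to run that check, here is a minimal sketch, assuming Python's standard library is available. The function names and the example URL are made up for illustration; nothing here comes from msnbot or MSN.

```python
from urllib.request import Request, urlopen


def has_last_modified(headers):
    """Return True if a mapping of response headers includes Last-Modified."""
    # Header names are case-insensitive, so compare in lower case.
    return any(name.lower() == "last-modified" for name in headers)


def fetch_headers(url):
    """Send a HEAD request and return the response headers as a dict."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return dict(response.headers.items())


# Usage (hypothetical URL, substitute a page from your own site):
#   headers = fetch_headers("http://www.example.com/")
#   print(has_last_modified(headers))
```

Dynamic pages usually fail this check unless the script sets the header itself.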
| 5:48 pm on Jun 10, 2004 (gmt 0)|
|Heh isn't that the truth. Funny how people realize they have the right to block whoever they want from their site, but when someone does it to THEM they get their panties in a twist about it. |
During the last week, MSNBOT has downloaded one of my sites 4 (FOUR) times. The site has almost 4000 pages (all unique content) and in my logs there are 16757 page hits by msnbot.
All pages are STATIC, 99% of those pages unmodified since 20-May and my server prints a "Last-Modified:" header, yet msnbot just downloads the whole site with HTTP 200 (rather than 304 as a well behaved Web app would).
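To make the 200 versus 304 point concrete, here is a rough sketch of the decision a server makes on a conditional request. It assumes the crawler sends an If-Modified-Since header, which is exactly what appears not to be happening here. The function name and the sample date are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def status_for_request(last_modified, if_modified_since=None):
    """Return 304 if the client's copy is still current, else 200.

    last_modified: timezone-aware datetime of the page's last change.
    if_modified_since: raw If-Modified-Since header value, or None.
    """
    if if_modified_since is None:
        return 200  # unconditional request: send the full page
    try:
        client_copy = parsedate_to_datetime(if_modified_since)
        # HTTP dates have one-second resolution, so drop microseconds.
        unchanged = last_modified.replace(microsecond=0) <= client_copy
    except (TypeError, ValueError):
        return 200  # unparseable or zone-less date: send the full page
    return 304 if unchanged else 200


# Hypothetical page last changed on 20 May 2004.
page_changed = datetime(2004, 5, 20, 12, 0, 0, tzinfo=timezone.utc)
header = format_datetime(page_changed, usegmt=True)

print(status_for_request(page_changed))          # no header sent
print(status_for_request(page_changed, header))  # header matches the page
```

A crawler that never sends the header always lands in the first branch, so it downloads the full page every time no matter what the server advertises.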
Plus, my (perhaps biased, I admit it) opinion is that MS Search will index the Web for free (i.e. x billion pages listed in our index, where x is close to Google's) BUT THE MAJORITY OF THE TRAFFIC WILL GO TO THEIR PPC CLIENTS, because the sponsored listings will come first.
I've not blocked msnbot, but I understand it can be a problem for some, especially if people think there is no benefit.
Same reason why reports suggest that Y owns 20% of the SE market, yet most of my sites receive 3-9% of their free referrals via Yahoo.
| 5:51 pm on Jun 10, 2004 (gmt 0)|
|Same reason why reports suggest that Y owns 20% of the SE market, yet most of my sites receive 3-9% of their free referrals via Yahoo. |
Does ANYONE receive more than 3-9% of their traffic from Yahoo? The only time I got a lot of traffic from Yahoo, even being in their paid directory and everything, was with my paid Overture ads. Even then it wasn't a huge percentage.
| 5:58 pm on Jun 10, 2004 (gmt 0)|
I get about 11% from yahoo.
| 6:01 pm on Jun 10, 2004 (gmt 0)|
Welcome to WebmasterWorld MSNDude! Nice to have some MS representation here. I'm sure you'll do a nice job.
| 6:04 pm on Jun 10, 2004 (gmt 0)|
I get 25% from Yahoo
| 6:13 pm on Jun 10, 2004 (gmt 0)|
Side note though.. I just discovered this site a few months ago and started learning about the SEO and marketing side of things. Now that I have started working on that, I suspect my numbers will be different in a few months.
| 6:41 pm on Jun 10, 2004 (gmt 0)|
|All pages are STATIC, 99% of those pages unmodified since 20-May and my server prints a "Last-Modified:" header, |
Thanks, I knew I could confirm or deny that theory very quickly, and that denies it. I was giving msnbot the benefit of the doubt; guess the problem is more fundamental. This would be in keeping with MS's pattern of having version 1 of a new product be essentially useless, its primary function just to provide a door into new markets. But I don't see them dropping the search ball, Google is too easy a target right now... Of course, if MS insists on running the search engine server farm on Windows they are going to be a bit handicapped; Hotmail was never the same after they forced it to run on Windows and not Unix... : )
| 8:34 pm on Jun 10, 2004 (gmt 0)|
"So its MSN's fault you don't know how to do your job?"
Exactly. Whether it is laziness or incompetence, complaining that it is someone else's fault that you aren't doing or don't know how to do your job is way beyond rude.
Ban the bot from specific directories if you want, but this aggressive crawling is just about the best news that webmasters could ask for from MSN at this point. Of course they could completely bungle the ranking of the data, but it sure would be a great thing to have every page that I want crawled to be crawled every day.
| 9:00 pm on Jun 10, 2004 (gmt 0)|
|Of course they could completely bungle the ranking of the data, but it sure would be a great thing to have every page that I want crawled to be crawled every day. |
I agree, msnbot is the first spider in a long time to fully crawl a site of mine that has about 1200 pages. A while ago this thread [webmasterworld.com] caused a huge, and very interesting to follow, uproar in the google forums, but oddly enough google is stuck at
4,285,199,774 web pages
and has been for a while now. That time seems to have been spent getting rid of as many pages as possible from the index, cleaning out pages and not fully spidering sites, with various oddly lame-sounding excuses.
You'll recall the claim was made that there was this upper limit:
4,294,967,296 (2^32) URLs.
Since I had no real way to know if this claim had any validity, I thought it would be a good idea to just watch google for a while. The page count hasn't gone up at all for a while now, but complaints of failing to fully spider sites have skyrocketed over that time. Obviously the fact that the count stops several million pages under the upper limit could be sheer coincidence, but that's a very strange coincidence.
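The gap between the two figures is easy to check. A quick sketch of the arithmetic, using only the two numbers quoted above (the variable names are mine):

```python
URL_LIMIT = 2 ** 32               # the claimed ceiling on URLs
REPORTED_PAGES = 4_285_199_774    # page count shown on google's home page

headroom = URL_LIMIT - REPORTED_PAGES

print(URL_LIMIT)                      # 4294967296
print(headroom)                       # 9767522
print(f"{headroom / URL_LIMIT:.2%}")  # 0.23%
```

So the reported count sits a little under ten million pages, or about a quarter of a percent, below the claimed limit. That says nothing about whether the limit is real, only how close the numbers are.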
If there is anything to this at all, which I don't have any real way of knowing, MSN's timing could be exceptionally good; this may actually be a weak moment for google.
| 1:21 am on Jun 11, 2004 (gmt 0)|
To all those who said I didn't do my job or that I'm incompetent, how much did Microsoft pay you?
A badly programmed robot comes during a weekend and consumes 80% of my monthly bandwidth and I'm incompetent because I didn't see it coming?
I'm also incompetent for choosing a host that based on previous years traffic was very suited to the regular activities related to the site?
On top of that, by the time the robots.txt was in place, I was already well over my monthly allocation. So, again, I'm an idiot for blocking this bot, when I already had to pay extra just so my site wasn't shut down. And in good jest, I'm supposed to let this thing hog my resources indefinitely and continue to pay up, just for a vague hope of making it into some serps to be released God knows when?
Again, how much are you getting paid?
| 1:56 am on Jun 11, 2004 (gmt 0)|
|for choosing a host that based on previous years traffic was very suited to the regular activities related to the site |
It's not a bad idea to have a host that can support the unexpected, like your site suddenly getting really popular, or getting aggressively spidered. I checked my bandwidth usage; msnbot definitely jumped it way up, but it never got close to the bandwidth I'm allowed per day. My hosting company doesn't cost that much, but this kind of thing is why I picked them too of course, and is why they are top rated.
I try to get more traffic to my sites, so I'm happy to find one more way for traffic to find me. But since you seem happy with your current status, and the steady traffic you're getting, why not just ban msnbot with robots.txt? It's not that big a deal.
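For reference, a ban like that is a two-line robots.txt entry. The sketch below embeds one and checks it with Python's standard robots.txt parser. It assumes the bot identifies itself as "msnbot" and honors robots.txt; the test path is made up.

```python
from urllib.robotparser import RobotFileParser

# Rules that shut out msnbot and leave every other crawler alone.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("msnbot", "/index.html"))     # False
print(parser.can_fetch("Googlebot", "/index.html"))  # True
```

Changing `Disallow: /` to a directory path would keep the bot out of just that part of the site.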
| 2:14 am on Jun 11, 2004 (gmt 0)|
Hi. It's already banned. I'm also switching hosts, but I can't do it on a whim.
I'm taking this opportunity to transform and improve the very foundation of the site. There's a reason the MSNbot liked the site. It's full of original content.
Some people commented that those who complain about the robot will probably complain about being listed later.
Again, I find all these scenarios that people make up very insulting. They know zilch and they start all these rumours.
For the record, I have never complained about being unlisted. That's the least of my problems. My site contains original content and is updated daily. In normal months, 40% of my traffic comes from robots.
If MSN blocks me, there's not much I can do about it and there's no point complaining. It's not like they are the only game in town.
| 4:37 am on Jun 11, 2004 (gmt 0)|
Before I banned it, MSNbot consumed 1.4GB in a week. I've blocked it for now and will let it in again (if) when it launches - can't afford to risk it :)
Welcome to WebmasterWorld, msndude!
| 1:14 pm on Jun 11, 2004 (gmt 0)|
I have to ask you all to re-read the Terms of service section 4 [webmasterworld.com] again please and keep the conversation civilised and on topic.
I would also like to welcome MSNDude even though I don't work for them, I am not using Internet Explorer and nobody is paying me.
(They still own us all though :( )
| 2:36 pm on Jun 11, 2004 (gmt 0)|
Welcome to the board MSN Dude'ster. I think we all know what you can and can't cover here and it is nice to have msn onboard. (I hope your skin is as thick as your spider is aggressive. 'course, you work for MS, you'd have to have thick skin by now. lol!)
Everyone else, please leave the specifics at home or take it up with MS directly (just as we are not the Yahoo, Google, or Ask Jeeves conduit, we can't be the MS conduit either).
Anyway, back on topic: I have had MSNBot blocked here for the time being. Until I am confident that MSN has an actual proprietary search engine, I am going to leave it there. We've been hearing about Microsoft's new search engine since MS techs first talked about it during the hype up to the release of Windows 98 (that was 1997 folks).
So, I think I will wait and see until we see something besides a spider ;-)
| 3:19 pm on Jun 11, 2004 (gmt 0)|
Like everyone else I've been seeing a lot of the msnbot recently. As I have several large sites, and comparatively little traffic, it's been my best "customer" for some months now. (A close second is Ask Jeeves/Teoma, followed by googlebot and in a poor fourth position Slurp). As I have lots of free bandwidth it's not a problem - yet. However: If and when the site grows in popularity, and bandwidth becomes an issue, and there's still no sign of a proprietary MSN search, there's a spot in my robots.txt reserved for msnbot ;-).
| 4:03 pm on Jun 11, 2004 (gmt 0)|
Hey, welcome msndude... Looking forward to some interesting discussion in the coming months. (I haven't slammed the door on msnbot yet, btw, though I'll be watching its eating habits.)
| 5:05 pm on Jun 11, 2004 (gmt 0)|
|Does ANYONE receive more than 3-9% of their traffic from Yahoo? |
I'm getting 30% from Yahoo.
| 5:20 pm on Jun 11, 2004 (gmt 0)|
Welcome msndude, feel at home and give a try to Linux baby!
| 6:32 pm on Jun 11, 2004 (gmt 0)|
Seriously.. bandwidth is cheap, so if you're on a host that's stingy with it, that's your fault.
The bot is a good thing.
Just like I HATE Microsoft's monopoly on the desktop, I hate Google's in the search world. I sincerely hope M$ makes inroads in this area and gets knocked on their butt with their OS.
| 8:18 pm on Jun 11, 2004 (gmt 0)|
Well shoot, I thought someone at MS just liked me when I saw their new bot pounding my site, second only to G this month and Yahoo nowhere to be seen. I just hope their new SE starts feeding results to MSN soon so all this bandwidth is not wasted (hint hint).
A timeline maybe?
| 10:09 pm on Jun 11, 2004 (gmt 0)|
There's some case for a meta tag specific to the MSNBot, like the revisit-after tag, except the other way around: Do not visit before. I don't mind Google or Yahoo spidering daily, but I don't want to see MSNBot more than once every three or four days, at least until it's serving a live search engine. Every site varies as to what it can handle, and this would be an excellent way for webmasters to set limits.
Around 40% of my traffic is bots, and that hasn't varied a great deal with the growth of overall traffic. Most of that is Googlebot. Add another major spider to the equation, and we're talking an extra 30% or so in bandwidth with no extra visitors to support the cost. Now I'm doing okay, but if those kinds of percentages are at all common, there are going to be a lot of people taking issue with it. So how about those meta tags, MSNDude?
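Here is roughly how that estimate works out, as a sketch. The 40% bot share is my own figure from above; the three-quarters Googlebot share of bot traffic is an assumed number standing in for "most of that", and the new spider is assumed to be about as hungry as Googlebot.

```python
from fractions import Fraction


def extra_bandwidth(bot_share, dominant_share_of_bots):
    """Extra load, as a fraction of current traffic, from a second
    spider that crawls as heavily as the current dominant one."""
    return bot_share * dominant_share_of_bots


bot_share = Fraction(40, 100)             # bots as a share of all traffic
googlebot_share_of_bots = Fraction(3, 4)  # ASSUMPTION: "most of that"

extra = extra_bandwidth(bot_share, googlebot_share_of_bots)
print(f"{float(extra):.0%}")  # 30%
```

With a different Googlebot share the result moves in proportion, so treat the 30% as a ballpark.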
| 11:23 pm on Jun 11, 2004 (gmt 0)|
|You'll recall the claim was made that there was this upper limit: |
4,294,967,296 (2^32) URLs.
ms will likely not have this problem, as they are probably built on top of the 64 bit version of windows server, and sql server, if this is the data engine, can handle 64 bits without a problem.
at least msnbot will follow a new internal link if it sees it, while google studiously ignores new internal links while concentrating on respidering things like contacts.htm several times a day.
no room in the "new links" database?
| 1:16 am on Jun 12, 2004 (gmt 0)|
Welcome MSNDude - a relief to see you here. And of course, we love the fact that you are deeply crawling all of our sites. Wish you the best in providing healthy competition to Google.