|Disappointed by the new msn search number of pages indexed|
At first I was very enthusiastic about the new msn search; now my enthusiasm is a bit tempered.
The MSN beta search contains approximately 40% of the pages from my site, whereas google contains 80 to 90%.
I checked my main competitors, and the same ratio applies more or less. What about you?
Although the new msn is doing a great job adding fresh content, they can't expect to break out if they don't catch up with google on number of pages indexed per site.
Having the same problem with a couple of my sites.
Once in a while I have to send feedback to msn, which they act upon within 48hrs.
The weird thing is most of my sites are fine; I'm trying to find out what it doesn't like about these 2, as the link structure is the same as the sites it loves.
Non-indexing is certainly not a problem that I seem to have. Indeed, I think MSNbot has been "more than thorough" and is having no trouble at all going through my sites. If anything, the issue is that when my content changes, they leave cached files in their database. I have a 1000-page site which changes weekly, yet MSN has 4007 pages indexed!? With no confusing session variables or dynamic URLs, I am going to have to work out what happened there.
Presumably that will sort itself out in time.
Do you have highly dynamic content that may be hard for MSNbot to interpret (or session variables?)
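For what it's worth, one quick way to check whether session-style parameters are inflating an indexed-page count is to canonicalize URLs before comparing them. A minimal sketch (the parameter names here are made up; substitute whatever your own site uses):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical session parameter names -- adjust for your own site.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}

def canonicalize(url: str) -> str:
    """Strip session-style query parameters so duplicate URLs collapse."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k.lower() not in SESSION_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), ""))

urls = [
    "http://example.com/page?id=7&sid=abc123",
    "http://example.com/page?id=7&sid=def456",
]
unique = {canonicalize(u) for u in urls}
print(unique)  # → {'http://example.com/page?id=7'}
```

If the number of unique canonical URLs is far below the number the engine reports, session variables are a likely culprit.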
Maybe it just takes time. When the beta was announced, they had less than 1% of our pages... it's been going up nearly every day, though, and is now over 10% (of 60,000 pages).
For me, they have been crawling thousands of pages a day for months, but they still don't manage to put even close to 1 percent of the inventory in the index. Not sure why.
Google has 200 times the number of my pages as MSN.
|they can't expect to break out if they don't catch up with google on number of pages indexed per site. |
That's a myth perpetuated by webmasters who spend far too much time doing site: searches. Matching the total number of pages indexed on a given site has nothing to do with whether or not MSN will produce an engine of equal quality.
The reality is that regardless of how big your site is, the majority of your search engine traffic comes from a relatively small percentage of your total pages. All MSN has to do is make sure the pages they do keep include the ones that actually show up in SERPS. All the rest are simply taking up space.
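That claim is easy to test against your own logs. A rough sketch, with entirely made-up per-page referral counts, of how to find what fraction of pages carries 80% of search traffic:

```python
from collections import Counter

# Hypothetical per-page search-referral counts pulled from a log file.
hits = Counter({"/babe-ruth": 900, "/home": 500, "/joe-dimaggio": 60,
                "/stats/1947": 25, "/stats/1948": 10, "/about": 5})

total = sum(hits.values())
running, pages_needed = 0, 0
for page, count in hits.most_common():  # most-visited pages first
    running += count
    pages_needed += 1
    if running / total >= 0.8:          # stop once 80% of traffic is covered
        break

share = pages_needed / len(hits)
print(f"{pages_needed} of {len(hits)} pages ({share:.0%}) carry 80% of traffic")
# → 2 of 6 pages (33%) carry 80% of traffic
```

Whether the long tail "takes up space" or catches unique queries is exactly what the rest of this thread argues about.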
|The reality is that regardless of how big your site is, the majority of your search engine traffic comes from a relatively small percentage of your total pages. All MSN has to do is make sure the pages they do keep include the ones that actually show up in SERPS. All the rest are simply taking up space. |
I don't like this argument.
Although some of my pages don't have high ranks and have low popularity, they nevertheless contain unique keywords that will certainly interest a minority of users, and they should not be thrown away.
This is precisely why I liked to use google: it would help me find THE precise keywords that I was looking for, hidden on a website lost in the dark.
This is what differentiates a search engine from a directory.
Indexing all the pages takes up more space? Well, that's not my problem. It is theirs. In my opinion, the best search engine is the one that provides the most complete index and the most relevant results.
So if only the highest-SERP pages should be included, then let's have MSN index only the 20% "important" pages of our site and see how it will compete with google.
Well, that's a pretty backwards-looking philosophy. Deep, quality domains can draw traffic from thousands of queries to hundreds or even thousands of pages a day. Sure, these interchangeable 100,000-page shopping cart sites have zero need to be fully indexed, but a site offering authoritative biographies of every baseball player who ever played, for example, should be fully indexed if you are a search engine.
MSN has significant problems ranking deep pages from authority sites, while it favors front pages from useless network garbage. Search engines should be seeking to dig deeply into quality domains, and if they ignore anything, ignore the endless ten page template bits of nothing that just link to affiliate parents.
The whole issue should be addressed from a different perspective, which is MSN's great failing... quality content. If a search engine recognizes a site, not a page, as being a quality resource, it should dig deeply. If there are no signs of quality, it shouldn't make as much effort to index.
MSN crawls a lot but can't (yet) manage to index the first 1000 pages of basically anything, whereas Google regularly manages this with 100% success. At the same time, MSN has a LOT of content indexed that Google (or Yahoo) don't. Unfortunately a lot of that is session-id type stuff.
MSNbot is pretty good, but this problem of indexing networks of piffle while not getting 50% of 1000-page authority sites is an unfortunate circumstance that MSN should work to correct.
Gotta agree with steveb and not Webguerilla on this one. I run a large deep content site, like the baseball player example except on a completely different subject. Just because MSN properly indexes Babe Ruth does not mean it can safely skip Joe DiMaggio.
|MSN crawls a lot but can't (yet) manage to index the first 1000 pages of basically anything. |
For one of my sites, I have around 1,800 pages indexed in MSN beta, but it's still barely scratching the surface (and is a tiny proportion of the pages crawled). Google has a little over 300,000 pages from the same site indexed, and is sending traffic to thousands of different pages on the site each day (because they contain unique search terms).
Another older site of mine has 2,600 pages in MSN beta, but over 60,000 in Google.
So I agree: the missing pages result in many missed search terms, and therefore a less useful index.
If I were conducting a beta test, I would not include all of the pages I had in my primary index, I would have the public see a subset. You don't need to throw everything in to see how something works. Scaling is easy - just more space and CPUs.
For most of our sites, not even 1% has been indexed and the same seems to be the case for a majority of the sites.
It seems MSN is going to spring a major surprise with all the crawled pages when they launch their new engine.
I would also like to note that MSN's crawlers have crawled all our sites completely over and over a million times ;)
Come on Bill Gates, hurry up...
Seems like the number of pages is not correct:
on the first page for my static site, which has about 300 static pages,
it says: 1-10 of 650 pages;
the next page says: 11-20 of 334 pages;
the last page shows up as 131-138 of 138 pages.
Google has 378 pages of mine.
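Those shrinking totals look like a result-count estimate converging as you page deeper into the SERPs. A trivial sketch, using the numbers above, of the consistency property a stable index should satisfy:

```python
# Reported totals from the example above, one per results page:
# "1-10 of 650", "11-20 of 334", "131-138 of 138".
reported_totals = [650, 334, 138]

def consistent(totals):
    """A stable index should report the same total on every results page."""
    return len(set(totals)) == 1

print(consistent(reported_totals))  # → False
```

Only the count on the last page (138 here) reflects results actually served; the earlier figures are estimates.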
It seems possible that MS is applying the traditional IE-relevance test.
That test, as used by many webmasters, goes: "less than n% of my visitors use browser X, so it is uneconomic to support it".
As applied to search engines, it would go: "less than n% of searches would find this page, so it is uneconomic to include it".
Makes perfect sense both ways, surely?
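The rule as stated reduces to a one-line inclusion test. A sketch (the threshold n and the traffic-share numbers are purely hypothetical):

```python
def worth_indexing(est_search_share: float, n: float = 0.0001) -> bool:
    """Include a page only if its estimated share of all searches
    clears the cost threshold n (a made-up cutoff for illustration)."""
    return est_search_share >= n

print(worth_indexing(0.001))    # → True: clears the cutoff
print(worth_indexing(0.00001))  # → False: below the cutoff
```

The catch, as posters above note, is that est_search_share is unknowable per page in advance, which is what the "unique keywords hidden on a lost website" argument turns on.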
maybe it's called "beta" for a reason :)
The Redmond peeps can afford to take all the time they think they need, and that's fine by me. It's their SE, so they need to be happy with it first.
Our experience is quite different from some of those mentioned here - our sites are now totally indexed and have been for a long time. MSNbot is very industrious, visiting every day and looking at a large percentage of pages. This is on a variety of sites ranging from 35 pages to 20,000 pages. It is actually something we have been impressed with. Also, we see MSN making big strides in looking at and ranking deep pages all the time. Maybe that is just in our corner of the world, but that is what we see happening.