Forum Moderators: mack
MSN delivers some fine targeted traffic for some sites, and they sure do keep their index fresh. NICE job they're doing for being the new kids on the block, imho.
Page 1 - 1-10 of 148 results.
I then move to page 2.
Page 2 - 11-20 of 1034 results.
Move on to page 3
Page 3 - 21-30 of 779 results.
This just seems weird to me. Does anyone else see this, or have an explanation as to why it might be happening?
Ska
This just seems weird to me. Does anyone else see this, or have an explanation as to why it might be happening?
Ska -
I have worked on some very large-scale search engines in the past, and we had the same 'issue'. In our case it was caused by a tiered architecture and an estimation algorithm that was used to estimate the total count when enough results were returned from the primary tier (which included the 'best' content). Only when more results were needed than the first tier could serve would results actually be served from lower tiers. At that point our total count would change, since we would have actual data for two tiers and only have to estimate the count for the remaining N-2 tiers. The net effect was that the total count would converge toward the actual number of results as you paged into the result set.
Not sure that is the exact cause here, but I am pretty sure it is related to some sort of estimation that improves as you drill into the result set. Search engines are always looking for ways to make searches more efficient.
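To make the idea concrete, here's a minimal sketch of that kind of tiered counting. All of this is hypothetical (the tier structure, the averaging heuristic, and the function name `total_count_estimate` are mine, not MSN's or any real engine's); it just shows how the reported total can shrink and converge as deeper pages force lower tiers to actually be queried:

```python
# Hypothetical sketch: a tiered index where lower tiers are only queried
# when the upper tiers can't fill the requested page of results.
# Tier layout and the estimation heuristic are illustrative assumptions.

def total_count_estimate(tiers, results_needed):
    """Query tiers in order until enough results are gathered.

    Returns (results, estimated_total). Tiers that were never queried
    contribute an estimate based on the average hit count of the tiers
    that were queried.
    """
    results = []
    exact_count = 0   # actual matches seen in queried tiers
    queried = 0       # how many tiers we actually hit
    for tier in tiers:
        hits = tier["hits"]  # documents matching the query in this tier
        results.extend(hits[: max(0, results_needed - len(results))])
        exact_count += len(hits)
        queried += 1
        if len(results) >= results_needed:
            break  # cheap path: don't touch the slower, lower tiers
    # Guess the unqueried tiers' contribution from the queried average.
    avg = exact_count / queried
    estimated_total = exact_count + round(avg * (len(tiers) - queried))
    return results, estimated_total

# Page 1 only needs tier 1, so the total is mostly guesswork;
# page 4 forces all tiers to be queried, so the total becomes exact.
tiers = [
    {"hits": list(range(30))},  # 'best' content
    {"hits": list(range(5))},
    {"hits": list(range(2))},
]
_, page1_total = total_count_estimate(tiers, 10)  # estimate: 90
_, page4_total = total_count_estimate(tiers, 40)  # exact: 37
```

With those made-up numbers the reported total drops from 90 to the true 37 as you page deeper, which is exactly the "converging count" behavior described above.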
-Bb
In our case it was caused by a tiered architecture and an estimation algorithm that was used to estimate the total count when enough results were returned from the primary tier (which included the 'best' content).
I'm asking because when I did a site:example.com search early this morning, for a site that has about 20-25 pages, the first results returned were the index pages of the /directories/ in the site, with the other pages following. I also saw the bot grabbing just those pages in one pass within the day or so prior to this current change. I've yet to see that happen at any other engine, though it did seem vaguely suspect that Google was crawling by directory structure at one time.
Also, for that site at the same time I saw some number of pages returned "out of 90" - and there are nowhere near 90 pages; MSN currently has just under 20. So was that some kind of guesstimate based on a tiered directory structure?
Putting that together with how they cluster and seem to analyze linking patterns between site pages, I'm wondering how much actual site architecture and directory structure comes into play.
Added:
Now it's saying 10 pages out of 20 (not 90), so it's right this time - but this is one of the reasons MSN does need watching during update periods, if only to catch little things like this when they happen. And they are still showing the /directory/ index pages first - very interesting.
In our case it was caused by a tiered architecture and an estimation algorithm that was used to estimate the total count when enough results were returned from the primary tier (which included the 'best' content)
I should have been clearer. The tiered architecture was on the search engine side. Documents were placed into different tiers based on their perceived importance (authority). If sufficient results were returned from the first tier - which was the case for most standard queries (a two- or three-term query with 10 results requested) - the lower tiers were never queried, since that would have been inefficient (slower and more expensive), and an estimation was done to get the total count.
With the 'site:' examples here, you could get estimated counts if (1) the search engine is using similar architecture or estimation logic to improve performance and (2) not all the results from the given site are in the same tier. In most cases, not all pages on a site are of the same value, so it is likely the content could be spread across multiple tiers.