tedster - 12:10 am on Apr 6, 2007 (gmt 0)

Agreed that any multiplier or divider would need to be fractional/decimal and not negative. Let's over-simplify and say that preliminary ranking is determined by some relatively basic combination of scores for on-page factors, backlink influence, history and pagerank.

Original Score=OP+BL+H+PR

...and a kind of SERP could be generated by sorting all the original urls according to these total scores. However, that step isn't necessary if a re-ranking is going to be applied.
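
To make that first pass concrete, here is a minimal Python sketch. The urls and component numbers are invented for illustration, and the plain unweighted sum is just the over-simplified formula above, not anything Google has confirmed.

```python
# Hypothetical component scores per url: on-page (OP), backlinks (BL),
# history (H) and PageRank (PR). All numbers are made up for illustration.
pages = {
    "example.com/a": {"OP": 6.0, "BL": 8.0, "H": 3.0, "PR": 5.0},
    "example.com/b": {"OP": 9.0, "BL": 2.0, "H": 4.0, "PR": 4.0},
    "example.com/c": {"OP": 5.0, "BL": 5.0, "H": 5.0, "PR": 6.0},
}

def original_score(s):
    # Original Score = OP + BL + H + PR
    return s["OP"] + s["BL"] + s["H"] + s["PR"]

def preliminary_set(pages, size):
    # Sort urls by total score, highest first, and keep only the top `size`.
    ranked = sorted(pages, key=lambda u: original_score(pages[u]), reverse=True)
    return ranked[:size]

print(preliminary_set(pages, 2))
```

With these made-up numbers the totals are 22, 19 and 21, so a preliminary set of size 2 keeps /a and /c.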

So next, re-rank by measuring various factors over just the preliminary set of urls rather than using the entire web -- that saves lots of computing cycles because the more intense "dials" are only applied to a very small sub-set of urls. By these tests, generate multipliers (m1, m2, m3, m4) for each component of the original score.

For instance, backlink influence could be modified by what percent of those backlinks come from within the original set (a LocalRank calculation). Or on-page scoring might be modified by discovering what percent of related words and phrases occur (phrase-based re-ranking). And so on.
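
Here is one way the backlink test could be turned into a multiplier, sketched in Python. The function name, the `floor` parameter and the linear mapping from percent to multiplier are all my own assumptions for illustration. The only constraint taken from the discussion is that the multiplier stays fractional and never goes negative.

```python
def local_backlink_multiplier(backlink_sources, preliminary_urls, floor=0.5):
    """Map the share of backlinks that come from inside the preliminary set
    to a multiplier between `floor` and 1.0 (hypothetical linear mapping)."""
    if not backlink_sources:
        # No backlinks at all: assume the lowest multiplier.
        return floor
    inside = sum(1 for src in backlink_sources if src in preliminary_urls)
    fraction = inside / len(backlink_sources)
    return floor + (1.0 - floor) * fraction

# Two of the four linking urls are themselves in the preliminary set.
prelim = {"example.com/a", "example.com/b", "example.com/c"}
sources = ["example.com/a", "example.com/b", "other.example/x", "other.example/y"]
m2 = local_backlink_multiplier(sources, prelim)
```

A phrase-based test for the on-page component could follow the same pattern, swapping the backlink count for the percent of related phrases found on the page.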

So then you get this, very roughly:

Re-ranking Score=(m1*OP)+(m2*BL)+(m3*H)+(m4*PR)

...and now the final SERP is generated by sorting all the original urls according to these total scores.
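
A rough Python sketch of that second pass follows. The urls, scores and multiplier values are invented, and in practice each multiplier would come out of one of the re-ranking tests rather than being typed in by hand.

```python
COMPONENTS = ("OP", "BL", "H", "PR")

def rerank(pages, multipliers):
    """pages: url -> component scores; multipliers: url -> one multiplier
    per component (m1..m4). Returns the urls sorted by re-ranking score."""
    def score(url):
        s, m = pages[url], multipliers[url]
        # Re-ranking Score = (m1*OP) + (m2*BL) + (m3*H) + (m4*PR)
        return sum(m[k] * s[k] for k in COMPONENTS)
    return sorted(pages, key=score, reverse=True)

# Hypothetical preliminary set: /a originally totals 22, /c totals 21.
pages = {
    "example.com/a": {"OP": 6.0, "BL": 8.0, "H": 3.0, "PR": 5.0},
    "example.com/c": {"OP": 5.0, "BL": 5.0, "H": 5.0, "PR": 6.0},
}
# Suppose /a fails a backlink test and its BL component is cut to a quarter.
multipliers = {
    "example.com/a": {"OP": 1.0, "BL": 0.25, "H": 1.0, "PR": 1.0},
    "example.com/c": {"OP": 1.0, "BL": 1.0, "H": 1.0, "PR": 1.0},
}
final_serp = rerank(pages, multipliers)
```

Here /a falls from 22 to 16 and ends up behind /c at 21, but both urls are still in the list.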

The key here is that only the original set of results gets re-ranked. If you make it into the top 890 or whatever size the preliminary set is, then you will not drop out completely. But if you cross the threshold for one of the second step tests, then you could fall dramatically within that set.

Now my pseudo-equations above are very, very rough - grossly simplified to illustrate the kind of math that I think I see shining through the current fog. It is the kind of math that more than a few Information Retrieval patents point to. For example:

[0223] If the document is included in the SPAM_TABLE, then the document's relevance score is down weighted by predetermined factor... [0224] The search result set is then resorted by relevance score and provided back to the client.
-- Google's recent 'Detecting Spam Documents' patent [webmasterworld.com]
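
The down-weight-and-resort step in that excerpt can be sketched in a few lines of Python. The excerpt only says "predetermined factor", so the value 0.5, the function name and the sample urls are assumptions of mine.

```python
def downweight_and_resort(results, spam_table, factor=0.5):
    """results: list of (url, relevance_score) pairs.
    Scores of urls found in spam_table are multiplied by `factor`
    (an assumed value), then the whole set is resorted by score."""
    adjusted = [
        (url, score * factor if url in spam_table else score)
        for url, score in results
    ]
    return sorted(adjusted, key=lambda r: r[1], reverse=True)

results = [("spam.example", 10.0), ("ok.example", 8.0), ("fine.example", 6.0)]
spam_table = {"spam.example"}
resorted = downweight_and_resort(results, spam_table)
```

With these numbers the flagged url goes from first place to last, yet it is still in the result set.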

So I see Google wrestling with many issues here - spam detection, discerning the end-user's intent, better scalability for faster results, an infrastructure that permits continual updating -- on and on. Some kind of re-ranking seems like a tool that could take care of many factors at once, and it could account for several more fundamental changes we've noticed in the past 6-12 months - including things "breaking" that were not broken before.

Such an approach also would install many tweakable "dials", ranging from the re-ranking tests to the weighting of the multipliers. But (sigh) since this is all my guesswork, the real situation could be quite different. This is the best I have for now.

