Bottler - 6:43 am on Apr 1, 2004 (gmt 0)

I'm not entirely sure you've thought through how PageRank works all the way back to its starting values. Respected sources are critical to the way PageRank is calculated. The PageRank vectors are initially (and in some cases periodically) fed by the values of these sources.

Here's an example. Let's say I wanted to rank the intellectual value of every poster on WebmasterWorld. One way I could do this is by simply drawing up a large network of directed connections between individuals where each connection represents a reply by one poster to another poster. I could then assign values to each connection based on the value of the poster who replied and say that a percentage of this value is transferred to the original poster who is being replied to. After applying some normalization at each step, I could build up a value rating for every poster on WebmasterWorld.
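The scheme above can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions, not Google's actual method: the function name `rank_posters`, the 0.85 "percentage transferred" and the fixed number of passes are all made up for the example, and every poster simply starts with the same value.

```python
def rank_posters(replies, transfer=0.85, passes=50):
    """Rate posters from (replier, original_poster) pairs.

    transfer and passes are illustrative assumptions, not known values.
    """
    posters = sorted({name for pair in replies for name in pair})
    n = len(posters)

    # Who did each poster reply to?
    replied_to = {p: [] for p in posters}
    for replier, original in replies:
        replied_to[replier].append(original)

    # Everybody starts with the same value; the values always sum to 1.
    value = {p: 1.0 / n for p in posters}

    for _ in range(passes):
        # The share that is not transferred is spread evenly over everyone,
        # which keeps the total at 1 on every pass (the normalization step).
        new_value = {p: (1.0 - transfer) / n for p in posters}
        for p in posters:
            targets = replied_to[p]
            if targets:
                # Split the transferred value across the posters p replied to.
                share = transfer * value[p] / len(targets)
                for t in targets:
                    new_value[t] += share
            else:
                # A poster who never replied spreads their value evenly.
                for t in posters:
                    new_value[t] += transfer * value[p] / n
        value = new_value
    return value


example_replies = [("bob", "alice"), ("carol", "alice"), ("alice", "bob")]
scores = rank_posters(example_replies)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 3))
```

In this toy data alice is replied to by two posters and comes out on top, bob is second because alice replied to him, and carol, whom nobody replied to, is last.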

But there is a problem with this model. It is recursively defined and needs starting values. Do I simply give everybody the same initial starting value? If I did this, it might turn out that a lot of dumb guys like to hang out with each other here, responding to each other's posts humorously or out of boredom, and members of this dumb guy group would rise to the top as most valuable posters. Alternatively, I could ask Brett Tabke and GoogleGuy to identify (in secret of course ;)) a small collection of posters they consider intelligent and knowledgeable on WebmasterWorld. We could assign these people an initial value rating to seed the algorithm described in the previous paragraph. From here we can recursively construct a value rating of all users.
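Here is a rough Python sketch of the difference seeding makes. It is a toy under stated assumptions, not a description of what Google runs: `rank_with_seeds`, the 0.85 transfer rate and the poster names are all hypothetical. One caveat on how the seeding is modeled: in a damped iteration like this one, pure starting values wash out after enough passes, so the sketch feeds the seeds back in on every pass (the "periodically fed" case) rather than only at the start.

```python
def rank_with_seeds(replies, seeds=None, transfer=0.85, passes=100):
    """Rate posters from (replier, original_poster) pairs.

    seeds: optional set of trusted posters. Without it, everyone is
    treated as an equally good source. All parameters are illustrative.
    """
    posters = sorted({n for pair in replies for n in pair} | set(seeds or ()))
    replied_to = {p: [] for p in posters}
    for replier, original in replies:
        replied_to[replier].append(original)

    # Source values: concentrated on the seeds, or uniform if none are given.
    if seeds:
        source = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in posters}
    else:
        source = {p: 1.0 / len(posters) for p in posters}

    value = dict(source)
    for _ in range(passes):
        # The untransferred share is re-fed from the source values each pass.
        new_value = {p: (1.0 - transfer) * source[p] for p in posters}
        for p in posters:
            targets = replied_to[p]
            if targets:
                share = transfer * value[p] / len(targets)
                for t in targets:
                    new_value[t] += share
            else:
                # A poster who never replied hands their value back to the sources.
                for t in posters:
                    new_value[t] += transfer * value[p] * source[t]
        value = new_value
    return value


# Hypothetical data: five jokers who only reply to each other,
# plus an expert and a helper who reply to each other.
replies = [
    ("joker1", "joker2"), ("joker1", "joker3"),
    ("joker2", "joker1"), ("joker2", "joker3"),
    ("joker3", "joker1"), ("joker3", "joker2"),
    ("joker4", "joker1"), ("joker5", "joker2"),
    ("expert", "helper"), ("helper", "expert"),
]

uniform = rank_with_seeds(replies)
seeded = rank_with_seeds(replies, seeds={"expert"})
print("top without seeds:", max(uniform, key=uniform.get))
print("top with seeds:   ", max(seeded, key=seeded.get))
```

With this data the unseeded run puts one of the jokers on top purely because of how densely they reply to each other, while the seeded run puts the expert first and gives the joker group nothing, since no value from the seed ever reaches them.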

Herein lies Google's problem. With the number of initial authorities like Yahoo, Looksmart and DMOZ diminishing, there is a very high likelihood that results will be artificially boosted by nothing more than the connectedness of the community they belong to. Consider, for example, the large network of teenage gamers out there on the net who have websites and link to each other. Without the appropriate source values, our results could be dominated by the collective opinions of warez kiddies, free porn junkies and Britney Spears fans.

Intellectual property value sources are critical to Google's rankings and Google's serious options for seeding are running out.

 The Google filter is specifically targeted at commercial phrases and mom-and-pop commercial web sites. They could care less if a commercial business did not show up in the natural results.

Well, if they don't care, then they should, because their users definitely care. When I search for "cheap jewelry", who wants to see some site about a flea market in downtown Nowheresville where some university student bought fake second-hand jewelry for his girlfriend one drunken weekend? Or when I search for "fast cars", who wants to see a government report about accident rates on major highways? Commercial searches constitute a huge percentage of all searches.

