Why AOL and MSN execs need to be mathematicians


Bottler - 9:54 am on Apr 1, 2004 (gmt 0)


also this recent paper: Combating Web Spam with TrustRank discusses the selection of "reputable seed pages".

Oh my God. I just read this paper. I can't believe how mind-bogglingly naive some of its underlying assumptions are. Check this out: "Since trust flows out of the good seed pages, one approach is to give preference to pages from which we can reach many other pages". No further comment necessary.
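
To make that quoted heuristic concrete, here is a rough Python sketch of ranking seed candidates by how many other pages they can reach. This is only an illustration under my own assumptions, not the paper's actual procedure: the toy graph, the page names and both function names are made up.

```python
from collections import deque

def reachable_count(graph, start):
    """Count the pages reachable from `start` by following outlinks (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return len(seen) - 1  # do not count the start page itself

def rank_seed_candidates(graph):
    """Order pages by reach, largest first; ties broken alphabetically."""
    return sorted(graph, key=lambda p: (-reachable_count(graph, p), p))

# Hypothetical toy web: page -> list of pages it links to.
toy_graph = {
    "directory": ["a", "b", "c"],
    "a": ["b"],
    "b": [],
    "c": ["a"],
    "linkfarm": ["directory", "a", "b", "c"],
}

print(rank_seed_candidates(toy_graph))
```

In this toy graph the made-up "linkfarm" page comes out on top simply because it links out to everything, which is presumably the kind of weakness in a pure reach heuristic that the quote invites.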

Furthermore, check this out: "In order to get rid of the spam quickly, we removed from our list of 25,000 sites all that were not listed in any of the major web directories". More evidence of their heavy reliance on leeching intellectual property value from the major directories.

As for seeds: in this paper, the seeds are only candidate pages that are likely not to be spam (based on silly assumptions such as the above), used to kickstart a spam-detection algorithm that propagates a "non-spamminess" measure. The paper does not discuss seeding the PageRank algorithm.
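
For anyone who wants to see what "propagation from seeds" roughly looks like, here is a minimal Python sketch. It is a seed-biased PageRank-style iteration, which is my assumption of the general idea and not the paper's exact algorithm. The function name, the damping value of 0.85, the iteration count and the toy graph are all hypothetical.

```python
def propagate_trust(graph, seeds, damping=0.85, iterations=20):
    """Spread trust from seed pages along outlinks.

    graph: dict mapping page -> list of pages it links to.
    seeds: set of pages assumed (by hand) to be non-spam.
    """
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    # All initial trust sits on the seeds, split evenly between them.
    seed_share = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_share)
    for _ in range(iterations):
        # Each round, seeds get a fixed top-up of trust...
        nxt = {p: (1.0 - damping) * seed_share[p] for p in pages}
        # ...and every page passes a damped, equal share to each outlink.
        for page, targets in graph.items():
            if targets:
                share = damping * trust[page] / len(targets)
                for target in targets:
                    nxt[target] += share
        trust = nxt
    return trust

# Hypothetical toy web with one hand-picked seed and a two-page spam loop.
toy_graph = {
    "directory": ["a", "b"],
    "a": ["b"],
    "b": [],
    "spam1": ["spam2"],
    "spam2": ["spam1"],
}

trust = propagate_trust(toy_graph, seeds={"directory"})
for page in sorted(trust):
    print(page, round(trust[page], 4))
```

Pages that no seed can reach (the spam loop here) never receive any trust, so the whole result hinges on which pages get picked as seeds in the first place.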

Great papers, though, for finding out how they are thinking. Thanks.

[edited by: Bottler at 11:17 am (utc) on April 1, 2004]


Thread source: http://www.webmasterworld.com/google_archive/23034.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com