[spaces.msn.com...]
This paper was written by an Israeli mathematician, Hillel Tal-Ezer. It points out a flaw in Google's PageRank algorithm that gives 'sink' pages, which are not strongly connected to the main web graph, an unrealistic importance. The author then presents a new algorithm, with the same complexity as the original PageRank algorithm, that solves this problem.
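To make the 'sink' effect concrete, here is a minimal sketch of standard damped PageRank (power iteration, damping 0.85) on a hypothetical six-page graph. Pages X and Y are reachable from the main graph through a single link but only link to each other, the kind of two-page loop the 1998 paper itself calls a "rank sink". This is plain textbook PageRank, not Tal-Ezer's algorithm or data; the graph, the function name and the parameter values are all assumptions for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-12, max_iter=1000):
    """Damped PageRank computed by power iteration.

    links maps each page to the list of pages it links to. Pages with no
    out-links (dangling pages) have their rank spread uniformly over all pages.
    Returns a dict of ranks that sums to 1.
    """
    pages = sorted(set(links) | {dst for outs in links.values() for dst in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by dangling pages is redistributed evenly, plus the teleport term.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes damping * its rank, split evenly over its out-links.
        for src, outs in links.items():
            for dst in outs:
                new[dst] += damping * rank[src] / len(outs)
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical toy graph: A-D are the "main" web; C links out to X,
# but X and Y link only to each other, so rank entering {X, Y} never flows back.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A", "X"],
    "D": ["A"],
    "X": ["Y"],
    "Y": ["X"],
}

rank = pagerank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(f"{page}: {rank[page]:.3f}")
print(f"share held by the X/Y loop: {rank['X'] + rank['Y']:.3f}")
```

On this toy graph the X/Y loop ends up with about 0.68 of the total rank despite having only one inbound link, while the damping (teleport) term is the only thing that keeps it from absorbing everything.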
Yes, the basic formula is math, but if you're going to call E=mc² wrong, I'd expect a more in-depth example than one or two paragraphs.
I'm admittedly right-brained and dysfunctional in math, so I'm rowing with one oar in being critical of this paper... but come on, I think Google knows what they're doing.
You have to question whether any technology research that is based on 7-year-old data is worthwhile.
Yeah, it can be questioned, but the 1998 PageRank paper is still the authority; everything published in this field is based on it.
More current research based on PageRank was published by folks from Stanford and Yahoo as recently as two months ago; see Link Spam Detection Based on Mass Estimation [www-db.stanford.edu] (PDF). I doubt these folks would spin their wheels basing research on an outdated paper.
I've never been convinced that the PR calculation varies according to keywords/content, as some have suggested. Mathematically that doesn't make much sense: PageRank is inherently independent of a page's topic.
It can be perturbed in many ways.
You also have to take into account that the graph is far from complete, let alone well connected.
Just look at the effects of a duplicated site sitting off by itself, perhaps reached by only a couple of links pointing at pages the site owner would not consider part of the site.
AKA the www/non-www problem, or any other graph created when the server returns a 200 response to a Googlebot request for a page that isn't part of the site's natural graph.
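A rough way to see the duplicate-site effect is to run standard damped PageRank on two versions of a hypothetical graph: a three-page site served under both "www" and "nonwww" hosts, with four external pages linking to it. In one version every external link points at www/ and the duplicate sits off by itself; in the other, one stray link points at the duplicate. The host names, page names and graph shape are assumptions for illustration only, and this says nothing about how Google actually handles canonicalization.

```python
def pagerank(links, damping=0.85, tol=1e-12, max_iter=1000):
    """Damped PageRank by power iteration; dangling rank is spread uniformly."""
    pages = sorted(set(links) | {dst for outs in links.values() for dst in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for src, outs in links.items():
            for dst in outs:
                new[dst] += damping * rank[src] / len(outs)
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank


def site(prefix):
    """Hypothetical three-page site: home links to two inner pages, which link home."""
    home, a, b = f"{prefix}/", f"{prefix}/a", f"{prefix}/b"
    return {home: [a, b], a: [home], b: [home]}


def graph(links_to_duplicate):
    """Both hosts serve the same site; four external pages link to one home page or the other."""
    links = {**site("www"), **site("nonwww")}
    for i in range(4):
        target = "nonwww/" if i < links_to_duplicate else "www/"
        links[f"ext{i}"] = [target]
    return links


consolidated = pagerank(graph(links_to_duplicate=0))
split = pagerank(graph(links_to_duplicate=1))
print(f"www/ rank, all links to www:  {consolidated['www/']:.3f}")
print(f"www/ rank, one stray link:    {split['www/']:.3f}")
print(f"nonwww/ rank, one stray link: {split['nonwww/']:.3f}")
```

In this toy setup the single stray link moves a noticeable share of rank from www/ to its duplicate (roughly 0.330 down to 0.284 for www/), even though both hosts serve the same content.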
PageRank, and SEM overall, is not just math; it is of course linguistics, heuristics, etc. as well.
PageRank is "just math". There is no linguistics or heuristics, at least by the definition in the original paper. Read section 2.4, "Definition of PageRank", in "The PageRank Citation Ranking: Bringing Order to the Web".
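For reference, this is the section 2.4 definition as I read it, in the paper's notation: B_u is the set of pages linking to u, N_v is the number of out-links of page v, c is a normalization factor, and E(u) is a vector over pages acting as a "source of rank". Worth checking against the paper itself, but note that the only inputs are the link structure and the chosen E; nothing refers to page text or queries.

```latex
R'(u) = c \sum_{v \in B_u} \frac{R'(v)}{N_v} + c\,E(u),
\qquad \text{such that } c \text{ is maximized and } \lVert R' \rVert_1 = 1
```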
You have to question whether any technology research that is based on 7-year-old data is worthwhile.
It's certainly worthwhile to try to determine and state why the PageRank mathematical model doesn't work. Finding and correcting perceived errors in ongoing research is a major reason for citation, writing papers, academic publishing and peer review. Other researchers applying PageRank-like algorithms in their own work can then cite those errors, avoid them, and improve on the model.
One should always be careful when quoting papers that haven't passed peer review.
PageRank is "just math". There is no linguistics or heuristics, at least by the definition in the original paper.
These guys are essentially looking at a car tire and saying 'this piece of rubber cannot possibly transport people effectively', ignoring the fact that things work a little better once the tire is attached to the car.