Faults With Googles Page Rank

Forum Moderators: open

Message Too Old, No Replies

Faults With Googles Page Rank

Brett_Tabke

5:18 pm on Dec 27, 2005 (gmt 0)

[www2.mta.ac.il...]

[spaces.msn.com...]

written by an Israeli mathematician named Hillel Tal-Ezer. It points out a fault in Google's PageRank algorithm that causes 'sink' pages that are not strongly connected to the main web graph to have an unrealistic importance. The author then goes on to explain a new algorithm with the same complexity of the original PageRank algorithm that solves this problem.

Clark

6:01 pm on Dec 27, 2005 (gmt 0)

Cool. What's the name of his new search engine ;)
Does he need investors? (Haven't checked the link because I don't like math.)

flyerguy

7:02 pm on Dec 27, 2005 (gmt 0)

The example in the PDF citing 'diabetics' is laughably simple. Pagerank and SEM overall is not just math, it is of course linguistics, heuristics, etc. as well.

Yes the basic formula is math but if you're going to call E=Mc2 wrong i'd expect a more indepth example than one or two paragraphs.

I'm admittedly right-brained and dysfunctional in math, so I'm rowing with one oar in being critical about this
paper.. but come on, I think Google knows what they're doing.

Edge

7:14 pm on Dec 27, 2005 (gmt 0)

As a engineer whom is left brained, I can't stop myself from smirking when one person claims to be smarter than many.

inbound

7:20 pm on Dec 27, 2005 (gmt 0)

What a waste of time, does the author truly believe that Google have not moved beyond the original model for PageRank?

You have to question whether any technology research that is based on 7 year old data is worthwhile.

jimbeetle

7:50 pm on Dec 27, 2005 (gmt 0)

You have to question whether any technology research that is based on 7 year old data is worthwhile

Yeah, it might be able to be questioned, but the 1998 PageRank paper is still the authority; everything published in this field is based on it.

More current research based on PageRank was published by folks from Stanford and Yahoo as recently as two months ago, see Link Spam Detection Based on Mass Estimation [www-db.stanford.edu] (pdf). I tend to doubt that these folks would spin their wheels by basing research on an outdated paper.

tedster

12:52 am on Dec 28, 2005 (gmt 0)

One PR phenomenon that several have observed is that a PR5 seems easier to achieve in some topics than others. This could be because pages in those areas are not as well-connected -- and this "sink pages" phenomenon comes into play.

I've never been satisfied that PR calculation would vary according to keywords/content as some have suggested. Mathematically that doesn't make enough sense - PageRank is natively independent of the page's topic.

theBear

1:53 am on Dec 28, 2005 (gmt 0)

PR is directly tied to the web graph.

It can be perturbed in many ways.

You also have to take into account the fact that the graph is anything but complete let alone well connected.

Just look at the effects that can be caused by a duplicated site sitting off by itself maybe linked to only by a couple of links to what would not be considered by the site owner as being part of the site.

AKA the www/non-www problem or any other graph created by a server response of 200 to googlebot request for a page that isn't part of a sites natural graph.

superpower

2:36 am on Dec 28, 2005 (gmt 0)

Pagerank and SEM overall is not just math, it is of course linguistics, heuristics, etc. as well.

PageRank is "just math". There is no linguistics or heuristics at least by the definition in the original paper. Read section 2.4 Definition of PageRank in "The PageRank Citation Ranking: Bringing Order to the Web".

You have to question whether any technology research that is based on 7 year old data is worthwhile.

It's certainly worthwhile to try to determine and state why the PageRank mathematical model doesn't work. Finding and correcting perceived errors in ongoing research is a major reason for citation, writing papers, academic publishing and peer review. Those errors can be cited, avoided, and the model improved on by other researchers who are applying PageRank-like algorithms to their own research.

FromRocky

4:30 am on Dec 28, 2005 (gmt 0)

I believe there are some weaknesses or faults in Google's PageRank algorithm. But I'm still unconvinced that one of these faults creates a sink page. If the sink page is so important and easy to create, why didn�t the author create one to prove the point?

doc_z

12:08 pm on Dec 28, 2005 (gmt 0)

It's a widely propagated mistake (not only made in this paper but also many other, even some of the original) that the PageRank algorithm is an eigenvalue problem. Just for d=1 (the original model) it's an eigenvalue problem. For 0<d<1 it's solving an linear system of equations. There problem described doesn't exists in this case.

One should always been careful when quoting papers which haven't passed a review process.

tedster

7:29 pm on Dec 30, 2005 (gmt 0)

One should always been careful when quoting papers which haven't passed a review process.

Or even a spelling and grammar check!

flyerguy

7:56 pm on Dec 30, 2005 (gmt 0)

PageRank is "just math". There is no linguistics or heuristics at least by the definition in the original paper.

Yes but Pagerank alone is meaningless with linguistics and all the other elements that make up Googles system.

These guys are essentially looking at a car tire, and saying 'this piece of rubber cannot possibly transport people effectively', out of the context that when the tire is attached to the car things work a little better.