
SEM Research Topics Forum

    
Faults With Google's PageRank
Brett_Tabke




msg:819924
 5:18 pm on Dec 27, 2005 (gmt 0)

[www2.mta.ac.il...]

[spaces.msn.com...]
The paper was written by an Israeli mathematician named Hillel Tal-Ezer. It points out a fault in Google's PageRank algorithm that causes 'sink' pages that are not strongly connected to the main web graph to have an unrealistic importance. The author then goes on to explain a new algorithm, with the same complexity as the original PageRank algorithm, that solves this problem.
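
For readers who want to see the effect, here is a minimal sketch of a rank sink on a toy graph. It is not the example from the paper: the five-page graph, the page names, and the `pagerank` helper are all made up for illustration, and the damping factor d = 0.85 is just the commonly quoted value. Pages A, B and C form the "main" graph; S1 and S2 link only to each other, so rank that flows in through the single link from C cannot flow back out.

```python
def pagerank(links, d=0.85, iterations=200):
    """Power iteration for the damped PageRank model (ranks sum to 1).

    links maps each page to the list of pages it links to.
    Assumes every page has at least one outlink (no dangling pages).
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline share.
        new_rank = {p: (1.0 - d) / n for p in pages}
        for page, targets in links.items():
            # A page splits its damped rank evenly among its outlinks.
            share = d * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: A -> B -> C -> A is the main cycle,
# C also links into the S1 <-> S2 pair, which never links back out.
toy_web = {
    "A": ["B"],
    "B": ["C"],
    "C": ["A", "S1"],
    "S1": ["S2"],
    "S2": ["S1"],
}

ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

On this toy graph the two sink pages together end up with more than 70% of the total rank, even though only one link points into the pair.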

 

Clark




msg:819925
 6:01 pm on Dec 27, 2005 (gmt 0)

Cool. What's the name of his new search engine ;)
Does he need investors? (Haven't checked the link because I don't like math.)

flyerguy




msg:819926
 7:02 pm on Dec 27, 2005 (gmt 0)

The example in the PDF citing 'diabetics' is laughably simple. PageRank and SEM overall are not just math; they are of course linguistics, heuristics, etc. as well.

Yes, the basic formula is math, but if you're going to call E=mc^2 wrong, I'd expect a more in-depth example than one or two paragraphs.

I'm admittedly right-brained and dysfunctional in math, so I'm rowing with one oar in being critical about this paper... but come on, I think Google knows what they're doing.

Edge




msg:819927
 7:14 pm on Dec 27, 2005 (gmt 0)

As an engineer who is left-brained, I can't stop myself from smirking when one person claims to be smarter than many.

inbound




msg:819928
 7:20 pm on Dec 27, 2005 (gmt 0)

What a waste of time. Does the author truly believe that Google have not moved beyond the original model for PageRank?

You have to question whether any technology research that is based on 7 year old data is worthwhile.

jimbeetle




msg:819929
 7:50 pm on Dec 27, 2005 (gmt 0)

You have to question whether any technology research that is based on 7 year old data is worthwhile

Yeah, it can be questioned, but the 1998 PageRank paper is still the authority; everything published in this field is based on it.

More current research based on PageRank was published by folks from Stanford and Yahoo as recently as two months ago, see Link Spam Detection Based on Mass Estimation [www-db.stanford.edu] (pdf). I tend to doubt that these folks would spin their wheels by basing research on an outdated paper.

tedster




msg:819930
 12:52 am on Dec 28, 2005 (gmt 0)

One PR phenomenon that several have observed is that a PR5 seems easier to achieve in some topics than others. This could be because pages in those areas are not as well-connected -- and this "sink pages" phenomenon comes into play.

I've never been satisfied that PR calculation would vary according to keywords/content as some have suggested. Mathematically that doesn't make enough sense - PageRank is natively independent of the page's topic.

theBear




msg:819931
 1:53 am on Dec 28, 2005 (gmt 0)

PR is directly tied to the web graph.

It can be perturbed in many ways.

You also have to take into account the fact that the graph is anything but complete, let alone well connected.

Just look at the effects that can be caused by a duplicated site sitting off by itself, reached perhaps only through a couple of links to pages that the site owner would not consider part of the site.

AKA the www/non-www problem, or any other graph created by a server responding 200 to a Googlebot request for a page that isn't part of a site's natural graph.
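
A rough sketch of why this matters for a link graph: if www and non-www URLs are treated as different nodes, inbound links to what the owner thinks of as one page get split across two nodes. The URLs, the `node_key` helper, and the `merge_www` flag below are invented for illustration, and the sketch assumes both hostnames serve the same content; it says nothing about how Google actually handles this.

```python
from urllib.parse import urlsplit


def node_key(url, merge_www=False):
    """Return the graph node name for a URL (host plus path).

    With merge_www=True a leading "www." is dropped, so both hostnames
    map to the same node. This is an illustrative rule only.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if merge_www and host.startswith("www."):
        host = host[len("www."):]
    return host + (parts.path or "/")


def inbound_counts(links, merge_www):
    """Count inbound links per node for a list of (source, target) pairs."""
    counts = {}
    for _source, target in links:
        key = node_key(target, merge_www)
        counts[key] = counts.get(key, 0) + 1
    return counts


# Hypothetical inbound links to "the same" page under two hostnames.
inbound_links = [
    ("http://a.example/", "http://www.example.com/page"),
    ("http://b.example/", "http://example.com/page"),
    ("http://c.example/", "http://www.example.com/page"),
]

split = inbound_counts(inbound_links, merge_www=False)
merged = inbound_counts(inbound_links, merge_www=True)
print(split)
print(merged)
```

Without merging, the three links land on two separate nodes; with merging, all three count toward one.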

superpower




msg:819932
 2:36 am on Dec 28, 2005 (gmt 0)

PageRank and SEM overall are not just math; they are of course linguistics, heuristics, etc. as well.

PageRank is "just math". There is no linguistics or heuristics at least by the definition in the original paper. Read section 2.4 Definition of PageRank in "The PageRank Citation Ranking: Bringing Order to the Web".
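
For reference, the simplified definition in that section is usually quoted as below. The notation should be checked against the paper itself; the comments describe the symbols as they are commonly given.

```latex
% F_u : set of pages that u links to
% B_u : set of pages that link to u
% N_u = |F_u| : number of links out of u
% c   : normalization factor
\[
  R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v}
\]
```

Nothing in the formula refers to the words on a page; the only inputs are the link sets.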

You have to question whether any technology research that is based on 7 year old data is worthwhile.

It's certainly worthwhile to try to determine and state why the PageRank mathematical model doesn't work. Finding and correcting perceived errors in ongoing research is a major reason for citation, writing papers, academic publishing and peer review. Those errors can be cited, avoided, and the model improved on by other researchers who are applying PageRank-like algorithms to their own research.

FromRocky




msg:819933
 4:30 am on Dec 28, 2005 (gmt 0)

I believe there are some weaknesses or faults in Google's PageRank algorithm. But I'm still unconvinced that one of these faults creates a sink page. If the sink page is so important and easy to create, why didn't the author create one to prove the point?

doc_z




msg:819934
 12:08 pm on Dec 28, 2005 (gmt 0)

It's a widely propagated mistake (made not only in this paper but also in many others, even some of the original ones) that the PageRank algorithm is an eigenvalue problem. Only for d=1 (the original model) is it an eigenvalue problem. For 0<d<1 it amounts to solving a linear system of equations. The problem described doesn't exist in this case.
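
The distinction can be sketched in a few lines. With M[i][j] = 1/outdegree(j) when page j links to page i, the damped model p = (1-d)/N + d*M*p rearranges to (I - d*M) p = (1-d)/N, an ordinary linear system. The code below is a minimal illustration under assumed conventions (ranks normalized to sum to 1, every page has at least one outlink, d = 0.85); the toy graph and the helper names `solve_linear_system` and `pagerank_linear` are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def pagerank_linear(links, d=0.85):
    """Compute damped PageRank by solving (I - d*M) p = (1-d)/N directly."""
    pages = list(links)
    n = len(pages)
    index = {p: i for i, p in enumerate(pages)}
    # Start from the identity matrix, then subtract d*M entry by entry.
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for page, targets in links.items():
        for target in targets:
            a[index[target]][index[page]] -= d / len(targets)
    b = [(1.0 - d) / n] * n
    return dict(zip(pages, solve_linear_system(a, b)))


# Hypothetical three-page web.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank_linear(toy_web)
print(ranks)
```

For 0 < d < 1 the matrix I - d*M is invertible (each column of d*M sums to d, which is less than 1), so the solve gives a unique answer without any iteration. Under the sum-to-1 normalization the same vector is also the fixed point that the usual power iteration converges to.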

One should always been careful when quoting papers which haven't passed a review process.

tedster




msg:819935
 7:29 pm on Dec 30, 2005 (gmt 0)

One should always been careful when quoting papers which haven't passed a review process.

Or even a spelling and grammar check!

flyerguy




msg:819936
 7:56 pm on Dec 30, 2005 (gmt 0)

PageRank is "just math". There is no linguistics or heuristics at least by the definition in the original paper.

Yes, but PageRank alone is meaningless without linguistics and all the other elements that make up Google's system.

These guys are essentially looking at a car tire and saying 'this piece of rubber cannot possibly transport people effectively', ignoring the context that when the tire is attached to the car, things work a little better.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved