Great Post MartiniBuster, and...
|It's 2010 but webmasters continue working like it's 2003 |
You can say that again... and again... and again...
Thanks for the link tedster, amazing interview!
|Ask why low PR pages outrank higher PR pages about the same topic. |
I think one of the reasons is that if you ONLY count focused, on-topic PR juice, the low PR page may actually have a stronger "on-topic" PR than the high PR page (where the majority of the high PR page's juice may come from wider or more general subjects).
So PR may tell you how popular the page is, but it does not tell you what for.
I would also argue that the more niche the query, the better the chance a lower PR page has of outranking a higher PR page, whereas for more generic queries the higher PR pages have the better chance of outranking the lower PR ones.
MadScientist made this point early on:
|the only reason it's over-hyped is because Google insists on displaying the 'Green Pixel Dust' in the stinking toolbar. If they would stop publishing the inaccurate, out-of-date, green idiot lights people would stop talking about it |
That strikes me as an astute observation and it raises the question:
Just WHY DOES Google insist on displaying this (so-called) information? Do they believe that the majority of the surfing public is actually taking note of which sites have the most green? I can see no valid reason in 2010 for including that on the toolbar, so am wondering if MC or anyone else has offered an "official" explanation? NOT a justification for Google's use of PR, but rather, why they think it needs to still be on the toolbar for the general public?
I'd guess the reasoning is that the green pixels are not there by default - so if you actively turn it on, you have an interest.
|Just WHY DOES Google insist on displaying this (so-called) information? |
1) To confuse people who are trying to "over-optimise"
2) To distort, and therefore reduce, the market for paid links by pushing the focus onto an indicator that does not work.
I only look at Alexa these days. It's not perfect, but it more or less shows me whether I can get traffic from a site or not. Google taught me to go after links delivering traffic - these help the most with both traffic and rankings.
Truth is: high PR sites usually have high traffic, but I know a handful of PR5 and 6 sites that have been penalized.
|Anchor text is not a factor in PageRank, so you can throw out that variable. |
I may have been misunderstood. I was stating that anchor text IS a "variable" i.e. that it would be impossible to test Pagerank as a ranking factor independently of the anchor text and backlinks you get to raise that Pagerank.
I see what you mean - it would be more of a challenge. You might use linked images with empty alt attributes.
|You might use linked images with empty alt attributes. |
Ah-ha! Okay, that takes care of the anchor text ... any ideas on how to increase Pagerank without increasing the number and/or quality of backlinks/internal linking?
If we could test this, then we'd get an answer!
There is no way - PageRank calculations are defined by links.
So the original question of this thread can only receive speculative answers at best?
Oh, well ... they can't all be golden! ;)
|So the original question of this thread can only receive speculative answers at best? |
Discussions about PageRank can only be speculative at best? Whoa! Who would have thought?
Thank you, thank you, thank you!
Elsewhere there's discussion of a recent interview with Cutts. Here's a quote which suggests PR does have a use: it affects how often and how deep Google will crawl your site:
|The best way to think about it is that the number of pages that we crawl is roughly proportional to your PageRank. So if you have a lot of incoming links on your root page, we'll definitely crawl that. Then your root page may link to other pages, and those will get PageRank and we'll crawl those as well. As you get deeper and deeper in your site, however, PageRank tends to decline. |
Viewed from that point of view perhaps PR has some value after all. :)
|So the original question of this thread can only receive speculative answers at best? |
If you mean the question "Does PageRank Still Matter?" the answer is not at all speculative. The answer is YES, it still matters. It matters for crawling and for indexing and for ranking. PageRank is still a significant part of the overall picture.
|Viewed from that point of view perhaps PR has some value after all. |
Or, it reinforces the importance of deep links
|So if you have a lot of incoming links on your root page, we'll definitely crawl that. Then your root page may link to other pages, and those will get PageRank and we'll crawl those as well. As you get deeper and deeper in your site, however, PageRank tends to decline. |
I have long considered deep linking of prime importance. It is extremely difficult to sustain a site from the home page alone - a mistake I have seen a lot of people make.
Much has been said of Wiki's "internal linking". I would speculate that the vast number of deep links from external sources is as important, if not more so.
As long as I am speculating, what if a different dampening factor were used on internal pagerank vs external? Seems to me it would be a very efficient way to moderate attempts to manipulate via internal linking - make that PR decay faster each step of the chain.
|what if a different dampening factor were used on internal pagerank vs external? |
I apologize for quibbling, but it's worth noting something about the phrase "internal PageRank". That phrase is generally used to denote the official PageRank score as seen at the GooglePlex. So for the sake of keeping our terminology straight, maybe we should not be calling the flow of PR within a website internal pagerank. Maybe something like "internal PageRank flow" is more accurate?
However, as I understand it, there is a dampening factor that affects how far PageRank will flow within a site. It gets back to the random surfer described in the original paper. At a certain point the random surfer will click out of a site or find what they want and leave. As I understand it, PR flow emulates that process, hence a decay.
And the main reason you need a dampening factor is to keep the PageRank values from going off to the sky as the calculation is iterated, over and over. Without a dampening factor, the numbers would never settle down to a stable value. With the dampening factor, after a number of iterations around the entire webgraph, PageRank numbers are only changing out at the 8th, 9th or 10th decimal place. That's the "dampening" that it creates.
Martinibuster - agreed on the need for clarity of language.
RE different dampening factors... I understand the existence of 'a' dampening factor. I'm simply suggesting that a higher dampening factor for on-site linking would also lead to a more rapid decay of PR.
Was kind of fun when we could all watch the dance. I kind of miss those days, and the race to announce it here on WebmasterWorld.
You may want to read the recent interview with Matt Cutts about PageRank. It also matters for how your site gets indexed: the more useful the site, the more often and more deeply Google will index it.
@tedster, I am not sure I see why a dampening factor is necessary for stable values. I thought what they do is effectively solve a giant simultaneous equation, as in the linear algebra section here: [math.cornell.edu ]
The real maths is likely to be more complex as well as larger scale, but the principle should be the same. An iterative approach (if they use it, which I doubt as it looks more computationally intensive to me) should give the same answers.
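To illustrate the simultaneous-equation view, here is a sketch with a hypothetical three-page graph (A links to B and C, B links to C, C links to A; d = 0.85). This is just an illustration of the principle, not what Google actually runs:

```python
# PageRank for the hypothetical graph as a linear system (d = 0.85, n = 3):
#   PR(A) = 0.05 + 0.85 * PR(C)
#   PR(B) = 0.05 + 0.85 * (PR(A) / 2)
#   PR(C) = 0.05 + 0.85 * (PR(A) / 2 + PR(B))
# Rearranged to M x = b and solved directly - no iteration needed.

def solve3(m, b):
    """Gaussian elimination with partial pivoting for a 3x3 system."""
    a = [row[:] + [rhs] for row, rhs in zip(m, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x

d = 0.85
m = [
    [1.0, 0.0, -d],       # PR(A) - d*PR(C)              = 0.05
    [-d / 2, 1.0, 0.0],   # PR(B) - d/2*PR(A)            = 0.05
    [-d / 2, -d, 1.0],    # PR(C) - d/2*PR(A) - d*PR(B)  = 0.05
]
b = [0.05, 0.05, 0.05]
pr_a, pr_b, pr_c = solve3(m, b)
print(pr_a, pr_b, pr_c)  # the three scores sum to 1.0
```

The direct solution agrees with what the iterative method converges to, which is graeme_p's point: the definition fixes the answer, whichever calculation method is used.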
Consider what happens if every page splits its FULL PageRank vote to the pages that it links to, and then those pages do the same and so on.
As you iterate the calculation over and over around the web graph, the PageRank for any given page would never approach a limit of any kind. That is, it would never settle down, it would just keep growing.
The damping factor keeps that from happening - in the original formula I believe they used 0.85 for d, which is the damping factor.
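A rough sketch of that iteration, using a hypothetical three-page graph and d = 0.85 as in the original formula (an illustration only, not Google's actual method):

```python
# Hypothetical 3-page web graph: A links to B and C, B links to C, C links to A.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}

def pagerank(links, d=0.85, iterations=50):
    """Iterate the damped PageRank formula until the scores settle."""
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}  # start everyone equal
    for _ in range(iterations):
        # (1 - d) / n is the "teleport" share every page gets each round
        new = {p: (1 - d) / n for p in pages}
        for p, outs in links.items():
            share = pr[p] / len(outs)  # each page splits its vote evenly
            for q in outs:
                new[q] += d * share
        pr = new
    return pr

ranks = pagerank(links)
print(ranks)
print(sum(ranks.values()))  # total PageRank mass stays at 1.0
```

After a few dozen passes the per-page numbers stop moving at the far decimal places, which is the settling behavior described above.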
|The real maths is likely to be more complex as well as larger scale |
Yes, I'm sure that's true. There is a difference between the definition of PageRank and the methods for doing the calculation. Google has evolved their calculation method over time, and they are not about to share that exact procedure. It is "secret sauce."
They did use a damping factor of 0.85. Brin and Page's own presentation in 1998 says so [ilpubs.stanford.edu:8090 ]
My point is that a damping factor is not essential, so you cannot be sure that it is still used. The paper also mentions varying the damping factor or only applying it to certain pages.
|Consider what happens if every page splits its FULL PageRank vote to the pages that it links to |
That is exactly what happens in the example I linked to, and it does not go off the graph.
Anyway it does not really matter, and I am really just quibbling because I find the maths interesting. My defence is that quibbling is probably as useful as anything else that comes out of PageRank threads!
|There is no way - PageRank calculations are defined by links. |
I have been wondering whether this is correct for the TBPR. I still see sites with a strange internal pagerank flow. Let's say you have a main page with TBPR4 and a straight navigation like:
The "random surfer" would go to internal page1, perhaps more often than to page5. But I still see internal pagerank flows like:
- page1: TBPR1
- page2: TBPR0
- page3: TBPR2
- page4: TBPR0
- page5: TBPR3
... which are not explainable only by links and inheritance of PR. Perhaps something to do with the content?
I tried everything to get the TBPR back to those PR0-pages: Changing content (all content has been unique all the time), removing all external links, etc. but I couldn't find a reason for the different internal TBPRs or find a test to get a smooth internal PR flow. Except for being just FUD of course...
And yes: There are still sites out there having a PR0 on the mainpage and ranking like hell.
My question would be: Are those PR0 pages passing link juice or not?
|Perhaps something to do with the content? |
Nope... Has nothing to do with the content.
|Except for being just FUD of course... |
This quote is much closer... IMO it's much more likely it has to do with the published data you see in the toolbar, and 0 or 'grey bar' is the default when data is not received.
[edited by: TheMadScientist at 3:28 pm (utc) on Mar 17, 2010]
|...when data is not received. |
Thanks for your reply. What could be a reason for that? Are we talking about network-problems or "Google does not know" or "Google does not want to tell"?
I made the guess about the content because I often see link pages (like they were en vogue 3 years ago) getting a TBPR0/TBPR-grey, especially if they are off-topic.
|What could be a reason for that? |
Data drop in the data pushing process.
|"Google does not want to tell"? |
Some types of pages might be dropped out of the published data to discourage link buying / selling purposely. Could be a connection thing where the number is cached by the requesting resource for N days (months) and was not present on the initial request and has not been re-requested yet.
Could be the page(s) got detached during the calculation/estimation process and did not get a published score this round.
I'm curious about the math too, so I read through the example at Cornell provided by graeme_p. It turns out you are both partly right. Without the dampening factor, the calculation converges for a connected web graph with no dangling nodes. However, the dampening factor is necessary to deal with disconnections and dangling nodes, so they could not dispense with it entirely.
This makes sense intuitively if you think of the random surfer model. The dampening factor represents the probability that the surfer will "jump" to a random page instead of following a link. If the surfer never jumps, the probability of the surfer landing on a page is still well-defined for a connected graph with no dangling nodes -- as the number of clicks goes to infinity, the probability of finding the surfer at any given page _at a particular time_ converges to its real value, which can be found by linear algebra. Of course the surfer will visit every page an infinite number of times, which is why one might think the series would diverge, but we are not calculating the number of visits to a page, but rather the probability of finding the surfer on that page, which converges.
However, if there is a dangling node (no outgoing links) the surfer will get stuck there, thus breaking the calculation. Therefore the dampening factor is required so that the surfer can leave such a page by jumping to some random other page.
None of this prevents the dampening factor from being different for different kinds of links -- that makes the calculation more complicated but it is still doable.
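To sketch the dangling-node point: here is a hypothetical three-page graph where C has no outlinks. Redistributing the dangling mass uniformly is one common textbook fix, not necessarily what Google does:

```python
# Hypothetical graph with a dangling node: C has no outgoing links.
links = {"A": ["B", "C"], "B": ["C"], "C": []}

def pagerank(links, d, iterations=200):
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - d) / n for p in pages}  # teleport share
        # Common fix: treat a dangling node as if it linked to every page,
        # so the surfer stuck on C jumps to a random page instead.
        dangling = sum(pr[p] for p, outs in links.items() if not outs)
        for p in pages:
            new[p] += d * dangling / n
        for p, outs in links.items():
            if outs:
                share = pr[p] / len(outs)
                for q in outs:
                    new[q] += d * share
        pr = new
    return pr

# Without the teleport/redistribution terms, PageRank would pile up at C
# and drain out of the rest of the graph. With them, every page keeps a
# stable nonzero score and the total mass stays at 1.0.
ranks = pagerank(links, d=0.85)
print(ranks)
```

C still ends up with the biggest score (everything links into it), but A and B no longer go to zero, which is the role the dampening factor plays for dangling nodes.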