
Forum Moderators: Robert Charlton & goodroi


Does PageRank Affect Your Ranking in Google?

     
1:10 pm on Feb 17, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member ciml is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 22, 2001
posts:3805
votes: 2


Macro drew some responses to his post [webmasterworld.com] when he answered "How does page rank affect your listing in SERPs (Search Engine Result Pages)?" with "It doesn't". He qualifies that comment, but let's just follow that tangent.

In my view, yes: PR has a direct influence on ranking for a given search phrase, but the direct influence is very small indeed; it is not worth worrying about, it is not important. (Just to avoid confusion, I'm using "important" as defined in Cambridge/American Heritage/etc.)

There are secondary factors (e.g. people like to exchange links with high PR pages/sites) and there are related factors (e.g. sites that have high PageRank tend to have plenty of links from different sites with supporting anchor text).

Also, PageRank is useful for other things (e.g. crawl depth and frequency) but the direct influence on rankings of having more PageRank is, as Macro put it, "small enough a consideration to be ignored".

2:39 am on Feb 18, 2005 (gmt 0)

New User

10+ Year Member

joined:Sept 2, 2004
posts:12
votes: 0


I have only one thing to say about whether PR affects ranking in Google: I am constantly finding PR 0 sites/pages listed above other sites/pages that have far higher PR.

So I am tempted to ignore PR, except for this: if PR suddenly starts to worsen on a page, something is amiss and had better be found and fixed, lest it be an indication that G might be about to sandbox the site or apply some sort of penalty or what have you.

3:26 am on Feb 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member steveb is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 20, 2002
posts:4652
votes: 0


"I could not put a number to it, but as a sheer guess, I'd say a very savvy webmaster could get a PR5 page to rank above a seemingly similar PR6 page...perhaps even without much difficulty."

That's not the issue, and not just because the difference between PR5.96622 and PR6.0126 is next to nothing. The issue is twofold: PR6 normally beating PR2; and, when no one targets a search with anchor text or title, a higher PageRank page with the words scattered on the page or partly in the title will have a major advantage over lower PR pages built similarly.

The www example of course is simplistic, but anyone wanting to prove PR doesn't matter, let them target "welcome" with a PR3 page and see how far they get.

7:40 am on Feb 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2002
posts:872
votes: 0


Hope this thread keeps its niveau. A well-positioned lean-back-and-breathe one while so many analyse DCs in panic. So here are a few cents from me.

> "we use this new system called GoogleRank..."
If ever, it'll be called BrinRank (TM).

> Variables such as "Eigentrust" or perhaps some LSI driven variable are clearly possible and potentially viable. The semantic driven possibilities are endless...

Yes. This is the point. In order to evaluate a page's relevance on a given search term, Google may take any mathematical abstraction over that huge amount of data it has collected in the past. So concerning ciml's initial question, I think we would all agree that PageRank is now nothing but one of the hundreds if not thousands of knobs Google has implemented. Still an important one, of course, and the shorter and more competitive the search term, the more important PR gets.

>Anyone with any ideas on how the extent of PR's influence (or lack of influence) on SERPs can be best tested?

If you ask me, the days are over when a single individual was able to do that by trial and error, and I even think that with Allegra the days are over when the very valuable exchange of ideas and experiences in all these threads will lead us to SIMPLE results by means of which one can bring back a sandboxed site or regain lost positions in the SERPs.

Most of us will know that, from a mathematical point of view, it is in principle impossible to infer the algos working in the background from the output of a von Neumann machine once the algo has grown beyond a certain complexity. In the past five years SEO experts got around this by collective theory-building and evaluation, and by intensive discussion, so that most of Google's surprises finally led to an acceptable consensus matching the algos, with which we continuously improved our rankings and income. But I doubt this will still work in the future.

Whenever I research any given topic apart from my own business, Google supplies me with acceptable results. This is the basis of Google's success. I hear hundreds of you cry "veto", but this is my experience, and not only mine. So the core algos Google has implemented do work, and all Google's programmers have to do is detect and sort out spammers who have found means to work around the algos. I guess these filters work more like an intrusion detection system than bother with e.g. link text and the like. The word "expert" (Dreyfus on artificial intelligence in the eighties) comes to mind. It is similar to the way a good musician doesn't have to think about what his fingers precisely do, or to the way you think about a new computer program while your vegetative nervous system drives your car. All that is left to Google is fine-tuning.

And this holds true vice versa. Such fine-tuning on the webmaster's side, however, would require a much broader and more concise basis than in the past. What we IMHO need are means to categorize, and maybe test, our observations with the support of machines.

I think Google's relevancy-evaluation algorithms are the most complex finite automaton in the world relative to the straight and clear output they produce (of course an OS is more complex, but it doesn't generally condense its output to a list of ten links). Maybe beyond this complexity, and because of the output's simplicity, it is possible to cope with the algo results by statistical means again - sort of mirror algos, completely different, but matching/predicting the results to a statistically acceptable degree. Such algos, however, won't be linear values any more, like e.g. keyword density, and I must admit that finding them would perhaps be no easier than programming a search engine itself.

> Ultimately Google takes these complex variables, simplifies them..
Well, yes, simplicity is one of the keys, right in contrast to my posting...

8:22 am on Feb 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 4, 2004
posts:684
votes: 2


Couple of thoughts:

First thought:

Relating to this search from earlier in the thread:

> Of course it's not just that, the peculiar notion that pagerank doesn't matter just dies here
> [google.com...]

For the first 100 results, the PR holds; after that, it starts to fail. Go deep into the results, and you start seeing PR 7 sites ranking above PR 9s. (PR 7 was the lowest I saw in the top 500 - based on random sampling, not looking at all of them.)

So, high PR matters, yes, but the TPR is obviously somewhat misleading. In the top 500 results, there were a number of PR 9 sites that I know of that didn't show up, and numerous sites as low as PR 7.

Inference: either the TPR isn't the PR Google is using (and that seems likely, to an extent) - or it is being seriously underweighted. Does the home page for The Gimp (PR 8) seriously deserve to be nearly 100 positions higher than a PR 9 cnet.com? Does a PR 7 MTV deserve to rank higher than a PR 10 MIT?

For that matter, does the Gimp deserve to place higher than MIT?

I wonder....

Second thought:

From ZDNet [zdnet.com.au]

The numbers alone are enough to make your eyes water.

# Over four billion Web pages, each an average of 10KB, all fully indexed.
# Up to 2,000 PCs in a cluster.
# Over 30 clusters.
# 104 interface languages including Klingon and Tagalog.
# One petabyte of data in a cluster -- so much that hard disk error rates of 10^-15 begin to be a real issue.
# Sustained transfer rates of 2Gbps in a cluster.
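
To put that error-rate line in perspective, here is a rough back-of-the-envelope sketch in Python, assuming the 10^-15 figure means errors per bit read (the usual way disk specs quote it):

# Expected unrecoverable errors when reading a full petabyte once
# at a bit error rate of 10^-15 (one error per 10^15 bits read).
PETABYTE_BITS = 10**15 * 8        # 1 PB = 10^15 bytes = 8 * 10^15 bits
BIT_ERROR_RATE = 1e-15

print(PETABYTE_BITS * BIT_ERROR_RATE)   # -> 8.0 errors per full sweep

About eight expected errors every full sweep of the petabyte - and an indexer re-reads its data constantly.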

I've done a bit of thinking about this off and on since I read the article last December. Google has one of the most complex "managed" processes in the world. I'm starting to wonder just how much they manage it. How much of their results is a factor of their algos, and how much is the result of a frighteningly large amount of "freak" data that tends to creep into such massive computing projects?

The only other results data available to the public from similarly large computing projects come from long-term (10-year-plus) weather modeling. I've scanned the results from a few of them, and it's staggering to see the error rates. Run the exact same simulation 4 times, and you get 4 different results. With sometimes HUGE variances.

Why? From creeping data corruption. It's almost inevitable in such large computational projects. With equipment failure rates like those Google is seeing, it would be fascinating to be able to completely halt the process and just sift through the entire thing looking for data corruption at any given point in time. Sure, they have algorithmic smoothing to account for the errors, and redundancy as well. But can you really beat the problem with such a large system?

I've never seen anyone answer that with a definitive "Yes, we can beat the error creep, 100%"

So, I don't know if the question is really about whether they intend PR to have a fixed relevancy. I think the question might be "Can they enforce their intended PR relevancy on a system with a constant floating error rate?"

Not to keep anyone up late tonight wondering just how random the results really are.

10:44 am on Feb 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member beedeedubbleu is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Feb 3, 2004
posts:6140
votes: 24


I've done a bit of thinking about this off and on since I read the article last December. Google has one of the most complex "managed" processes in the world. I'm starting to wonder just how much they manage it.

Correct ... I think the answer is that they cannot manage it. They may be able to manage 95% of it but they will never, never ever, get an algorithm to produce perfect results. Tooooo many variables.

11:31 am on Feb 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 15, 2003
posts:2412
votes: 5


a higher pagerank page with the words scattered on the page or partly in the title will have a major advantage over lower PR pages built similarly.

The penny dropped :) For otherwise identically scored pages, of course high PR will trump low PR.

MIT vs the GIMP

- just look at their domain names. This is an example of other factors kicking in.

Anyone with any ideas on how the extent of PR's influence (or lack of influence) on SERPs can be best tested?

Well, you could always model it mathematically by fitting a straight line with PR on one axis and rank/position on the other, i.e. regression. That would give statistical evidence.
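
As a minimal sketch of what that regression could look like in Python - the (toolbar PR, SERP position) pairs below are made-up placeholders, not real measurements:

# Least-squares line through (toolbar PR, SERP position) pairs.
import numpy as np

data = [(8, 1), (7, 2), (7, 3), (6, 5), (5, 9), (4, 14), (3, 22)]  # hypothetical
pr = np.array([d[0] for d in data], dtype=float)
pos = np.array([d[1] for d in data], dtype=float)

slope, intercept = np.polyfit(pr, pos, 1)   # fit: position ~ slope*PR + intercept
r = np.corrcoef(pr, pos)[0, 1]              # Pearson correlation coefficient

print("position = %.2f * PR + %.2f, r = %.2f" % (slope, intercept, r))
# A strongly negative slope and r would suggest higher PR goes with better
# (numerically lower) positions - subject to the bias caveats below.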

Still, it would be biased:

With the Google API and a good number cruncher, you could get a 100,000-item data set analysed in a day, but although this is a very high number it's still just a very, very small fraction of the total data set (the corpus/population = 8 billion). Moreover, it's not just the eight billion pages, it's an even larger set, as it's not the same set of pages that is returned for each query.

Considering the published figure that Google gets 200 million queries per day (or was it more than 250M), we're looking at a total number of possible ways to run 200M queries on 8B pages something like this:

8,000,000,000_(200,000,000) (method: ordered selection, no substitution)

- possible outcomes per day, assuming that each query is different. That's falling-factorial notation, not 8B raised to the power of 200M - the calculation is done like this:

8,000,000,000 * 7,999,999,999 * 7,999,999,998 * ... * (8,000,000,000 - 199,999,999)

If we assume that queries can be reused during the day (ordered selection with substitution), the calculation is

8,000,000,000^200,000,000 (8B raised to the power of 200M)

So, your sample - even if it has 100,000 data points - is very small.

The search terms you choose will have to be very representative of the total sample of possible searches in order to yield a true picture. So, you will have to define a very specific topic area first, and hence your results will only be valid for that niche.

Also, the normal laws of probability apply, and there are some methodological considerations and assumptions that must be made. So, there are other considerations and error sources, but this (sheer size) is the most important, imho.

---
i don't think/hope there are errors in the above calculations or flaws in the assumptions i've made, but of course there might be

----
Added: There's one flaw/error at least:

For each of the queries, 1,000 pages are selected from the 8B corpus. That's 8,000,000,000_(1,000) possibilities per query. This happens 200M times per day. Hence we might get:

200,000,000_(PQ) where PQ = 8,000,000,000_(1,000)

I'm not sure that's the right way to calculate it, as it says "out of 200M queries, how many different ways can we select 1,000 pages out of 8B pages" (or at least that's what it's supposed to say). Anyway, the bottom line is that the number of possible outcomes is huge.
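
For anyone who wants to check the arithmetic, here is a small Python sketch that evaluates these falling factorials in log10, so the astronomically large counts stay printable (the corpus and query figures are the ones assumed above):

# The counting arguments above, in log10 so the numbers stay printable.
# n_(k) denotes the falling factorial n * (n-1) * ... * (n-k+1).
import math

N = 8_000_000_000                   # corpus size assumed above (8B pages)

def log10_falling_factorial(n, k):
    # log10( n! / (n-k)! ) via log-gamma, converted from natural log
    return (math.lgamma(n + 1) - math.lgamma(n - k + 1)) / math.log(10)

print(log10_falling_factorial(N, 1_000))        # ~9903: 8B_(1,000), per query
print(log10_falling_factorial(N, 200_000_000))  # ~2.0e9: the 200M-term product

So 8,000,000,000_(1,000) is a number with roughly 9,900 digits, and the 200M-term product has around two billion digits - huge indeed.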

[edited by: claus at 11:55 am (utc) on Feb. 18, 2005]

11:32 am on Feb 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 31, 2003
posts:2280
votes: 0


steveb, I concur that that was a clever search, but it doesn't prove anything about tPR. Apart from caveman's reasoning that the "www" term is too unoptimised, there is the possibility that the tPR of the first few pages is the effect of their being so high in the SERPs, not the cause.

Perhaps you could sticky me some of the search terms on which you see a strong correlation between tPR and position... because my searches seem to give me the same result that claus was getting. It would be interesting to explore whether some terms in fact do get results based largely on tPR.

the shorter and more competitive the search-term, the more important PR gets

Very interesting. I did a few checks, particularly in light of steveb's argument that 6.01 doesn't differ much from 5.9. Hope it's OK with the mods to produce the few observations here.

loans: 7 8 6 7 7 6 7 6 6 6........7
<snip>
free: 8 9 9 10 10 9 8 9 10 8 8 .. 8 10 (#15)
and: 8 9 10 8 9 8 9 8 8 9..... 10 (#30)
computers: 10 8 7 8 9 8 7 0 6 6 7 ...5 ...5 ..5... 4... 8 (#105)
vacation: 6 8 8 7 7 5 6 6 5 7 ...4 ..4.. 4.. 0 ..7 (#104) 7 (#105)
sports: 8 8 8 7 7 8 7 7 8 8 6 6 ...3..0..5..8...1...8 (#310)

Assume for a second that tPR didn't matter and didn't exist, and value pages on other criteria altogether. Could they appear in the above order?
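
One quick way to put numbers on that question: a rank correlation (Kendall's tau) between SERP position and toolbar PR. A sketch in Python, using only the contiguous leading values transcribed from the lists above (the scattered deep results like "(#105)" are skipped):

# Do these PR sequences look ordered by PR, or could they be random?
from scipy.stats import kendalltau

observations = {
    "loans":     [7, 8, 6, 7, 7, 6, 7, 6, 6, 6],
    "free":      [8, 9, 9, 10, 10, 9, 8, 9, 10, 8, 8],
    "and":       [8, 9, 10, 8, 9, 8, 9, 8, 8, 9],
    "computers": [10, 8, 7, 8, 9, 8, 7, 0, 6, 6, 7],
    "vacation":  [6, 8, 8, 7, 7, 5, 6, 6, 5, 7],
    "sports":    [8, 8, 8, 7, 7, 8, 7, 7, 8, 8, 6, 6],
}

for term, prs in observations.items():
    positions = list(range(1, len(prs) + 1))
    tau, p_value = kendalltau(positions, prs)
    # tau near -1: PR falls steadily with position; near 0: no ordering
    print("%-10s tau=%+.2f p=%.2f" % (term, tau, p_value))

A consistently negative tau across terms would say these lists are far from random with respect to PR, though it says nothing about cause and effect.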

weather modeling

It's easier to check for errors in weather modelling. If you predicted rain in Spain and it was unusually dry, that's a clear error. What is an "error" in SERPs? ....

algorithm to produce perfect results

"Perfect results" is subjective. They may not be striving for perfect results, just results that look relevant and exclude "spam". The excluding spam may be where more of their efforts are directed.

Hope this thread keeps its niveau.

Er, hmm, OK, here goes: Me too :)

[edited by: ciml at 1:29 pm (utc) on Feb. 18, 2005]

1:33 pm on Feb 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member ciml is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 22, 2001
posts:3805
votes: 2


jcoronella:
> PR is still calculated, and for all I know is still calculated the same way

It seems that way to me too, apart from the choices over which links count or not - and supplemental results.

Marval:
> I do agree that of those 100s of factors - some like authority linking are so closely related to Page Rank that they have basically become a part of it

Or work alongside it?

claus:
> Analogy: The less educated sales clerk sometimes does know better than the professor.

And these days many different less important pages can have more influence than a few more important pages, compared to in the past.

It's hard to find the 'right answer' to a search phrase when the context is unclear, and when webmasters are working hard to look like they have the right answer.

steveb, I agree that if you have many low-competition pages/phrases then PR can be the decider. This is almost the case of "when all other factors are equal, PageRank can be the decider", so I would still describe PR as a very small direct ranking factor.

> Of course it's not just that, the peculiar notion that pagerank doesn't matter just dies here
> [google.com...]
>
> Gee, where are all the PR2 pages?

Those pages have many links of the form <A href="http[!]://www.example.com">www.example.com</A>. The high rankings for "download" are not there due to the PageRank, in my opinion, so much as all the links that mention "download". I doubt we could untangle such results to say whether the anchor text or PR matters more; we each just guess based on our other experience and research.

2:14 pm on Feb 18, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 13, 2005
posts:52
votes: 0


I think it's like this: first, you must have proper on-page SEO done. Next, get related, authoritative links while making sure you have the correct anchor text for those links you do have control over. Lastly, worry about the PR of those sites. Seems to be the most logical way to think about it... and the most simplified way, to me at least. If two competing sites were to have equal on-page SEO done, equal sets of related links, and the same quality anchor text, then the site whose links are on pages with a higher total PR would rank higher.
2:48 pm on Feb 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 1, 2003
posts:1206
votes: 2


And these days many different less important pages can have more influence than a few more important pages, compared to in the past.

These are the results that I am getting with many of my own pages. I attributed it more to the authority status of the site as a whole than to the PR of the individual page.

3:37 pm on Feb 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2002
posts:872
votes: 0


> If you predicted rain in Spain and it was unusually dry, that's a clear error. What is an "error" in SERPs? ....

If I got that right, grelmar was aiming at real hardware errors. He was asking how much of the fluctuation we see at Google's is simply due to errors of the physical system itself. Electromagnetic induction.

> so much that hard disk error rates of 10^-15 begin to be a real issue

@ claus:

I do not think it is necessary to work with such huge figures if you simply concentrate on the input and output of that finite automaton, as I said (let's assume it is stable and no one's throwing dice in there). Google claims to have 8 billion pages indexed. Most people use Google with the default option to show ten links. So the set of possible results is much smaller:

8,000,000,000 * 7,999,999,999 * ... * 7,999,999,991

As for the input, it would be sufficient for a first approach to concentrate on one language first, and on queries with fewer than four words. So even if you assume a million English dictionary entries, you will have a far smaller set on the input side.

I do not see how the number of queries per day is relevant, and I also think we can put hardware errors aside at present, although both are interesting points of course.

I admit that even an illegal brute-force attack of 100k queries still wouldn't suffice to provide a statistically significant result over this pool for ciml's initial question.

Nevertheless I find it quite interesting to narrow down the problem from this perspective. Given my figures, the whole problem settles down to decrypting a 512-bit hash algo, which shouldn't be too complicated all in all (*g*).

Our advantage is that in the past we have isolated quite a number of other (linguistic) coefficients to explain the observed input-output correlations. Which one(s) would make up the most efficient hook(s) for further simplification?

5:08 pm on Feb 18, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:May 22, 2003
posts:354
votes: 0


If one queries a single-word BRAND-NAME, or a product name that's different to the manufacturer's company name, G seems to do pretty well at returning the manufacturer's site, in my experience.

That's an important indicator of authority attribution, and, I believe, semantic-relations...

I've detailed before how I found, using the tilde, that G correctly associated (in context) a German company name with a common English verb (TOS is so frustrating at times!). Tinnitus later (sorry if that's not your right moniker) noticed similar in his observations.

I believe, if you sniff around these relationships enough, that an almost, dare-I-say it, hierarchical word-tree can be seen.

For example: BRAND-NAME has related words which may include its verbs (typical uses) etc... and, importantly, MANUFACTURER. After sniffing around you may be drawn to the conclusion that G attributes MANUFACTURER as parent to BRAND-NAME, and so on with the others... (Sniffing around may involve "reverse-sniffin" from not-so-obviously strongly-related words.)

In short ('cos time is), amongst the plethora of other inter-mixing algos that play their part, there may be "semantic authority". How it scores that semantic authority (or builds its word-tree) may well be derived from PR and all that entails, because the theory itself predicates an authority (word) source.

As soon as you add another word to the query, however, like for instance [BRAND-NAME TUTORIALS], that opens a new additional "word-tree". Woven together, the possibilities are significantly increased and go beyond just a 2 KW match; there's a potential myriad of related possibilities also. It stands to reason, therefore, that the MANUFACTURER site may not necessarily compete against a strong info site in these situations. BUT, that nevertheless doesn't mean that the "word-tree" surrounding BRAND-NAME is less important - they're still factored in to find the best AUTHORITATIVE contextual match.

My explanation is not precise enough because of time and merely reflects my extrapolations based upon observations. But what it tries to say is that "semantic authority" (I'm tryin to avoid the LSI definition 'cos I don't think it's strictly that) may be taking the "stress" off the original PR link attribution by opening other "word" possibilities also. So it could mean that your (tutorial) info site is judged important in another related field that matches the "word-tree(s)" and so qualifies to be included in the mix too... whatever the PR of that page is.

I subscribe totally to ciml's synopsis - even if the above is different methodology - PR still has an important role, but maybe not for the same front-end reasons.

Right, gotta go, hope that's clear enough... prolly not!

5:26 pm on Feb 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 15, 2003
posts:2412
votes: 5


i probably shouldn't have posted numbers, but never mind now...

>> I do not see how the number of queries per day is relevant,

Because what you calculate above is the possible ways to select ten pages out of 8B pages. This is done once per query, but we need to do more than one query to get that 100,000-item data set. I used the per-day figure as that's the only one i know of.

100K data points come from 1,000 Google API queries returning 100 results each. 1,000 queries per day is the limit set by Google. Now that i think about it, i don't think you can return 100 results in one query by means of the API. Also, the API doesn't return PR values.

>> so even if you assume a million English dictionary entries

You can't really use a dictionary list if you want a picture representing what real people are seeing in the SERPs. You have to use real queries. It probably wouldn't be too hard to collect a large number of real queries from interested SEOs in an anonymous database, though.
5:44 pm on Feb 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member caveman is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 17, 2003
posts:3744
votes: 0


Based only on experience and a lot of eye squinting, I am pretty confident that well-executed SEO can generally trump more than one level of PR, but usually not quite two levels. Considering that PR is probably logarithmic and not arithmetic, overcoming almost two levels of PR (i.e., from PR 4.555 to PR 6.555) with basic SEO techniques suggests a lot to me about the relative importance of PR in today's SERPs.
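
To make the logarithmic point concrete - nobody outside Google knows the base, so the values below are just commonly guessed candidates - here is what "almost two toolbar levels" would mean in raw PageRank terms:

# If toolbar PR is log base b of raw PageRank, two toolbar levels
# span a b^2-fold gap in the underlying value.
for base in (4, 5, 6, 8, 10):
    print("base %2d: two toolbar levels = about %3dx raw PageRank" % (base, base ** 2))

So SEO that overcomes nearly two levels is overcoming something like a 16x-100x gap in raw PageRank, depending on the base.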

As our levels of knowledge and experience have grown, my group has come to rely less and less on PR when executing marketing plans, to the point that we rarely look at PR except when evaluating site structure, or when buying advertising on other sites. But even when buying advertising on other sites, we value on-topic pages more than anything, and are not afraid to purchase advertising on PR0/1/2 pages if they are relevant to our pages.

PR is one of those things that you need to understand, and then put in its place. Most of our time is spent, as buckworks says, on relevance. Just a poor man's approach to SEO. ;-)

6:04 pm on Feb 18, 2005 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member ogletree is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 14, 2003
posts:4319
votes: 42


I remember when I first started, I wanted to rank for a two-word term (kw1 kw2). There was a company that had a PR8 site at kw1encyclo.com. kw2 can also just be a three-letter word not related to kw1. All they had to do was use kw2 in the text somewhere, not even near kw1. kw2 is such a common word that it was almost guaranteed to be somewhere on pages linking to that site. They were number one for a while. I think it finally fell off after Florida. PR used to be king. High PR sites ranked for all kinds of weird things.

I think what is happening now is that they use the old PR system to make an index and then filter the heck out of it. I'm sure the algo has many layers added to it before we see it. I think they still have updates; they just do them whenever they feel like it. I also think they have several versions running around and alternate showing them. If you have a rolling update you never finish updating, so they just have to decide "OK, this one is ready - call it update 1, version 2125", and they rotate between updates 1, 2 and 3.

6:19 pm on Feb 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member ciml is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 22, 2001
posts:3805
votes: 2


Chris_R once posted here about the relationship of home page PageRank to stock market value of US corporations. We all assume that Google don't boost PR on the basis of market capitalisation, and yet the trend is striking.

Claus, Oliver, I see where you're going with the regression analysis, but in my opinion the number and reliability of the collected data aren't as much of a problem as the validity of the tests. Important factors tend to accompany PageRank, and these skew the results systematically.

Personally, I have found the use of analytical methods on testbeds to be highly rewarding.

PR is one of those things that you need to understand, and then put in its place.

caveman, you have summed up dozens of threads with one clear, succinct sentence.

6:37 pm on Feb 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member bigdave is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Nov 19, 2002
posts:3454
votes: 0


I think part of the reason that people discount PR is that they are not looking at it objectively.

PR is not as important as it once was. I think that we can all accept that. Just because it is no longer as important does not mean that it is no longer important.

There is also a problem of scope when looking at PR. While a PR2 can beat out a PR6, what happens when the PR2 drops to PR1 and the PR6 jumps to PR9? If this happens without changing any other factors, are you still so sure that they would rank the same way?

Doesn't it also depend on your goals? It may not make a lot of difference on the search that you optimized for, because all the other factors are so high, but it sure can make a difference on all those secondary and tertiary terms. On those terms, your pagerank is about all you have going for you.

There is nothing that affects your ranking across many terms, and across your site, like pagerank does.

As it is, I have pages that continue to climb for terms where the only thing going for that page, and the only thing changing, is the pagerank.

My conclusion is that it really is not worth pursuing PR for specific targeted terms. Your efforts are better directed elsewhere. For attempts to generally rank higher in your field across your site, it is vital to include pagerank towards the top of the list of factors you are concerned with.

7:39 pm on Feb 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member caveman is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 17, 2003
posts:3744
votes: 0


There is nothing that affects your ranking across many terms, and across your site, like pagerank does.

Hard to argue with that, given PR's universal importance.

... it sure can make a difference on all those secondary and tertiary terms. On those terms, your pagerank is about all you have going for you.

With respect, this is the general area in which I might differ. My personal take is that in fact PR's influence on secondary and tertiary searches is less significant now than was previously the case. Since Florida, and more so with Allegra, we are doing far better for these kinds of searches than ever, especially on our more content oriented sites - and it is not PR related.
8:34 pm on Feb 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member steveb is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 20, 2002
posts:4652
votes: 0


"For otherwise identically scored pages of course high PR will trump low PR."

Well, that is what people are arguing against. It's obvious, but people don't want to accept it, I think because there is a tendency in a lot of people to need to focus/obsess on that ONE THING that is responsible for everything. It just isn't the case. PR matters a lot, but it doesn't matter the most, or solely, and it can't trump the 99+ other algo ingredients together.

"I did a few checks."

Which again should put to rest this fantastic notion that pagerank doesn't matter. Where are the PR2 pages for all those searches? It simply doesn't matter how a PR2 page titles a page, or if it gets 100,000 PR0 anchor text links. It cannot compete against PR8 pages with decent titles and linking.

Fortunately, though, this is perhaps the one myth that anyone can easily debunk themselves without doing much of anything. Create two identical pages (or near identical if you want) with gibberish titles and text. Link to one from a PR6 page that has no other links on it. Link to the other from a PR2 page. Wait for both to be crawled (since the soon-to-be-PR5 page will get crawled first, probably). See which one ranks first and which second.

Which goes back to "when all other things are equal". That's just non-helpful in understanding this or anything else about SEO. All other things are never equal (not counting duplicate pages, or page titles which are often exactly the same). Virtually *everything* matters. There isn't that ONE THING that decides everything. Even if link text is the most important thing, when two pages have similar link text amounts and wording, some OTHER things then will be the deciding factors in ranking.

And then, natural pagerank is a clear thing that isn't obtained in a vacuum. It's just the mathematical component of "get links from well-regarded sites", which has its own algo value. In a business with a lot of collateral damage, it's nice to see something with collateral benefit. On the other hand, buying pagerank for a crap domain is again like putting a dress on a pig. It's an unnatural signal.

9:14 pm on Feb 18, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Oct 30, 2002
posts:404
votes: 0


I guess I'm still seeing a difference or split between what we are discussing - PR (toolbar) and PageRank - to me there is a huge difference. I totally agree that the Page Rank is extremely important, with slight modification by other factors (or are those other factors part of Page Rank?).
But PR2 and PR6 as measured by the toolbar's little green bar are (in my opinion and experience) not a reflection of Page Rank; it is a pretty little gizmo that gets way too much attention when there are plenty of other ways to figure a web page's real Page Rank.
That's why I tend to smile every time I get a link request that starts with "I have a PR6or7 for hard link trades..." - you guys know the rest.
I for one very rarely (if ever anymore) look at what that little green bar says when I consider traffic/recip/just-plain-good-for-the-surfer linking.
I really wish we could get away from examining Page Rank by looking at its PR score in the toolbar - way too many people that are newer getting into this consider the two equivalent - and I know that if you look at the other factors that go into Page Rank, it's usually not even close.
10:22 pm on Feb 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 10, 2004
posts:1342
votes: 0


As I'm sure has been said in this thread and many times before, an *on-topic* high PR page is going to give a much better boost to your webpage than an off-topic low PR page.

PR is important. To discount it like this is wrong.

Yes, if the web page pointing to you has nothing to do with your particular topic, then the PR boost will be minimal.

11:04 pm on Feb 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 4, 2004
posts:684
votes: 2


I really wish we could get away from examining Page Rank by looking at its PR score in the toolbar - way too many people that are newer getting into this consider the two equivalent - and I know that if you look at the other factors that go into Page Rank, it's usually not even close.

OK, so is there a quick and dirty way of checking a site's real PR?

I know of a couple of long and convoluted ways, but the problem with them is that they're long and convoluted. And the margin of error with the TB-PR is +/- 1, mebbe 2 in extreme cases. Often enough, it comes out the same.

So, from my (albeit limited) experience, TB-PR is fine for a quick and dirty check. If something is showing as PR 0-1, then at best it's actually a PR 1-3. And, because of the logarithmic nature of PR, the error margin diminishes with higher PR. Above PR6, it's statistically insignificant (I know: lies, damn lies, and statistics...).

Part of this stems from a very simple observation: it's very hard to fool people as to what's a high PR site if they're sitting there staring at it. I.e., if you pull up Mozilla.org, there's that nice big green bar of PR9. Anyone who's being observant about what's going on with the web can look at that and say "Dang straight that's a PR9 site, don't need no G-man to tell me so."

The same would apply for the vast majority of pr8+ sites. It doesn't take an SEO Wizard to figure out that that rank is most likely well deserved.

Pr7 - mostly in the same category. Looking at pr7 sites as I surf around, I can pretty much assure myself (after viewing the site), that yes, that pr is as it should be.

Below PR7 - a different story. Sometimes I look at PR6, PR5 sites and wonder how they ever cracked past 0. For some lower PR sites that have been around a few years, I wonder why they aren't at least a notch or two higher.

But if you examine all those sites deeply, you can usually infer a reason (and inference is all we've got, because other than Brin and Larry, I'm not sure if anyone really knows.)

So.... TB-PR is good for a rough and ready estimation. IMHO

The question, then, is how does PR affect placement in the SERPs? How BIG is the relationship, how STRONG is the influence? Out of 100 factors, is it top 10, or bottom 10, or somewhere around 50?

And are those 100 factors on some kind of a bell curve, or diminishing return curve, as far as weighting?

My "guess" - Of the 100 or so factors, lets smooth it out and call it:

100 Factors of Google

And let's put those 100 factors on a traditional Bell Curve, in terms of SERP weight importance, wherein:

Class "A" 33% of the factors lie in the median range. (remember this from high school? ahhh... meomories)

Then each half of the curve continues to break accordingly:

16.6... % in "Class B"
8.3...% in "Class C"
4.16...% in "Class D"
2.083...% in "Class E"
1.0416...% in "Class F"

No need to go beyond class F, because we've only got 100 factors.

"Class A" straddles the middle of our curve, classes B through F fall on either side, so we end up with "Class B "High" (above the median" and class B "low" (below the median")

At the High end you have the most significant factors, the ones that carry the heaviest weight. Mostly, these "top half" factors are all we can look at, because below the median, their weighting becomes so low we can't really effectively measure them. Each drop in class cuts the "weight" of the factor in half.

Current Google Algo Weighting, MY Guess:

Class F "High" - (One Factor) On Page Keyword Desnity

Class E "High" - (2 Factors) Inbound Links (total), and Inbound Links (structure)

Class D "High" - (4 Factors) Keyword Structure (is it bold, italicised, headered?), Internal Links (total), Internal Links (Stucture), Site Longevity (how long has the site been around in it's current incarnation? We all know G likes the moldy oldies).

Class C "High" - (8 Factors) Site Size (the bigger the better), Update Frequency (the more the merrier), Page Weight (keep it under 30K), OutBound Link Relevance, Proper use of "Alt" tags, page Title, page URL, and Page Rank

Class B "High" (17 factors) - No idea.

As you get from Class B "High" into Class A (median), and then down into B "Low" and under, you start getting into niggling details that are hard to pick out and control for. Also, classes F through C "High" account for such a heavy proportion of the overall SERP ranking that if you get them right, they'll outweigh the other factors.

UNLESS you have Class B "High" and others line up with the sun, the moon, and Jupiter in a way that can clobber the "High End" factors. Which seems unlikely. If you get Class F through C "High" right, chances are at least some of the other factors are going to weigh in your favor.

While PageRank comes in as a Class C variable - with the class having 1/8th the weight of Class F, and each factor thereby having 1/64th the weight - PageRank doesn't seem all that important, but that too can be deceiving, because PageRank itself is factored using the other variables, to an extent.

If you have a high PageRank, chances are you got some of the other variables right along the way, too.
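
For what it's worth, here is that guessed weighting scheme as a few lines of Python - every number is a guess from the post above, nothing more:

# Class F holds the single heaviest factor; each step down a class
# halves the per-factor weight.
classes = [("F", 1), ("E", 2), ("D", 4), ("C", 8), ("B", 17)]

weight = 1.0        # arbitrary unit weight for the lone Class F factor
for name, count in classes:
    print('Class %s "High": %2d factors at weight %.4f each' % (name, count, weight))
    weight /= 2     # next class down: each factor counts half as much

Running it shows how quickly per-factor weights shrink as you move down the classes.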

Just a bit of idle speculation to while away the afternoon.

p.s. Yes, I was referring to hardware-induced errors earlier, not algo-induced errors - and we shouldn't underestimate the influence of hardware-induced errors. Let's call that, in itself, a "Class C High" factor ;)

11:41 pm on Feb 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 4, 2004
posts:684
votes: 2


And, BTW, that is by no means meant to be an accurate reverse engineering of the algo.

Mostly, it serves as a good template to see what factors my time is best devoted to.

12:52 am on Feb 19, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member caveman is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 17, 2003
posts:3744
votes: 0


Didn't Brett (or somebody) take a crack, a long time ago, at putting together a list of factors that affect rankings? Maybe even ordered by importance? Couldn't find it, if it even exists.
1:36 am on Feb 19, 2005 (gmt 0)

Moderator

WebmasterWorld Administrator buckworks is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 9, 2001
posts:5832
votes: 159


If you have a high PageRank, chances are you got some of the other variables right along the way, too.

Yes, especially if your PR is relatively natural, instead of acquired mostly via rent-a-links.

2:16 am on Feb 19, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member bigdave is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Nov 19, 2002
posts:3454
votes: 0


With respect, this is the general area in which I might differ. My personal take is that in fact PR's influence on secondary and tertiary searches is less significant now than was previously the case.

I don't see how you differ. You just made one of the errors that I pointed out.

It is not a question of how PR's influence compares to how it was previously. How it used to be matters not at all in deciding how it is now. It is not an on/off switch.

2:50 am on Feb 19, 2005 (gmt 0)

New User

joined:Feb 17, 2005
posts:18
votes: 0


A heuristic system could obviate the need for relying on any factor beyond observing user behaviour. In theory.

Doesn't this become a self amplifying feedback loop?

Google shows X as the top result, user clicks X. X gets "rated" higher.

Google lists Y as result #5000, user never clicks Y. Y never moves up.

In theory, other factors should always be taken into account.
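
A toy simulation makes the worry concrete - purely illustrative Python, not a claim about how Google actually uses click data:

# Clicks go overwhelmingly to whatever already ranks first, and each
# click feeds straight back into the score.
import random

random.seed(1)
scores = {"X": 1.0, "Y": 1.0}            # both pages start out equal

for _ in range(1000):
    ranked = sorted(scores, key=scores.get, reverse=True)
    # 90% of users click the top result, 10% try the runner-up
    clicked = ranked[0] if random.random() < 0.9 else ranked[1]
    scores[clicked] += 1                 # naive click-through feedback

print(scores)   # the early leader snowballs far ahead of the other page

Whichever page happens to rank first at the start ends up with roughly nine times the score, purely from the loop.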

3:35 am on Feb 19, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 20, 2002
posts:813
votes: 1


In my view, yes: PR has a direct influence on ranking for a given search phrase, but the direct influence is very small indeed; it is not worth worrying about, it is not important.

If that's true, then it's safe to say that webmasters perverted it enough that it was no longer a viable method by which to score pages.

This illustrates a point: there's nothing webmasters can't do when they put their minds to it.

Now, who's with me in trying to alleviate Google's power through promotion of other engines?

Search.com is showing me great results - yeah it's Google, but it's others too so it ain't exactly the same!

9:04 am on Feb 19, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 31, 2003
posts:2280
votes: 0


A heuristic system could obviate the need for relying on any factor beyond observing user behaviour. In theory.

Doesn't this become a self amplifying feedback loop?

Not at all. User behaviour includes making a choice from the first few results and hitting the back button (among other things ... and that's not even calling the toolbar info into play). If 99% of visitors Google sends me hit back within 2 seconds my page can't be very good.

11:20 am on Feb 19, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Feb 15, 2005
posts:380
votes: 0


I guess I'm still seeing a difference or split between what we are discussing - PR (toolbar) and PageRank - to me there is a huge difference. I totally agree that the Page Rank is extremely important, with slight modification by other factors (or are those other factors part of Page Rank?).
But PR2 and PR6 as measured by the toolbar's little green bar are (in my opinion and experience) not a reflection of Page Rank; it is a pretty little gizmo that gets way too much attention when there are plenty of other ways to figure a web page's real Page Rank.

That's what I was trying to say before. However, what 'other ways to figure a web page's real Page Rank' can you suggest?
