This 85 message thread spans 3 pages.
|Google’s PR algorithm and/or iteration scheme changed|
… and it seems that they have a problem
I’ve been studying Google’s PR algorithm in detail for a long time, building test pages to analyse changes in the PR algorithm as well as the damping factor, the logarithmic base and so on. So far I have always got consistent, reasonable results. However, the current PR update is quite different: the results (several tests on different domains) show inconsistencies.
Possible explanations are:
- This isn’t (yet) a complete update; old data are partly still in use
- Google changed their PR calculation scheme. It seems that they used the Jacobi iteration scheme (mentioned in the original papers) in the past. They might have changed this scheme; however, they either didn’t perform enough iterations or there is a bug.
Modifications of the PR formula (as done by Google in the past) cannot explain the results. Also, even an incomplete update wouldn’t explain all the data. Thus a change in the iteration scheme seems to be the most likely explanation.
This behaviour might explain other phenomena such as the 'Google lag'. Of course, this is pure speculation, but the inconsistencies of the PR calculation (strictly speaking, the inconsistencies of the PR values shown in the toolbar) are a fact.
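For readers following along, here is a minimal sketch of the Jacobi-style iteration mentioned above (the graph and all values are hypothetical, not Google's actual code), using the normalisation in which all PR values sum to 1. The defining property is that every new value is computed purely from the previous pass's values:

```python
import numpy as np

def pagerank_jacobi(links, d=0.85, iters=50):
    """Jacobi-style PageRank iteration: each new value is computed
    from the previous iteration's values only, never from values
    already updated in the current pass.

    links: dict mapping page index -> list of pages it links to.
    """
    n = len(links)
    pr = np.full(n, 1.0 / n)              # uniform start vector
    for _ in range(iters):
        new = np.full(n, (1.0 - d) / n)   # damping / teleport term
        for page, outs in links.items():
            if outs:                      # split PR over outbound links
                share = d * pr[page] / len(outs)
                for target in outs:
                    new[target] += share
            else:                         # dangling page: spread evenly
                new += d * pr[page] / n
        pr = new
    return pr

# Hypothetical three-page web: 0 -> 1, 1 -> 0 and 2, 2 -> 0
ranks = pagerank_jacobi({0: [1], 1: [0, 2], 2: [0]})
```

Switching the update rule (e.g. to Gauss-Seidel, which consumes values already updated within the current pass) changes intermediate results, which is why an under-iterated run under a new scheme could look inconsistent.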
|>> Even a random factor (multiplicative or additive) for PR value shown in the toolbar wouldn’t be consistent with all my measurements. |
Doc, please explain.
Hard to explain without going into detail, but I can try to give you an example:
In the case of an exponential decay, a random value (applied individually to each point) wouldn't change the shape of the (regression) curve, just the individual PR values. This means that fitting the values should still yield almost the real parameters of the exponential decay, i.e. the local noise is averaged out. Thus two identical linking structures with the same incoming PR can show different toolbar PR values for the 'same page' (due to a random effect), but global parameters (where the randomness is averaged out) should be the same.
doc, I'm not a mathematician and I may be completely wrong, but.... does your explanation only work for randomness applied in a methodical fashion (an oxymoron if there ever was one!). If it is the case that every page's PR has the randomness applied to it - but to varying degrees (of a random nature) then do you have a regression curve at all... or just noise? Would the extent of the randomness affect the size of the sample you'd need to use, and is your sample big enough...?
But, that's going to confuse me so....here's my explanation:
PR is broke ;)
My site was formerly a PR6 site, and a few months ago dropped to PR5. All pages linked from this page are credited properly with PR4. I have since added a sitemap (which has been spidered many hundreds of times since it was added) but this page is still showing a PR0, even after the latest update, and it is displayed in Google's cache. I have also not seen any other pagerank changes on my site since everyone started to report PR updates, yet Google hits my site for about 3k pages per night. I have managed to get my site ranked higher in search results, however, even with no changes in PR or backlinks.
Just to add to the melting pot.
I'm getting a PR3 on a "Google cached snapshot" of a page that suddenly appeared from nowhere in a 10m+ SERP.
Please correct me if anyone has seen pr values displayed on cached snapshots before.
The PR algo is a patented work. It was granted a patent by the U.S. Patent Office, and the patent itself is in the name of Stanford University.
Question, though: if the PR algo is patented, how much or what would they be able to change?
Under US patent law the named inventor must always be a living person, although ownership may be assigned to any entity. This is why Stanford shows as the assignee.
While the PR algo is protected by patent, there is no obligation on the part of a licensee such as google to actually use it. Thus, they can change at any time by simply deciding not to use it.
Guys, let's be honest... is PR really that important? I think we have all seen sites with little or no PR and backlinks do very well in the SERPS. That's what it's about... the results. All this time freaking out about PR is time wasted. How do you calculate how well you're doing on Yahoo and MSN? There's no PR tool on them, and we live to fight another day, don't we? Pure speculation, but do you think Google really wants webmasters knowing whether their techniques are working, so they can be abused? If you ask me, the PR tool was a big mistake; they should never have rolled it out, as now it seems to be the main topic in Google... MY PR, WHAT HAPPENED, I WAS A 4 NOW I'M A 3, HOLY *** THE WORLD JUST ENDED. C'mon all, let's step back into reality.
PR was and still is a great extra layer of the algo. A rolling adjustment of its calculation is inevitable, just like any other algo factor. It's not perfect and can be abused just like all other elements, but it still remains crucial in what Google does.
|Geys lets be honest... is PR really that important? |
There are a lot of good uses for PR - see msg 120 here [webmasterworld.com] for some.
FYI, this thread is about Toolbar PR so if you want to rave about SERPS/algo this is the wrong thread.
|Why would Google link to directories rather than to the content sites |
I have the same experience and simple explanation.
Maybe Google now trusts more what others say about you and what you say about others, and not what you say about yourself.
I think the PR tool bar is doing exactly what Google intends it to do at this point in time ... NOTHING!
Everyone is spinning this theory and that about PR and the tool bar ... but few have focused on what correlation the tool bar/PR ranking has had in relationship to the SERPS.
My site was a PR5 before the "update" (if that's what this was) ... and it's still a PR5. Some search results have crept up a place or two, some have dropped a place or two. In other words, "situation normal". In fact, I have been unable to pinpoint any meaningful change in the search results so far since last month.
Those sites using hidden text, doorway pages, cloaking, etc. are still doing well, so if there were any new filters being used, I am not seeing any evidence of them being applied in the SERPS.
I've added 31 pages since July. All have been indexed and the majority (except the 4 newest pages added in the last week and a half) have page rank. Sure, there's a short lag for new pages, but that's not news either.
So my question is; just exactly what is there to analyze? Personally, I see nothing "new" or of any significance whatsoever.
|So my question is; just exactly what is there to analyze? Personally, I see nothing "new" or of any significance whatsoever. |
Yep. Our index page seems to have edged up closer to a PR7, judging by deeper pages that are now PR6, but other than the Firefox extension that shows the green bar down at the lower right, nothing seems to have changed. Our serps are the exact same.
What the hell, eh... it wasn't a Dominic/Esmeralda or Florida, just a tweak of the toolbar PR, but judging by many of the accounts in this thread most people did well with the little green bar. Give thanks for small victories.
> Those sites using hidden text, doorway pages, cloaking, etc. are still doing well, so if there were any new filters being used, I am not seeing any evidence of them being applied in the SERPS.
I think this is not true.
1) What we see now is a toolbar update, which reflects major changes in the algo and a refresh of the backlink index that took place around the 23rd of September, and indeed the SERPs did change in those days.
2) I personally know of at least one site which had significant losses since then, and many others reported so in this NG.
As I understood doc_z, this thread is a first approach to an analysis of the precise nature of these changes. And because these changes revealed a number of inconsistencies, he raised the question of who has a problem: either A) Google, launching a panicky beta version of a new algo, or B) webmasters, faced with a new level of complexity in the algo and thus perhaps once and for all unable to get a glimpse of how it works.
Oliver, I agree.
Liane, please see my message 69 above and the link in it.
Can we get back to Toolbar PR and Google's recent changes to it?
The original post raises some serious and important issues. If anyone wants to explain why visible PR is not important anymore, why analysing PR is a waste of time etc., great, we are stunned at how informed and perceptive you are, now could you please do that elsewhere?
|but few have focused on what correlation the tool bar/PR ranking has had in relationship to the SERPS. |
Because that is exactly what this thread is not about. Please read earlier posts!
PLEASE keep this on topic, people.
As vadim in msg 13 wrote
> the page got PR2. It had no inbound links
I took a quick look after he sent me a mail with the url, and it seems true. So how can a page with absolutely no inbound links get PR in the toolbar? Anyone here with a similar case?
|I think the PR tool bar is doing exactly what Google intends it to do at this point in time ... NOTHING! |
And my point is that you may be right ... OR (what I think is more likely) ... it may be that Google is intentionally ignoring the tool bar whilst they concentrate on new filters and/or new algorithmic calculations of PR, etc.
Once whatever filters or algorithmic changes they may be working on have been successfully put into effect, they will then adjust the toolbar accordingly. I believe it is currently doing nothing ... at least, not accurately.
PR and the SERPS are NOT mutually exclusive. In order to determine the current use and usefulness of the toolbar PR one must at least consider search results, which after all IS the point of all filter applications and algorithms ... inclusive of PR!
I think it is too early to analyze anything yet as very little has changed. But since I've been asked so graciously ... I'll go away now! :)
>PR and the SERPS are NOT mutually exclusive
Well, if you have PR6 pages on sites that are sandboxed, then I would wonder about this.
Obviously with the 'sandbox', a non-sensical link command and seemingly inconsistent PR from page to page, there is more going on here than just a PR algo change, and it has been in effect for several months.
|If it is the case that every page's PR has the randomness applied to it - but to varying degrees (of a random nature) then do you have a regression curve at all... or just noise? |
Indeed, I was referring to the case of constant standard deviation (error). Of course, there are other types of randomness. However, the values shown in the toolbar don’t look purely random.
Yes (at least the toolbar PR).
|So my question is; just exactly what is there to analyze? Personally, I see nothing "new" or of any significance whatsoever. |
As already mentioned, I’ve been studying PR propagation for a long time. Some test pages have been unchanged for 1½ years. These pages show significant changes. Of course, this doesn’t mean that it’s important for the ranking algorithm. (I never said this.) This is just a discussion about changes in the toolbar PR behaviour. And, in contrast to many other theories, these changes are not pure speculation. Whether there are any direct consequences is still an open question.
|it may be that Google is intentionally ignoring the tool bar ….Once whatever filters or algorithmic changes they may be working on have been successfully put into effect, they will then adjust the toolbar accordingly. I believe it is currently doing nothing ... at least, not accurately. |
If they were doing nothing, the toolbar would show old data. However, the problem is that the data are not only old but don’t make sense at all.
Google has played games with PageRank values once before. The time was May or June, 2002. A desktop program called PRmaster had cracked the URL hash for the "phone home" of the toolbar. It was distributed anonymously beginning in December, 2001.
GoogleGuy confirmed [webmasterworld.com] that they tracked down the programmer, had a chat with him, and the program was withdrawn from the download site. (See message 8 in that thread).
By May or June, PRMaster was returning bogus values, plus or minus one or two, as compared to the real Google toolbar. This was Google's hilarious method of extinguishing the last remnants of PRmaster out there on desktops. It had me chasing my tail until some kind person on another forum suggested I download the real toolbar and double-check my work!
Liane, sorry, didn't mean to be rude. Just didn't want the thread going OT.
doc, I agree that the values currently showing in the toolbar don't all look random. Is there perhaps a trigger that causes a random PR to show? Some pages have roughly the expected PR. Maybe those are internal pages... or pages that have been around for a while... or pages that have some "x" factor that protects them from the random interference?
There is actually no special reason to calculate PR all at once, as it seems was done before.
Of course, there is a reason to calculate PR all at once: PR cannot be calculated accurately locally.
I probably should clarify.
1. When PR is calculated traditionally it is *never absolutely accurate*, because to be absolutely correct all links would have to be collected instantly. The set that is actually used was collected over several months and reflects neither the real present PR nor any real instant PR at all; some links have already changed.
2. If that is so, PR may just as well be calculated continuously and not once every 3 months (after the whole internet was initially crawled), i.e. new data will be added while the calculation is still in progress. In fact the calculation will never be finished in this case, because there is no strictly defined result (exact values in the strict mathematical sense). However, that does not matter, since the source of the error (the impossibility of getting an instant picture) and the size of the error will probably be the same as in the first case.
Continuous algorithm has nevertheless some advantages.
1. There is no need to switch computers from one task (crawling) to another (PR calculation).
2. Results will always have approximately the same freshness.
3. PR will change not for everything at once every three months, but at rather unpredictable times and only for individual sites or groups of sites.
That seems to be what we are now beginning to observe.
A continuous algorithm also has the advantage that it is more difficult for outsiders to track or to manipulate.
It may be a change in the algorithm. It looks reasonable that a collection of good, relevant outbound links should carry some authority, i.e. a little PR.
I’m talking about the PR algorithm not the ranking algorithm.
Strange; you quote me, you see that I wrote “PR”, and yet you seem to believe that I did not write it. I did. I meant PR.
>It looks reasonable that a collection of good relevant outbound links should have some authority, i.e. a little PR.
Two different issues here. The fact that links are "good, relevant" is related more to ranking than PR. PR just takes into account the PR value of the link itself.
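For reference, the formula from the original Brin/Page paper makes this concrete: each page T passes on a share of its own PR through its C(T) outbound links, and the relevance or quality of the linked content appears nowhere in it:

```
PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + ... + PR(Tn)/C(Tn) )
```

where T1...Tn are the pages linking to A, C(T) is the number of outbound links on page T, and d is the damping factor.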
|Two different issues here. The fact that links are "good, relevant" is related more to ranking than PR. PR just takes into account the PR value of the link itself. |
I know. You simply assume that there are no changes in the algorithm that Google uses. I assume that changes have happened, so there may be a relation between PR and *outbound* links, as well as between PR and relevance. I seem to have observed it on my site, and a few other people also seem to have observed it.
All I would like to say is that there is common sense in such changes, if they have indeed happened.
Another VERY strange observation:
I operate a site which is mainly located in the educational sector. It is a site about geography containing country profiles, maps etc.
It has PR7; at times it had PR8, with some good links from universities and organisations.
Now very strange things happen with the PR of some pages of the site.
In one section of my site I describe some international organisations like the Commonwealth, the UN etc. So I had an index.html linking to all the pages, each of which describes one organisation. This 'overview' page got PR0, and so all the organisation-specific pages also got PR0. On that page there are only 6 links, each linking to another organisation page within my site. I can't see a reason for this page to be 0.
Then the 'useful links' section. I have one HTML page which contains links to other sites that also have nice material about countries: some universities (Harvard, Yale, Princeton...), some organisations (UN, EU...) and finally some search engines plus dmoz and LookSmart. All in all about 15 links on the page, each with a description of what the linked site is about. There is absolutely zero spam or bad-neighbourhood linking on this page; nevertheless it got PR0.
I have a page on my site listing the 20 major cities of the world. Each city is linked to the official governmental city site, so I can't believe there is a bad neighbourhood. Got PR0.
Surprisingly, the page where I list all countries of the world with links to the profiles (~200 links) did not get a PR0.
So why should a page linking to 6 pages describing some international organisations get a PR0, while a page linking to 200 country profiles retains its PR?
|It's also possible that Google now at least partly calculate PR continuously. There are actually no special reason to calculate PR all at once as it seems was before. The point is that any data at a given moment were collected several months so it is not an accurate current snapshot. We may as well use data slowly evolving with time to calculate PR continuously on site by site basis. |
|Continuous algorithm has nevertheless some advantages. |
I have always suggested that Google should perform a continuous PR update (see msg #27 and the link within). However, the point is that continuous doesn't mean local, i.e. even for a continuous update you have to perform the PR calculation on the whole data set. You just start with the old data as initial values and perform fewer iterations. New pages/links are taken into account, but the calculation is (and has to be) done for all pages.
|1.There are no need to switch computers from one task (crawl) to other (PR calculation). |
That's not correct, as explained above. You can't calculate PR locally.
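A small numerical sketch of this point (synthetic random graph, not real crawl data): after a few links change, a continuous update still iterates over the whole data set, but warm-starting from the old global PR vector converges in fewer passes than restarting from the uniform vector:

```python
import numpy as np

def iterate_pr(M, pr, d=0.85, tol=1e-10, max_iter=1000):
    """Global power iteration from a given start vector.
    Returns the converged PR vector and the number of passes used."""
    n = M.shape[0]
    for step in range(1, max_iter + 1):
        new = (1.0 - d) / n + d * (M @ pr)
        if np.abs(new - pr).sum() < tol:
            return new, step
        pr = new
    return pr, max_iter

# Hypothetical 200-page web as a column-stochastic link matrix.
rng = np.random.default_rng(1)
n = 200
A = (rng.random((n, n)) < 0.03).astype(float)
A[:, A.sum(axis=0) == 0] = 1.0            # patch dangling pages
M = A / A.sum(axis=0)

# Full ("cold") calculation from the uniform vector.
pr_old, cold_steps = iterate_pr(M, np.full(n, 1.0 / n))

# A few links change between crawls ...
A[rng.integers(0, n, 5), rng.integers(0, n, 5)] = 1.0
M2 = A / A.sum(axis=0)

# ... and the continuous update warm-starts from the old global vector:
# still a calculation over ALL pages, just with fewer passes needed.
pr_new, warm_steps = iterate_pr(M2, pr_old)
_, fresh_steps = iterate_pr(M2, np.full(n, 1.0 / n))
```

The warm start saves iterations precisely because the old global solution is already close to the new one; it does not allow the calculation to be restricted to the changed pages.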
|It looks reasonable that a collection of good relevant outbound links should have some authority, i.e. a little PR. |
|Two different issues here. |
|I know. Simply you assume that there are no changes in the algorithm that Google use. |
Of course, Google can change the ranking algorithm and take things like 'authority' into account. However, as already explained by Kirby, that has nothing to do with the PR algorithm which is the subject of this discussion.
> Of course, Google can change the ranking algorithm and take things like 'authority' into account. However, as already explained by Kirby, that has nothing to do with the PR algorithm which is the subject of this discussion.
They might run any algo they want over the PR table before presenting the result to the toolbar, and they could also add extra coefficients to the values before the iteration process. That can't explain the irregularities djgreg reports, but a number of the empirical data points in this thread suggest some form of lexical filtering either before or after iterating.
there is an interesting ongoing discussion, with important secondary-literature links, in
and I wonder whether the "lexical-similarity-coefficient" described in one of those papers might help to cope with the irregularities your analysis tools have shown.