Modifications of the PR formula (as Google has done in the past) cannot explain the results. Also, even an incomplete update wouldn't explain all of the data. Thus a change in the iteration scheme seems to be the most likely explanation.
This behaviour might explain other phenomena such as the 'Google lag'. Of course, this is pure speculation, but the inconsistencies of the PR calculation (strictly speaking, the inconsistencies of the PR values shown in the toolbar) are a fact.
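For reference, a minimal sketch of the published PageRank iteration (the textbook formula; the toy graph, damping factor and starting values below are assumptions, not anything Google has confirmed):

```python
# Sketch of the published PageRank formula (not Google's actual implementation):
# PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over all pages T that link to A.

def pagerank(links, d=0.85, iterations=40, start=1.0):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {p: start for p in pages}
    out_count = {p: len(links.get(p, [])) for p in pages}
    for _ in range(iterations):
        new_pr = {p: 1.0 - d for p in pages}                # the (1 - d) base value
        for src, targets in links.items():
            for t in targets:
                new_pr[t] += d * pr[src] / out_count[src]   # PR flows along inbound links only
            pass
        pr = new_pr
    return pr

# Toy example: a home page linking to three new inner pages that link back.
print(pagerank({"home": ["a", "b", "c"], "a": ["home"], "b": ["home"], "c": ["home"]}))
```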
I got three new pages indexed now, all linked to from my main index page and none with any backlinks from anywhere else.
One got PR3 and two got PR1. This is inconsistent with anything we have heard so far about calculating PR. It could indicate that the result is not stable yet, or that major changes have indeed taken place, or that Google is mocking us with the toolbar, or...
Just to give one empirical datum.
I got three new pages indexed now, all linked to from my main index page and none with any backlinks from anywhere else. One got PR3 and two got PR1. This is inconsistent with anything we have heard so far about calculating PR.
Similarly, I have two new pages that have links from a main page and no other links. All of the older pages at this same level are PR5; the new pages came in as PR4.
My guess is that Google is now using age of page or age of link as a factor in calculating PageRank.
Across about 20 pages it is very clear that the oldest links (>2 months) were awarded PR3, middle-aged links (>1 month) received PR2, and new links (a few weeks old) PR1.
I'm certain that these pages are linked by no other means.
Newer pages linked to from a central indexed page have about one less PageRank than older pages - that's about 1 month versus 2-3 months old, with no other differences. I noticed that too this morning when I was checking.
<<< so what if Google was showing PR that is two / three months old
I'm getting PageRank on pages newer than that.
Pages using index.htm?page=23 type URLs are currently getting PageRank 0.
One got PR3 and two got PR1. This is inconsistent with anything we have heard so far about calculating PR.
I noticed something similar on September 18, and it seems I had a clear experiment: the page got PR2. It had no inbound links, and all internal links to it came from PR0 pages.
See my true story at
[webmasterworld.com...]
(message #13)
I believe it was because this page had good relevant *outbound* links. Is your case similar?
It may be a change in the algorithm. It looks reasonable that a collection of good, relevant outbound links should carry some authority, i.e. a little PR.
After all, the PR algorithm was student work by the Google founders, and they may change it easily and without warning.
Also, I noticed that the PR of my new site changed when most people were worried that they had seen no changes (about a month ago). It may be that for new sites PR is calculated separately, using older crawl results.
It's also possible that Google now at least partly calculates PR continuously. There is actually no special reason to calculate PR all at once, as it seems was done before. The point is that the data at any given moment were collected over several months, so they are not an accurate current snapshot. We may as well use data that slowly evolve over time and calculate PR continuously, on a site-by-site basis.
Vadim.
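To make that idea concrete, here is a purely hypothetical sketch of a site-by-site update: hold the PR of all pages outside the site fixed at the values from the last full run and iterate only over the site's own pages. (Whether such a local calculation can ever be exact is a separate question.)

```python
# Hypothetical incremental update of one site's PR (illustration only, not a known Google method).
def update_site_pr(site_pages, inbound, out_count, old_pr, d=0.85, passes=10):
    """inbound maps page -> pages linking to it; out_count maps page -> number of
    outbound links; old_pr holds PR values from the previous full calculation."""
    pr = dict(old_pr)
    for page in site_pages:
        pr.setdefault(page, 0.0)        # pages added since the last full run start at 0
    for _ in range(passes):
        for page in site_pages:         # pages outside the site keep their old PR
            pr[page] = (1 - d) + d * sum(pr.get(q, 0.0) / out_count[q]
                                         for q in inbound.get(page, []))
    return pr
```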
Curious thing with one site of mine. On 19 September I added a news report I wrote about a death, and linked to it directly from the PR6 home page of that site. AFAIK it isn't linked to anywhere else. There aren't that many links on that PR6 home page, and some other, older pages that are linked to from just that PR6 home page have a PR5. HOWEVER, that news report from 19 September has only a PR3. This is way below what should be expected. Looks to me like pages added just before the cutoff date only got credited with part of the expected PR.
Added: Correction. I posted a link to that news report on another site's message board, and that site also adds the first few paragraphs of all threads started in the news forum to its home page. The link to that page appeared in those first few paragraphs. Thus, that page also has an external link from that other site's home page, making the PR3 of that page seem even less probable.
So far, that's my vote. Wild guess is that we're seeing the beginnings of their new approach.
Early, wild speculation (sorry):
--age of pages/links playing a more pronounced role now
--sites are being measured to an extent (not just pages)
--backlink PR may be being modified by qualitative factors.
I know, I know, it's out there...
If you wanted to dampen the value of PR transferred from purchased text links, adding an age factor would definitely do it. And we can tell from the lag-box that a date is being recorded. Is it the date of link discovery, the date of page discovery, or both?
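To make that speculation concrete (the weighting function and 90-day ramp below are invented purely for illustration), each link's PR contribution could be scaled by how long the link has been known:

```python
# Speculative age-dampened link contribution (not a known Google formula).
def link_weight(link_age_days, ramp_days=90):
    """Links younger than ramp_days pass only part of their PR."""
    return min(1.0, link_age_days / ramp_days)

def contribution(pr_source, outlinks_on_source, link_age_days, d=0.85):
    return d * link_weight(link_age_days) * pr_source / outlinks_on_source

# A fresh (possibly purchased) link passes far less than the same link months later:
print(contribution(6.0, 20, link_age_days=14))    # ~0.04
print(contribution(6.0, 20, link_age_days=120))   # 0.255
```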
In a different thread, you said
>>(starting with a PR 0) 40 iterations should be good enough. However, I would start with different initial values - taking PR=1 should speed up your calculations.
On pages where I control the links and they are identical, previously calculated pages have a PR of 5 but new pages have a PR of 4. From the posts above, new pages seem to have a lower value for others as well.
What if Google decided to do only a few iterations (maybe 10 or so) but started from the previous PR value? For most old pages this should give about the same result as starting at 0 or 1 with many more iterations, but newly calculated pages (starting at 0) wouldn't reach their 'full/true' value at first.
Just a thought, but it makes sense to me that Google might do this, and the results/facts seem to support something of this nature.
I see a couple of advantages:
1) It saves a lot of time (10 iterations instead of 40, 50, or 100).
2) It is better suited to developing a continuous PR calculation vs. infrequent massive iterations over the whole web.
3) It slows the impact of a massive link campaign (i.e. if a page is new, why did it suddenly get 1,000 links?) and stops the came-out-of-nowhere-to-#1 jumps. Spam?
4) It slows a page's drop to nowhere if links were lost because a server was down, etc.
5) It still rewards consistent but developing sites.
and I could go on.
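A minimal sketch of that warm-start idea (assumed, not confirmed): seed the iteration with the previous update's values, so established pages start near their converged PR, while pages added since then start at 0 and may not get all the way there within ~10 passes.

```python
# Hypothetical warm-started PR with only a few passes (illustration only).
def iterate(links, seed, d=0.85, passes=10):
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {p: seed.get(p, 0.0) for p in pages}   # pages missing from the last run start at 0
    out_count = {p: len(links.get(p, [])) for p in pages}
    for _ in range(passes):
        new_pr = {p: 1.0 - d for p in pages}
        for src, targets in links.items():
            for t in targets:
                new_pr[t] += d * pr[src] / out_count[src]
        pr = new_pr
    return pr

# seed = PR values from the previous update; a brand-new page is simply absent from it.
```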
Wild speculation alert: they did a PR update to fit the quarterly schedule, but like every other aspect now, it is pretty screwed up.
I think this is consistent with another thread complaining about a massive drop around Sept. 23rd.
Furthermore, I found at least one strange example where the toolbar showed PR7 on the complete overview of one thread, dropped to PR4 on the September summary, and moved up to PR5 on the single postings. I cannot imagine these lists have inbound links from anywhere else, so how could this be, on the basis of all we know so far about PR?
On a meta level I begin to doubt whether all this analysis of the algo makes sense at all. We all know that from a mathematical point of view it is impossible to pin it down precisely. The only reason this pseudo-scientific theory-building, collection of empirical data, and verification/falsification seems to work at all is the complexity SE algos have acquired in the meantime.
Perhaps the SE insiders will become much more cooperative again if we move towards a critical analysis of the search results rather than trying to find the best means to spam or move up in the rankings (which is basically the same thing). For instance: for my major keywords, which are not very competitive, I find more and more of these pseudo-directories, DMOZ mirrors and link farms settling on the first three Google pages, and I cannot imagine this is very helpful for the user.
The internet has begun to grow faster than Moore's law, and if this is not already the case, it is only a question of time before traditional algos all run into capacity problems. One of the key attempts to cope with this is the discussion of hubs and authorities. We could contribute a lot by helping SEs sort the helpful hubs from the spammers.
Quite agree... One of my 3-month-old sites is supposed to get PR6 for its homepage, but the toolbar displays PR5, whereas the link pages, which are supposed to get only PR5, show PR6 in the toolbar. Very weird!
So I guess it's a combination of all four: change / inconsistency / incompleteness / bug.
And there is another thing which is very interesting:
I have divided the sitemap of my homepage into several parts, so I have files like sitemap01.php, sitemap02.php ...
On these sitemaps I link to all the pages available on my site using the corresponding keywords.
Now these sitemap pages, which basically consist only of 100 links per page, sometimes rank better than the pages which are linked from the sitemaps.
Of course the sitemaps are only linked to from my homepage while my subpages have several links from other domains.
So why would these sitemaps, which consist only of links, rank better than the content pages?
greg
trimmer80 wrote: The changes to my sites' PR are the changes I expected to see two months ago.
graywolf wrote: Except I have blog posts from as late as September 23rd with PR. Back to the drawing board ;-)
I guess there is some confusion about which PR update one is referring to:
1. The pages whose toolbar PR was updated, which quite evidently are as new as around the 23rd of Sep.
2. The links that are responsible for this PR update, which of course are not as late as the 23rd of Sep, but from well before that.
I would be interested to see, if someone has any idea or evidence, what the cut-off time is for the links counted in this PR update. One of my sites that received a PR7 link on Sep 15th has actually dropped from PR5 to 4, so the cut-off must be before that.
It may be a change in the algorithm. It looks reasonable that a collection of good, relevant outbound links should carry some authority, i.e. a little PR.
No, PR is based on inbound links; the PR of a page is not affected at all by its outbound links.
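For reference, the formula from the original paper is PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1...Tn are the pages linking to A and C(T) is the number of outbound links on page T. A's own outbound links appear nowhere on the right-hand side; they only affect the pages A links to.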
After all, the PR algorithm was student work by the Google founders, and they may change it easily and without warning.
The PR algo is a patented work. It was granted a patent by the U.S. Patent Office, and the patent itself is in the name of Stanford University.
It would seem there is a time factor, a puzzling one. I've got sites only a couple of months old showing PR on all pages, and pages on an older site, added and cached with an Aug. 21 date that I put on the pages, are showing PR. But for an older, established site with a mediocre PR5 that got a PR7 and a PR6 link around the same time those new sites' pages went up, if not before, PR hasn't changed a bit.
Question, though: if the PR algo is patented, how much or what would they be able to change?
Do you see inconsistencies in toolbar PR with older pages with older links?
Yes, the pages (and the linking structure) are at least six months old - some are unchanged for more than a year.
what inconsistencies do you see?
There are several types of inconsistencies. Two examples (there are even more):
- an increase of PR where a decrease should occur (this is a fundamental problem)
- two identical structures which must have the same PR (same linking structure and identical incoming PR) have different PR
so what if Google was showing PR that is two / three months old...
This wouldn’t explain the behaviour described above.
My guess is that Google is now using age of page or age of link as a factor in calculating PageRank.
Taking age into account for the PR calculation (!) wouldn't make sense (and it would be complicated and time-consuming to build a valid model), and it wouldn't explain my data (at least, there must also be other changes).
These newer (but June/July) pages seem to have lower PR than parallel pages...
all pages linked from the homepage gained PR for the first time and range from PR1 - PR3.
Newer pages linked to from a central indexed page have about 1 less page rank than older pages
HOWEVER, that news report from 19 September has only a PR3. This is way below what should be expected.
I had some older stuff that should have gone to a PR6 or 7 that stayed constant at a 5. However, I had some new stuff that ended up right on target at a PR4.
As said, an age factor can't explain all my data. However, I'm also seeing this effect for several pages/sites!
I would guess that this isn't a result of additional factors introduced into the PR formula but a side effect of the new kind of PR calculation, i.e. they didn't change the PR formula but the calculation (iteration scheme), which leads to this (unwanted?) effect.
There is actually no special reason to calculate PR all at once, as it seems was done before.
Of course, there is a reason to calculate PR all at once: PR cannot be calculated accurately locally.
What if Google decided to do only a few iterations (maybe 10 or so) but started from the previous PR value?
Yes, starting with the values of the last calculation makes sense and speeds up the calculation (we discussed this before, e.g. here [webmasterworld.com]), and perhaps this is part of the explanation. However, in most of the cases people described here, even one iteration should give a result (for new pages which are directly linked from an old page) which is close to the exact value (especially if we are talking about the toolbar, because this is a logarithmic scale). More iterations are mainly necessary for deeper inner pages (propagation).
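A quick back-of-the-envelope illustration of that point (toy numbers, textbook formula assumed): a new page linked only from one old page is already close to its value after a single pass, while a page two clicks deep needs further passes before any PR reaches it.

```python
d = 0.85
pr_home, outlinks_on_home = 5.0, 10       # assumed values for an old, already-converged page

# One pass for a new page linked only from the home page:
pr_new = (1 - d) + d * pr_home / outlinks_on_home
print(pr_new)                             # 0.575 -- already near its final value

# A page two clicks deep gets nothing extra on the first pass and only starts
# receiving PR on the second (assuming pr_new has 2 outbound links):
pr_deep_pass1 = 1 - d                     # 0.15
pr_deep_pass2 = (1 - d) + d * pr_new / 2  # ~0.39
print(pr_deep_pass1, pr_deep_pass2)
```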
It may be a change in the algorithm. It looks reasonable that a collection of good, relevant outbound links should carry some authority, i.e. a little PR.
I'm talking about the PR algorithm, not the ranking algorithm.
After all, the PR algorithm was student work by the Google founders, and they may change it easily and without warning.
They already made changes a long time ago. However, there are some general principles which shouldn't be violated even if details were changed (see the examples given at the beginning).
I created 3 additional pages in July, linked from a central internal category index page (previously there were five pages in all).
Of the three new pages, two have PR3, in line with the other five pages, but the other page only has PR2.
All 3 pages were uploaded on the same day.
Damn, you got there before me, and so much more eloquently, Doc_z.
1. A change in the log scale
2. Some downgrading of the value of home pages (with the PR coming to home pages/entry pages distributed to internal pages instead)
3. By association PR is not purely a page issue anymore, G is taking "sites" into account in some way when allocating/distributing PR.
4. A devaluation of DMOZ/Google Directory listings in the calculation of PR (perhaps something to do with the millions of DMOZ clones that have sprung up)
5. Some value or penalty for freshness - or lack of it in some categories
6. Devaluing links from some sources - like blogs
7. Devaluing of links from blatant PR sellers (either via a manual block ... or some new way they've found of detecting the blatant PR sellers via algo)
And one "pure" conspiracy for good measure....
8. Throwing in a random obfuscation factor in Toolbar PR that randomly affects some pages and not others (to confuse SEOs)
</speculation>