
Google News Archive Forum

Google’s PR algorithm and/or iteration scheme changed
… and it seems that they have a problem
doc_z




msg:214855
 1:24 pm on Oct 7, 2004 (gmt 0)

I've been studying Google's PR algorithm in detail for a long time, i.e. I build test pages to analyse changes in the PR algorithm as well as the damping factor, the logarithmic base and so on. So far I have always got consistent, reasonable results. However, the current PR update is quite different: the results (several tests on different domains) show inconsistencies.

Possible explanations are:
- This isn't (yet) a complete update; old data are partly still in use
- Google changed their PR calculation scheme. It seems that they used the Jacobi iteration scheme (mentioned in the original papers) in the past and may have switched to a different scheme. However, they either didn't perform enough iterations or there is a bug.

Modifications of the PR formula (as Google has made in the past) cannot explain the results. Even an incomplete update wouldn't explain all of the data. Thus a change in the iteration scheme seems to be the most likely explanation.

This behaviour might explain other phenomena such as the 'Google lag'. Of course, this is pure speculation, but the inconsistencies in the PR calculation (strictly speaking, the inconsistencies in the PR values shown in the toolbar) are fact.
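
For reference, here is a minimal sketch of the textbook PageRank calculation being discussed - a Jacobi-style pass where each new value is computed entirely from the previous pass's values, with damping factor d. This is purely illustrative (toy graph, assumed constants), not anything from Google:

```python
# Minimal sketch of the textbook PageRank iteration from the original
# Brin/Page paper -- purely illustrative, not Google's implementation.
def pagerank(links, d=0.85, iterations=40):
    """links: dict mapping each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {p: 1.0 for p in pages}                 # common starting value
    for _ in range(iterations):
        new_pr = {}
        for p in pages:
            # sum contributions from every page q that links to p
            incoming = sum(pr[q] / len(targets)
                           for q, targets in links.items() if p in targets)
            new_pr[p] = (1 - d) + d * incoming
        pr = new_pr                              # Jacobi: swap in the whole vector at once
    return pr

# toy graph: A -> B, C;  B -> C;  C -> A
print(pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]}))
```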

 

vitaplease




msg:214856
 8:22 pm on Oct 7, 2004 (gmt 0)

Do you see inconsistencies in toolbar PR on older pages with older links?
That is, no new links in months, and all old links from stable origins?

trimmer80




msg:214857
 8:42 pm on Oct 7, 2004 (gmt 0)

<wild random theory>
Hmmmm, so what if Google were showing PR that is two or three months old... just to prevent us from exploiting it? And from now on, each month they will update the toolbar with values from two months ago. The changes to my sites' PR are the changes I expected to see two months ago.
</wild random theory>

graywolf




msg:214858
 9:07 pm on Oct 7, 2004 (gmt 0)

The changes to my sites' PR are the changes I expected to see two months ago.

Except I have blog posts from as late as September 23rd with PR. Back to the drawing board ;-)

Oliver Henniges




msg:214859
 9:33 pm on Oct 7, 2004 (gmt 0)

As I posted in the other thread:

I have three new pages indexed now, all linked to from my main index page and none with any backlinks from anywhere else.

One got PR3 and two got PR1. This is inconsistent with anything we have heard so far about calculating PR. It could indicate that the result is not stable yet, that major changes have indeed taken place, that Google is mocking us with the toolbar, or...

Just to give one empirical datum.

CathyM




msg:214860
 10:09 pm on Oct 7, 2004 (gmt 0)

I have three new pages indexed now, all linked to from my main index page and none with any backlinks from anywhere else.

One got PR3 and two got PR1. This is inconsistent with anything we have heard so far about calculating PR.

Similarly, I have two new pages that have links from a main page and no other links. All of the older pages at this same level are PR5; the new pages came in as PR4.

My guess is that Google is now using age of page or age of link as a factor in calculating PageRank.

steveb




msg:214861
 10:09 pm on Oct 7, 2004 (gmt 0)

I don't have many tests in place, but I do see two pages that have just flat-out the wrong PR. These newer (but June/July) pages seem to have lower PR than parallel pages... meaning a page from last year might be PR6, while one created with the exact same links in July is only PR5.

celenoid




msg:214862
 10:52 pm on Oct 7, 2004 (gmt 0)

I tend to agree there is an age factor involved. On a relatively new site, all pages linked from the homepage gained PR for the first time and range from PR1 to PR3.

Of about 20 pages, it is very clear that the oldest links (>2 months) were awarded PR3, middle-aged links (>1 month) received PR2, and new links (a few weeks old) PR1.

I'm certain that these pages are linked by no other means.

isitreal




msg:214863
 11:54 pm on Oct 7, 2004 (gmt 0)

I'm seeing the age factor too.

Newer pages linked to from a central indexed page have about 1 less PageRank than older pages - that's about 1 month versus 2-3 months old, with no other differences. I noticed that too this morning when I was checking.

<<< so what if Google were showing PR that is two or three months old

I'm getting PageRank on pages newer than that.

Pages using index.htm?page=23-style URLs are currently getting PR0.

mfishy




msg:214864
 1:27 am on Oct 8, 2004 (gmt 0)

We are noticing a change in PR calculation as well. We have a couple of sites that just hang around for tests, and there is no consistency.

Either the TBPR is really screwed (intentionally or unintentionally) or there have been significant changes to the PR algorithm.

giggle




msg:214865
 1:56 am on Oct 8, 2004 (gmt 0)

I recently (1 month ago) created a subdomain of our main web site.

Last night the main site dropped from PR5 to PR4 and the sub domain got a PR3.

Vadim




msg:214866
 2:33 am on Oct 8, 2004 (gmt 0)

One got PR3 and two got PR1. This is inconsistent with anything we have heard so far about calculating PR.

I noticed something similar on September 18, and it seems I had a clean experiment: the page got PR2. It had no inbound links, and all the internal site pages linking to it had PR0.

See my true story at
[webmasterworld.com...]
(message #13)

I believe it was because this page had good relevant *outbound* links. Is your case similar?

It may be a change in the algorithm. It seems reasonable that a collection of good, relevant outbound links should carry some authority, i.e. a little PR.

After all, the PR algorithm was student work by the Google founders, and they may change it easily and without warning.

Also, I noticed that the PR for my new site changed when most people were worried that they had seen no changes (about a month ago). It may be that for new sites PR is calculated separately, using old crawling results.

It's also possible that Google now at least partly calculates PR continuously. There is actually no special reason to calculate PR all at once, as it seems was done before. The point is that the data at any given moment were collected over several months, so they are not an accurate current snapshot. One might as well use data that evolve slowly over time and calculate PR continuously on a site-by-site basis.

Vadim.

rfgdxm1




msg:214867
 3:19 am on Oct 8, 2004 (gmt 0)

>Except I have blog posts from as late as September 23rd with PR. Back to the drawing board ;-)

Curious thing with one site of mine. On 19 September I added a news report I wrote about a death, and linked to it directly from the PR6 home page of that site. AFAIK it isn't linked to anywhere else. There aren't that many links on that PR6 home page, and some other, older pages that are linked to only from that PR6 home page have a PR5. HOWEVER, that news report from 19 September has only a PR3. This is way below what should be expected. Looks to me like pages added just before the cutoff date only got credited with part of the expected PR.

Added: Correction. I posted a link to that news report on another site's message board, and that site also puts the first few paragraphs of all threads started in its news forum on its home page. The link to my page appeared in those first few paragraphs. Thus, that page also has an external link from the other site's home page, making its PR3 seem even less probable.

[edited by: rfgdxm1 at 3:32 am (utc) on Oct. 8, 2004]

caveman




msg:214868
 3:19 am on Oct 8, 2004 (gmt 0)

>or there have been significant changes to the PR algorithm

So far, that's my vote. Wild guess is that we're seeing the beginnings of their new approach.

Early, wild speculation (sorry):
--age of pages/links playing a more pronounced role now
--sites are being measured to an extent (not just pages)
--backlink PR may be being modified by qualitative factors.

I know, I know, it's out there...

graywolf




msg:214869
 4:12 am on Oct 8, 2004 (gmt 0)

I had some older stuff that should have gone to a PR6 or 7 that stayed constant at a 5. However, I had some new stuff that ended up right on target at a PR4.

If you wanted to dampen the value of PR transferred from purchased text links, adding in an age factor would definitely do it. And we can tell from the lag-box that a date is being recorded. Is it the date of link discovery, the date of page discovery, or both?

LogicMan




msg:214870
 4:12 am on Oct 8, 2004 (gmt 0)

>>doc_z states
>>Possible explanations are .... Google changed their PR calculation scheme. It seems that they used the Jacobi iteration scheme (mentioned in the original papers) in the past and may have switched to a different scheme. However, they either didn't perform enough iterations .....

In a different thread, you said
>>(starting with a PR 0) 40 iterations should be good enough. However, I would start with different initial values - taking PR=1 should speed up your calculations.

On pages where I control the links and the pages are identical, previously calculated pages have a PR of 5 but new pages have a PR of 4. From the posts above, new pages seem to have a lower value for others too.

What if Google decided to do only a few iterations (maybe 10 or so) but started each value at the previous PR value? In most cases, for old pages this should give about the same result as starting at 0 or 1 with many more iterations, but newly calculated pages (starting at 0) wouldn't reach their 'full/true' value at first.

Just a thought, but it makes sense to me that Google might do this, and the results/facts seem to support something of this nature (see the sketch after the list below).

I see a couple of advantages:
1) saves a lot of time (10 iterations, not 40, 50, or 100)
2) is better suited to developing a continuous PR calculation vs. infrequent massive iterations over the whole web
3) slows the impact of a massive link campaign (i.e. if a page is new, why did it suddenly get 1,000 links? spam?) and stops the came-out-of-nowhere jumps to #1
4) slows the drop to nowhere of a page whose links were lost because a server was down, etc.
5) still rewards the consistent but developing sites

and I could go on.
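
Here is a rough sketch of that warm-start idea - purely my own illustration with made-up numbers, not anything known about Google's setup. The previous update's values seed the starting vector, only a handful of passes are run, and pages with no previous value start at zero and therefore come out below their fully converged PR:

```python
# Illustrative only: warm-start PageRank with few iterations.
# Pages absent from previous_pr (i.e. new pages) start at 0 and
# tend to land below their fully converged value.
def pagerank_warm_start(links, previous_pr, d=0.85, iterations=10):
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {p: previous_pr.get(p, 0.0) for p in pages}
    for _ in range(iterations):
        pr = {p: (1 - d) + d * sum(pr[q] / len(targets)
                                   for q, targets in links.items() if p in targets)
              for p in pages}
    return pr

# hypothetical example: "new-page" has no previous value and starts at 0
old = {"home": 4.0, "about": 1.2}
print(pagerank_warm_start({"home": ["about", "new-page"],
                           "about": ["home"],
                           "new-page": ["home"]}, old))
```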

Kirby




msg:214871
 5:25 am on Oct 8, 2004 (gmt 0)

I have 50+ pages in what is basically a closed-loop system. They all link off one PR5 page, then they all link to each other. There are no other incoming links. Two-thirds were built in late June, one-third was added in August. All showed PR0 prior to the update. The group added in June is PR3, the group added in August is PR4. This makes no sense to me.

steveb




msg:214872
 6:26 am on Oct 8, 2004 (gmt 0)

I took a page offline on July 1st and removed all links to it. That URL still shows a PR4. The page was a PR6 previously. Every page that linked to this page has been crawled many times subsequently, so Google should know that there are no links to this page.

Wild speculation alert: they did a PR update to fit the quarterly schedule, but like every other aspect right now, it is pretty screwed up.

Oliver Henniges




msg:214873
 6:59 am on Oct 8, 2004 (gmt 0)

A very helpful empirical data basis for analyzing the behaviour of that new (modified? beta?) algo is the W3C mailing lists. I clicked through some sporadically and found that the update must actually have taken place somewhere between Sep 22nd and 27th. All posts before the 22nd have been assigned PR now, and all after the 27th have not.

I think this is consistent with another thread complaining about a massive drop around Sept. 23rd.

Furthermore, I found at least one strange example where the toolbar showed PR7 on the complete overview of one thread, dropped to PR4 on the September summary, and moved up to PR5 on the single postings. I cannot imagine these lists have inbound links from anywhere else, so how could this be, on the basis of all we know so far about PR?

On a meta-level I am beginning to doubt whether all this analysis of the algo makes sense at all. We all know that, from a mathematical point of view, it is impossible to pin it down precisely. The only reason this pseudo-scientific theory-building, collection of empirical data, and verification/falsification seems to work at all is the complexity SE algos have acquired in the meantime.

Perhaps the SE insiders will become much more cooperative again if we move towards a critical analysis of the search results rather than trying to find the best means to spam or to move up in the rankings (which is basically the same thing). For instance: for my major keywords, which are not very competitive, I find more and more of these pseudo-directories, DMOZ mirrors and link farms settling on the first three Google pages, and I cannot imagine this is very helpful for the user.

The internet has begun to grow faster than Moore's law, and if that is not already the case, it is only a question of time until traditional algos all hit their capacity limits. One of the key attempts to cope with this is the discussion of hubs and authorities. We could contribute a lot by helping SEs sort the helpful hubs from the spammers.

sit2510




msg:214874
 7:15 am on Oct 8, 2004 (gmt 0)

>>> Google’s PR algorithm and/or iteration scheme changed
… and it seems that they have a problem

Quite agree... One of my 3-month-old sites is supposed to get PR6 for its homepage, but the toolbar displays PR5, whereas the link pages which are supposed to get only PR5 show PR6 in the toolbar. Very weird!

So I guess it's a combination of all four: change / inconsistency / incompleteness / bug.

djgreg




msg:214875
 7:30 am on Oct 8, 2004 (gmt 0)

Oliver Henniges:
Very true. I also see a lot of directories in the SERPs.
Even my rankings have been taken over by directories which link to me. Wow, what a great success: don't show the user the site where the content is, show him a directory which links to the content.
I can't imagine that anybody sees any sense in that.
Why would Google point users to directories rather than to the content sites?

And there is another thing which is very interesting:
I have divided the sitemap of my homepage into several parts, so I have files like sitemap01.php, sitemap02.php, ...
On these sitemaps I link to all the pages available on my site, using the corresponding keywords.
Now these sitemap pages, which basically consist only of 100 links each, sometimes rank better than the pages which are linked from the sitemaps.
Of course, the sitemaps are only linked to from my homepage, while my subpages have several links from other domains.
So why would these sitemaps, which consist only of links, rank better than the content pages?

greg

McMohan




msg:214876
 7:51 am on Oct 8, 2004 (gmt 0)

trimmer80 wrote
The changes to my sites' PR are the changes I expected to see two months ago.

graywolf wrote
Except I have blog posts from as late as September 23rd with PR. Back to the drawing board ;-)

I guess there is some confusion about which PR update one is referring to.

1. The pages whose toolbar PR has been updated - these quite evidently include pages as new as around the 23rd of Sep.

2. The links that are responsible for this PR update - the cut-off for these is of course not as late as the 23rd of Sep, but well before that.

I would be interested to know, if someone has any idea or evidence, what the cut-off time is for the links that are counted in this PR update. One of my sites, which received a PR7 link on Sep 15th, has actually dropped from PR5 to 4, so the cut-off must be before that.

Marcia




msg:214877
 8:01 am on Oct 8, 2004 (gmt 0)

It may be a change in the algorithm. It seems reasonable that a collection of good, relevant outbound links should carry some authority, i.e. a little PR.

No, PR is based on inbound links; the PR of a page is not affected at all by its outbound links.
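
For reference, the published formula from the original paper sums only over the pages $T_1, \dots, T_n$ that link *to* page $A$, where $C(T)$ is the number of outbound links on $T$ and $d$ is the damping factor:

$$PR(A) = (1 - d) + d\left(\frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)}\right)$$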

After all, the PR algorithm was student work by the Google founders, and they may change it easily and without warning.

The PR algo is a patented work. It was granted a patent by the U.S. Patent Office, and the patent itself is in the name of Stanford University.

It would seem there is a time factor, and a puzzling one. I've got sites only a couple of months old showing PR on all pages, and pages on an older site that were added and cached with an Aug. 21 date (a date I put on the pages myself) are showing PR - but for an older, established site with a mediocre PR5 that got a PR7 and a PR6 link around the same time those new sites' pages went up, if not before, PR hasn't changed a bit.

Question, though: if the PR algo is patented, how much or what would they be able to change?

getvisibleuk




msg:214878
 8:23 am on Oct 8, 2004 (gmt 0)

"Topic sensitive", anyone?

kaled




msg:214879
 9:49 am on Oct 8, 2004 (gmt 0)

If Google have changed the algo, that could explain the lack of PR update over recent months. Perhaps we'll see regular updates again and inconsistencies will gradually diminish.

Kaled.

doc_z




msg:214880
 9:57 am on Oct 8, 2004 (gmt 0)

Do you see inconsistencies in toolbar PR on older pages with older links?

Yes, the pages (and the linking structure) are at least six months old - some are unchanged for more than a year.

what inconsistencies do you see?

There are several types of inconsistencies. Two examples (there are even more):
- an increase in PR where a decrease should occur (this is a fundamental problem)
- two identical structures which must have the same PR (same linking structure and identical incoming PR) have different PR

so what if Google were showing PR that is two or three months old...

This wouldn’t explain the behaviour described above.

My guess is that Google is now using age of page or age of link as a factor in calculating PageRank.

Taking the age into account for the PR calculation (!) wouldn't make sense (and it would be complicated and time-consuming to get a valid model), and it wouldn't explain my data (at the very least there must also be other changes).

These newer (but June/July) pages seem to have lower PR than parallel pages...

all pages linked from the homepage gained PR for the first time and range from PR1 - PR3.

Newer pages linked to from a central indexed page have about 1 less page rank than older pages

HOWEVER, that news report from 19 September has only a PR3. This is way below what should be expected.

I had some older stuff that should have gone to a PR6 or 7 that stayed constant at a 5. However, I had some new stuff that ended up right on target at a PR4.

As said, an age factor can't explain all my data. However, I'm also seeing this effect on several pages/sites!

I would guess that this isn't the result of additional factors introduced into the PR formula, but a side effect of the new kind of PR calculation, i.e. they didn't change the PR formula but rather the calculation (iteration scheme), which leads to this (unwanted?) effect.

There is actually no special reason to calculate PR all at once, as it seems was done before.

Of course, there is a reason to calculate PR all at once: PR cannot be calculated accurately on a purely local basis.

What if Google decided to do only a few iterations (maybe 10 or so) but started each value at the previous PR value?

Yes, starting with the values from the last calculation makes sense and speeds up the calculation (we discussed this before, e.g. here [webmasterworld.com]), and perhaps this is part of the explanation. However, in most of the cases people have described here, even one iteration should give a result (for new pages which are directly linked from an old page) that is close to the exact value (especially if we are talking about the toolbar, because it uses a logarithmic scale). More iterations are mainly necessary for deeper inner pages (propagation).
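
To illustrate with assumed numbers (the raw parent PR, the number of outlinks and the toolbar log base below are pure guesses for the sake of the example): for a new page whose only inbound link comes from an already converged page, a single pass already lands in roughly the right toolbar bucket.

```python
import math

# One-pass estimate for a brand-new page whose only inbound link comes
# from a single established page.  All constants are assumptions made up
# for illustration; the real raw-PR scale and log base are unknown.
def one_iteration_estimate(parent_pr, parent_outlinks, d=0.85):
    return (1 - d) + d * parent_pr / parent_outlinks

def toolbar_pr(raw_pr, log_base=6.0):
    # hypothetical mapping of raw PR onto the 0-10 toolbar scale
    return min(10, max(0, int(math.log(max(raw_pr, 1e-9), log_base)) + 1))

raw = one_iteration_estimate(parent_pr=5000.0, parent_outlinks=25)
print(raw, toolbar_pr(raw))   # further passes would barely move the toolbar value
```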

It may be a change in the algorithm. It seems reasonable that a collection of good, relevant outbound links should carry some authority, i.e. a little PR.

I'm talking about the PR algorithm, not the ranking algorithm.

After all, the PR algorithm was student work by the Google founders, and they may change it easily and without warning.

They already made changes a long time ago. However, there are some general principles which shouldn't be violated even if details were changed (see the examples given at the beginning).

tantalus




msg:214881
 9:59 am on Oct 8, 2004 (gmt 0)

I'm not too sure about the time factor.

I created 3 additional pages in July, linked from a central internal category index (previously there were five pages in all).

Of the three new pages, 2 have PR3, in line with the other five pages, but the third has only PR2.

All 3 pages were uploaded on the same day.

Damn, you got there before me, and so much more eloquently, doc_z.

[edited by: tantalus at 10:03 am (utc) on Oct. 8, 2004]

Macro




msg:214882
 10:02 am on Oct 8, 2004 (gmt 0)

<speculation>

1. A change in the log scale

2. Some downgrading of the value of home pages (with the PR coming to home pages/entry pages distributed to internal pages instead)

3. By association, PR is not purely a page issue anymore; G is taking "sites" into account in some way when allocating/distributing PR.

4. A devaluation of DMOZ/Google Directory listings in the calculation of PR (perhaps something to do with the millions of DMOZ clones that have sprung up)

5. Some value or penalty for freshness - or lack of it in some categories

6. Devaluing links from some sources - like blogs

7. Devaluing of links from blatant PR sellers (either via a manual block ... or some new way they've found of detecting the blatant PR sellers via algo)

And one "pure" conspiracy for good measure....

8. Throwing in a random obfuscation factor in Toolbar PR that randomly affects some pages and not others (to confuse SEOs)

</speculation>

[edited by: Macro at 10:10 am (utc) on Oct. 8, 2004]

tantalus




msg:214883
 10:09 am on Oct 8, 2004 (gmt 0)

Number 8 seems a likely prospect.

Marval




msg:214884
 10:15 am on Oct 8, 2004 (gmt 0)

doc_z - just wondering what you are using as data for the PR. Are you using the PR display in the directory or the toolbar? I would venture a guess that the toolbar PR has nothing whatsoever to do with a page's real PR and is showing something similar to, or linked to, the new backlinks display, whereas the PR shown in the directory, which was updated a few weeks ago, is displaying the true PR. If you take a look at the directory PR for a page, it still makes sense in the scheme of ranking in the SERPs, whereas toolbar PR looks like it is being used to confuse people trying to buy and sell links.
