Google’s PR algorithm and/or iteration scheme changed
… and it seems that they have a problem
doc_z




msg:214855
 1:24 pm on Oct 7, 2004 (gmt 0)

I’ve been studying Google’s PR algorithm in detail for a long time, i.e. I build test pages to analyse changes in the PR algorithm as well as the damping factor, the logarithmic base and so on. So far I have always got consistent, reasonable results. However, the current PR update is quite different: the results (several tests on different domains) show inconsistencies.

Possible explanations are:
- This isn’t (yet) a complete update; old data are partly used
- Google changed their PR calculation scheme. It seems that they used the Jacobi iteration scheme (mentioned in the original papers) in the past. They might have changed this scheme. However, either they didn’t perform enough iterations or there is a bug.

Modifications of the PR formula (as done by Google in the past) cannot explain the results. Also, even an incomplete update wouldn’t explain all the data. Thus a change in the iteration scheme seems to be the most likely explanation.

This behaviour might explain other phenomena such as the 'Google lag'. Of course, this is pure speculation, but the inconsistencies of the PR calculation (strictly speaking, the inconsistencies of the PR values shown in the toolbar) are a fact.
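For reference, the Jacobi-style scheme from the original paper fits in a few lines. A minimal sketch, assuming the classic formula PR(p) = (1-d) + d*sum(PR(q)/C(q)); the toy graph, damping factor and convergence threshold are illustrative, not Google's actual values:

```python
# Minimal Jacobi-style power iteration for classic PageRank.
# Illustrative only: the graph, damping factor and convergence
# threshold are assumptions, not Google's actual parameters.

def pagerank(links, d=0.85, tol=1e-8, max_iter=100):
    """links: dict mapping page -> list of pages it links to."""
    pages = list(links)
    pr = {p: 1.0 for p in pages}            # uniform start vector
    for _ in range(max_iter):
        # Jacobi step: every new value is computed from the *old* vector
        new = {}
        for p in pages:
            inbound = sum(pr[q] / len(links[q])
                          for q in pages if p in links[q])
            new[p] = (1 - d) + d * inbound
        if max(abs(new[p] - pr[p]) for p in pages) < tol:
            return new
        pr = new
    return pr

# Three pages in a ring: each should converge to PR = 1.0
print(pagerank({'A': ['B'], 'B': ['C'], 'C': ['A']}))
```

Too few iterations, or a changed update order, would show up exactly as the kind of inconsistency described above.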

 

Oliver Henniges




msg:214885
 10:24 am on Oct 8, 2004 (gmt 0)

> Is your case similar?
Vadim, send me a sticky mail and I'll tell you the URLs if you want, so you can have a look yourself.

DJGreg:
> Why would Google link to directories rather than to the content sites?

Because this new algo might be a first (?) try to cope with the "authorities" stuff. Note that dmoz seems to have completely vanished from the backlink lists. It seems to me the authorities will be hidden from the search results in the future, and the algo is far from perfect in deciding whether a link list is an authority or spam. I think many of these lists got PR0 now and will be thrown out completely at the next update.

> So why would these sitemaps, which only consist of links, rank better than the content sites?

Because they still - according to the old rule of thumb - inherit a PR-1 from your index page, "send" a PR-2 to your content pages, and the external inbound links of the latter do not suffice to push this back up to or beyond PR-1 again.

> "topic sensative" anyone?

I see no post off topic:

>>> Google’s PR algorithm and/or iteration scheme changed … and it seems that they have a problem

Faced with obvious massive changes, this is naturally a chaotic period of collecting empirical data and formulating first theorems. The goal will be either to find more precise coefficients to mirror the behaviour of the PR calculation algorithm, or to find new fields of activity for all the then-unemployed SEO experts *g*.

Hanu




msg:214886
 10:28 am on Oct 8, 2004 (gmt 0)

If Google have changed the algo, that could explain the lack of PR update over recent months. Perhaps we'll see regular updates again and inconsistencies will gradually diminish.

What you say makes sense, kaled. There is already enough complication and obfuscation in the serps algo. There is no need for Google to obfuscate or complicate the PR calculation, as it's just one factor among many that affect the serps. PR as such is a pretty straightforward concept. What they could have changed are:

1) the coefficients (dampening factor, logarithmic base of toolbar PR), or

2) the computational algorithm. The PR formula for one page is a linear equation. The PR formulas of all pages combined form a giant system of linear equations. It takes quite a while to solve such a system. I'm not good at math, but what if they changed the PR calculation to something incremental that allows for almost instantaneous PR updates?
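To make the "giant system" concrete, here is a toy sketch: three pages in a ring, with the classic PR equations written as a linear system and solved directly with NumPy. The graph and damping factor are illustrative; for billions of pages a direct solve is infeasible, which is why iterative schemes are used:

```python
# The classic PR equations as a linear system (I - d*M) x = (1 - d) * 1,
# solved directly. Feasible for a toy graph; for billions of pages
# only iterative schemes are practical.
import numpy as np

d = 0.85
# Link matrix for the ring A -> B -> C -> A:
# M[i][j] = 1/outdegree(j) if page j links to page i, else 0.
M = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)

A = np.eye(3) - d * M
b = (1 - d) * np.ones(3)
print(np.linalg.solve(A, b))   # -> [1. 1. 1.]
```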

steveb




msg:214887
 10:28 am on Oct 8, 2004 (gmt 0)

If intentional changes were made, it seems a certainty that several were made, with non-similar effects.

It also seems, though, that some of the discrepancies are so completely unearthly that the *most* logical explanation would be "screw up".

I see zero evidence of topical sensitivity or of any consistently different dampening within a domain. Pages that are exactly as relevant as peer pages, with the exact same linking, sometimes show inexplicable differences. A mathematical equation does not allow some of these things to exist. Maybe a bollocksed search engine does, though...

<but just to reiterate, I only have a dozen examples, but all could be explained by an age-of-link reason. This is the single thing they have in common.>

[edited by: steveb at 10:38 am (utc) on Oct. 8, 2004]

Marcia




msg:214888
 10:35 am on Oct 8, 2004 (gmt 0)

I think the topic reference posted isn't referring to whether posts are on topic (which they are), but rather very quietly injecting an inquiry as to whether anyone thinks Topic Sensitive PageRank is coming of age.

Interesting thought. In view of the patent issued not too long ago, some are wondering whether there might be intermediate iterations based on LocalRank, which could narrow things down topically. Even as far back as Florida, there were sites that were seriously filtered for not having any inbounds from topically related sites or pages.

Hanu




msg:214889
 10:48 am on Oct 8, 2004 (gmt 0)

> Topic Sensitive PageRank

I don't understand. What topic? The topic that G thinks a page is about? That would be too radical, wouldn't it?

matt21811




msg:214890
 11:00 am on Oct 8, 2004 (gmt 0)

> I don't understand. What topic? The topic that G thinks a page is about? That would be too radical, wouldn't it?

My own experience has convinced me that on-topic backlinks confer a serious benefit now.
This is a change from the past, but I wouldn't call it radical. Actually, I think it's fairly evolutionary.
Backlinks purchased from off-topic but high-PR sites become worth a lot less. A great way to cool the link market.

steveb




msg:214891
 11:02 am on Oct 8, 2004 (gmt 0)

Just looked at a site that should be a prototype for lag time: one devoted to the presidential electoral vote. This site got a PR3 in June after it had been online a month or so. Jillions of places link to it. At the beginning of September it got PR7 in the Directory. Right now it shows:
www.site.com/ PR3
site.com/ PR6
Internal pages (most PR6 and PR5) link to www.site.com/index.html (which shows PR3)
The site asks people to link to www.site.com/
www.site.com has twenty times the backlinks that site.com has

It is mathematically impossible for www.site.com/ to have a PR3, even if it is possible for site.com/ to have its own PR6. (The genuine situation should be that www.site.com has PR7, while site.com has PR6.)
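A rough sanity check of that claim, assuming the classic formula PR = (1-d) + d*sum(PR_in/C_in) and a toolbar scale of roughly log base 6; every number below is an illustrative assumption:

```python
import math

# Rough sanity check, assuming the classic formula
# PR = (1-d) + d * sum(PR_in / C_in) and a toolbar scale of
# roughly log base 6. Every number below is an assumption.
d = 0.85
pr_internal = 6 ** 6   # rough absolute PR of one toolbar-PR6 page
outlinks = 50          # assumed outlinks per internal page
n_internal = 20        # assumed internal PR6 pages linking to the homepage

lower_bound = (1 - d) + d * n_internal * pr_internal / outlinks
print(math.floor(math.log(lower_bound, 6)))  # -> 5, nowhere near 3
```

With PR5/PR6 internal pages pointing at it, the homepage's toolbar bucket should land around 5 or 6 under any plausible parameters, not 3.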

This is a textbook new/lagged domain that got tons of links, both high quality and low quality, both high PR and low PR.

It would seem hard not to conclude that the main page, but not the inner pages, of this domain got a PR devaluation, either because the links were new or (very unlikely) because it somehow got way too many links from way too many sources.

Macro




msg:214892
 11:05 am on Oct 8, 2004 (gmt 0)

steveb, I'm seeing exactly that. Which is why I made point 2 in msg #29.

steveb




msg:214893
 11:12 am on Oct 8, 2004 (gmt 0)

As far as this one site is concerned, though, the "instead" in your point 2 doesn't apply. The subpages' PR isn't boosted; they are simply showing the PR they would show if the main page were showing its "correct" PR.

Macro




msg:214894
 11:17 am on Oct 8, 2004 (gmt 0)

>> they are just simply showing the PR they would show if the main page was showing its "correct" PR.

They may not display an increased PR if they went from, say, 5.1 to 5.9.

Hanu




msg:214895
 11:41 am on Oct 8, 2004 (gmt 0)

> My own experience has convinced me that on-topic backlinks confer a serious benefit now.

Yes, but what topic? Google can't guess the topic of a page out of the blue, can it? PR is a simple equation that takes links into account and nothing else. It would be easy to filter the links that go into the equation by, say, time, in order to have PR indicate the sandbox. But it is highly unlikely that something as blurry as topic is considered for PR.

The topic only comes into play when the user specifies it by entering a search query. Then on-topic links do count, in the form of LocalRank and anchor text. But again, on-topic PR is out of the question.
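A sketch of what such a time filter might look like; the six-month threshold and the linear ramp are pure guesses on my part, meant only to show how easily an age filter slots into the equation:

```python
# Hypothetical age filter on the links feeding the PR equation.
# The 6-month threshold and the linear ramp are pure guesses,
# meant only to show how easily such a filter slots in.
def link_weight(age_days):
    if age_days >= 180:          # links older than ~6 months count fully
        return 1.0
    return age_days / 180.0      # newer links are discounted linearly

def weighted_inbound(inlinks):
    """inlinks: list of (pr, outdegree, age_days) per linking page."""
    return sum(link_weight(age) * pr / out for pr, out, age in inlinks)

d = 0.85
inlinks = [(1000.0, 10, 400), (1000.0, 10, 30)]  # one old, one new link
print((1 - d) + d * weighted_inbound(inlinks))   # the new link barely counts
```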

HarryM




msg:214896
 11:58 am on Oct 8, 2004 (gmt 0)

A couple of posts have mentioned the possibility of a change to the log scale, and from all I have read here this is the most likely hypothesis to explain my own results. Backlinks have been increasing, pages increasing, traffic increasing, but toolbar PR has dropped a point on almost all pages. From all the evidence, I had been expecting it to increase. (Silly me...)

This would seem to make sense as the number of sites is growing at a phenomenal rate, and (probably) so too is their quality. Making it more difficult to climb the PR ladder might be Google's answer to this problem.

PR does have a real effect, in so much as it appears to influence how much a site is crawled. For example, pushing lower-end PR5 sites down into PR4 would be a significant saving for Google.
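For illustration: toolbar PR is widely assumed to be roughly the floor of the logarithm of absolute PR. Under that assumption (the bases and the absolute value below are made up), raising the base is enough to knock a page down a point:

```python
# How a change of the logarithmic base would shift toolbar PR.
# The bases (5 vs 6) and the absolute PR value are made up;
# nobody outside Google knows the real scale.
import math

def toolbar_pr(absolute_pr, base):
    return min(10, math.floor(math.log(absolute_pr, base)))

absolute = 20000.0
print(toolbar_pr(absolute, 5))   # -> 6 under the old base
print(toolbar_pr(absolute, 6))   # -> 5 under the new base
```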

graywolf




msg:214897
 11:59 am on Oct 8, 2004 (gmt 0)

> Yes, but what topic? Google can't guess the topic of a page out of the blue, can it?

If you have an AdSense account, try pointing it at one of your pages and see if Google can figure out what the page is about. If not, try one of the AdSense preview tools. It may take a day or two, but it will get there.

Hanu




msg:214898
 1:02 pm on Oct 8, 2004 (gmt 0)

Hmmh, did that. So G can estimate what your page is about. It still seems extremely unlikely that PageRank is coupled with these topic guesses. There are pages with more than one topic. If topic were considered for PR, would empty pages get PR? The concept of topic is way too fuzzy to be considered for something as plainly mathematical as PR. I still think that if there is something weird with this PR update, it is either a change in coefficients (logarithmic base or dampening factor) or a time-based filtering of links.

isitreal




msg:214899
 1:06 pm on Oct 8, 2004 (gmt 0)

<<< The concept of topic is way too fuzzy to be considered for something as plainly mathematical as PR

I'd say that statistically, for all practical purposes, Google is very safe in assuming that each page has one topic. This doesn't mean that each page has only one topic, but on average it does; well above average would be my guess for ranking sites. I just checked using Brett's WebmasterWorld theme tool, and it got a site I often compare things to themed correctly.

rfgdxm1




msg:214900
 1:23 pm on Oct 8, 2004 (gmt 0)

>Question, though: if the PR algo is patented, how much or what would they be able to change?

That they patented it wouldn't stop them from changing it, or from dropping it altogether if they wanted.

rfgdxm1




msg:214901
 1:28 pm on Oct 8, 2004 (gmt 0)

>A couple of posts have mentioned the possibility of a change to the log scale, and from all I have read here this is the most likely hypothesis to explain my own results. Backlinks have been increasing, pages increasing, traffic increasing, but toolbar PR has dropped a point on almost all pages. From all the evidence, I had been expecting it to increase. (Silly me...)

It could also just be that the highest-PR page out there has increased its absolute PR more than yours has. With a true log scale, your absolute PR could then increase, yet your toolbar PR decrease.
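A sketch of that effect, assuming toolbar PR is a log scale normalised against the top page on the web; all the numbers are invented:

```python
# If toolbar PR is a log scale normalised to the top page on the
# web, your absolute PR can rise while your toolbar PR falls.
# All numbers are made up for illustration.
import math

def toolbar_pr(pr, max_pr, buckets=10):
    # position of log(pr) between 0 and log(max_pr), cut into buckets
    return math.floor(buckets * math.log(pr) / math.log(max_pr))

print(toolbar_pr(2.0e4, 1.0e8))   # old update -> 5
print(toolbar_pr(3.0e4, 1.0e10))  # your PR rose, the top rose faster -> 4
```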

doc_z




msg:214902
 1:36 pm on Oct 8, 2004 (gmt 0)

> If Google have changed the algo, that could explain the lack of PR update over recent months. Perhaps we'll see regular updates again and inconsistencies will gradually diminish.

Hopefully you’re right.

> It is mathematically impossible for www.site.com/ to have a PR3, even if it is possible for site.com/ to have its own PR6.

That's the kind of 'inconsistencies' I'm talking about. These things cannot be explained by a change in the logarithmic scale, a change of the damping factor, a devaluation of some kinds of links, or anything similar.

Also, modifications of the PR formula are not as easy as they look. You have to ensure that you end up with a mathematically valid model.

speculation: 1. …8.

As explained above, normal modifications couldn't explain the data. Even a random factor (multiplicative or additive) applied to the PR value shown in the toolbar wouldn't be consistent with all my measurements.

A random PR source in the PR calculation might be a solution.

> Are you using the PR display in the directory or the toolbar? I would venture a guess that the toolbar PR has nothing whatsoever to do with a page's real PR and is showing some display similar to, or linked to, the new backlinks display.

I'm using the toolbar PR value. In the past these values were related to the real PR values in a logarithmic way.

zeus




msg:214903
 1:36 pm on Oct 8, 2004 (gmt 0)

Hmm, I think Google search looks good today and it is quick, so they must have finished something that uses some power.

graywolf




msg:214904
 1:43 pm on Oct 8, 2004 (gmt 0)

> The concept of topic is way too fuzzy to be considered for something as plainly mathematical as PR.

Let's say your page is about fuzzy pink sock puppets. Using the same technology used for the AdWords keyword suggestion tools, it's not that hard to come up with a list of related keywords for the topic of your page. So links out to other fuzzy pink sock puppet pages could get the highest percentage of PR transferred. Links to puppets, socks, or pink could get a slightly lower percentage. Links to blue widgets would only get a small benefit.
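A sketch of that idea; the keyword sets and the transfer percentages are invented for illustration:

```python
# Hypothetical topic-sensitive PR transfer: the more the target
# page's keywords overlap the source topic, the more PR a link
# passes. Keyword sets and percentages are invented.
def transfer_fraction(source_topic, target_topic):
    overlap = len(source_topic & target_topic)
    return {0: 0.1, 1: 0.5}.get(overlap, 1.0)  # none/partial/full overlap

source = {"fuzzy", "pink", "sock", "puppets"}
print(transfer_fraction(source, {"fuzzy", "pink", "sock", "puppets"}))  # 1.0
print(transfer_fraction(source, {"sock"}))                              # 0.5
print(transfer_fraction(source, {"blue", "widgets"}))                   # 0.1
```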

Pass the Dutchie




msg:214905
 1:48 pm on Oct 8, 2004 (gmt 0)

DJGreg:
> Why would Google link to directories rather than to the content sites?

Yep, seen this in abundance in my SERPs; directories are all over the shop. My theory is that most directories have sponsored links provided by PPC engines (including Google's AdWords). It would be unethical if Google only boosted the dominance of directories that carried AdWords, so they just boosted the prominence of all of them. Great for the exposure and promotion of PPC culture. Google won't plaster their own SERP results with AdWords, but they might get away with it if the top listings in the SERPs were from directories containing sponsored results.

Personally, I was hoping to see a reduction in the directories cluttering up the SERPs.

Pass the Dutchie




msg:214906
 1:50 pm on Oct 8, 2004 (gmt 0)

re: topic

If a site were in multiple languages, would each of the various languages be off topic for the same information?

randle




msg:214907
 1:56 pm on Oct 8, 2004 (gmt 0)

Macro,

I think you raise a good possibility.

"8. Throwing in a random obfuscation factor in Toolbar PR that randomly affects some pages and not others (to confuse SEOs)"

First they confused the heck out of us by displaying disinformation in the backlinks, and now maybe they have instituted a program of disinformation in the toolbar PR display. If the disinformation factor is different for different people, it makes collective efforts like this board Google's best friend: everyone comparing data, observations and anecdotes, trying to collectively figure this out, and it's all just bogus. Everyone is having a different experience.

The backlink display and the toolbar PR display (however late and inaccurate) definitely helped me inch my way up the rankings. Maybe they're both gone now; it won't stop us, but they were excellent aids. It makes sense to me that they would take those two bullets away. Sounds a little goofy, I admit, but it "just keeps gettin' harder every dayyy"!

Macro




msg:214908
 2:38 pm on Oct 8, 2004 (gmt 0)

>> Even a random factor (multiplicative or additive) for PR value shown in the toolbar wouldn’t be consistent with all my measurements.

Doc, please explain.

mrclark




msg:214909
 2:39 pm on Oct 8, 2004 (gmt 0)

I'm sure this has been mentioned, but my line of thinking is that older links count for more than new ones.

This way, it keeps out people who purchase front-page links for X amount for two weeks, for example.

Links that have been in place for, say, 6 months give a higher PR than a link that has just recently been spotted by Google.

I also agree with on-theme PR counting for more - a Jesus Christ followers' website linking to a Viagra shopping site, for example, shouldn't pass much.

Steve

nalin




msg:214910
 2:46 pm on Oct 8, 2004 (gmt 0)

> Even a random factor (multiplicative or additive) for the PR value shown in the toolbar wouldn't be consistent with all my measurements.

For the pages on my site which are undervalued, those which are linked in similar manners share the same PR.

In one example:
Article X shows PR 6
~10 articles X_Y (with X as the parent and only linking page) show PR 1
~50 articles X_Y_Z (with one of the Y's as the parent and only linking page) show PR 4

One Y is older and shows a higher PR (5), and incidentally the Z's are newer than any of the Y's. The anomalous Y's were created ~4 months ago, prior to the second-to-last update, but were new enough by a few days not to show PR at that time.

Oliver Henniges




msg:214911
 2:49 pm on Oct 8, 2004 (gmt 0)

1) It still says 4,285,199,774 web pages on the start page.

2) OT: [news.google.com...] - is this new? I've never seen that before, and there are a lot of beta pages on it.

> My own experience has conviced me that on topic backlinks confer a serious benefit now.

I'd support that. Recently I wrote a little script to analyse my backlinks compared to my competitors' for a given keyword. I thought that those directories which link to most of the result URLs of a given search query would make a good approximation of "authorities", and I spent pretty much time adding my URL to those directories that had not listed my site yet. First, there is a massive change in the results of my script between Sep 18th and the run I did today. Secondly, the result of all my efforts was quasi null, whereas a good friend of mine, who runs a site on plants and prefers qualified, content-related backlinks, has made good progress in the last few weeks.

As to the technical arguments: I see no need for the PR algo itself to have changed. It's no problem to let a number of additional (semantic) filters run over the first PR table and present the result of these operations to the toolbar and the serps.
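For instance, something along the lines of this pipeline sketch; the age filter is a placeholder of my own invention, not a claim about what Google actually runs:

```python
# Sketch of the pipeline idea: compute the raw PR table once, then
# run post-filters over it before anything reaches the toolbar or
# the serps. The age filter below is a made-up placeholder.
def age_filter(pr_table, first_seen_days):
    # hypothetical: halve displayed PR for pages first seen < 90 days ago
    return {page: pr * (0.5 if first_seen_days.get(page, 9999) < 90 else 1.0)
            for page, pr in pr_table.items()}

def displayed_pr(raw_pr, first_seen_days):
    table = age_filter(raw_pr, first_seen_days)
    # ...further semantic filters could be chained here...
    return table

print(displayed_pr({'old-page': 1000.0, 'new-page': 1000.0},
                   {'new-page': 30}))
```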

Kirby




msg:214912
 2:52 pm on Oct 8, 2004 (gmt 0)

<but just to reiterate, I only have a dozen examples, but all could be explained by an age-of-link reason. This is the single thing they have in common.>

My examples show just the opposite of the results that would be attributed to an age-of-link benefit or penalty. It looks more like a half-baked calculation. They didn't complete enough iterations, or the calculations left out data. I don't see how anything other than "fuzzy" math accounts for steveb's presidential site example.

mrclark




msg:214913
 3:01 pm on Oct 8, 2004 (gmt 0)

The thing is, how does Google know what 'on topic' is?

My website promotes a company name. Google doesn't know, by looking at the company name, what my website is about.

Therefore I'm not sure how Google would be able to give me a bonus if I got a link from an on-theme website.

Steve

graywolf




msg:214914
 3:22 pm on Oct 8, 2004 (gmt 0)

> My website promotes a company name. Google doesn't know, by looking at the company name, what my website is about.

You are selling some sort of product or service, I assume, and hopefully you are going to talk about it. I'm not saying you can't fool it, but why would you want to?

Try the AdSense preview tool:

[webmasterworld.com...]

doc_z




msg:214915
 3:29 pm on Oct 8, 2004 (gmt 0)

>> Even a random factor (multiplicative or additive) for PR value shown in the toolbar wouldn’t be consistent with all my measurements.
Doc, please explain.

It's hard to explain without going into detail, but I can try to give an example:
In the case of an exponential decay, a random value (individually applied to each point) wouldn't change the shape of the (regression) curve, just the individual PR values. This means that fitting the values should almost recover the real parameters of the exponential decay, i.e. the local noise is averaged out. Thus two identical linking structures with the same incoming PR can show different toolbar PR values for the 'same page' (due to the random effect), but the global parameters (where the randomness is averaged out) should be the same.
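A toy demonstration of that argument, with synthetic data and a made-up decay constant: per-point noise shifts individual values, but the fitted global parameters barely move:

```python
# doc_z's argument in miniature: per-page random noise shifts
# individual values but barely moves the fitted global parameters
# of an exponential decay. Data and decay constant are synthetic.
import math, random

random.seed(1)
xs = list(range(20))
true_a, true_k = 1000.0, 0.35
noisy = [true_a * math.exp(-true_k * x) * random.uniform(0.8, 1.2)
         for x in xs]

# Linear regression on log(y) recovers a and k despite the noise.
logs = [math.log(y) for y in noisy]
n = len(xs)
mx, my = sum(xs) / n, sum(logs) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, logs)) / \
        sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx
print(-slope, math.exp(intercept))  # recovers roughly 0.35 and 1000
```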
