
Google SEO News and Discussion Forum

    
So Google Changed their PageRank Algo - How?
tedster




msg:3708225
 5:14 pm on Jul 26, 2008 (gmt 0)

In January, Google VP Udi Manber wrote a tantalizing article [googleblog.blogspot.com] where he said "we made significant changes to the pagerank algorithm in january."

So we've had half a year since that change. What kinds of things do you think happened? Or even more, in what ways is PageRank no longer the original formula, even if the change was earlier than last January? Here are some ideas I've been kicking around:

  1. Internal links and external links on the same page may not be splitting the PageRank vote equally.

  2. PR may be weighted differently according to where links appear in the page template - menu, footer, header, main content.

  3. Multiple links to the same url from the same page may not each get the same piece of the PR vote. Does only one "count"? Or maybe the extra links are just devalued a bit?

  4. Run-of-site external links may have their PR vote damped down. I'm thinking particularly about blogrolls here, but other situations might be similar, don't you think?

  5. Links between domains that Google sees as "related" may have their PR significantly damped down. And what about interlinking between a domain and its subdomains? Surely that doesn't deserve an equal slice of the PR Pie.

What other changes might there be?
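
Purely to make ideas 1 and 3 concrete, here's a toy Python sketch - the weights are invented for illustration, nobody outside Google knows the real formula:

# Toy sketch only - the weights are invented, not Google's.
# Idea: instead of giving each outlink an equal share of a page's PageRank,
# weight links by type (internal vs. external) and devalue duplicates.

def split_pagerank(page_pr, links, internal_weight=1.0, external_weight=1.2,
                   duplicate_weight=0.3):
    """links is a list of (url, is_internal) tuples, in page order."""
    seen = set()
    weights = []
    for url, is_internal in links:
        w = internal_weight if is_internal else external_weight
        if url in seen:          # idea 3: extra links to the same URL devalued
            w *= duplicate_weight
        seen.add(url)
        weights.append((url, w))

    total = sum(w for _, w in weights)
    votes = {}
    for url, w in weights:
        votes[url] = votes.get(url, 0.0) + page_pr * w / total
    return votes

# Example: a page with PR 1.0, two internal links, one external link,
# and a second (duplicate) link to the same external URL.
print(split_pagerank(1.0, [("/about", True), ("/contact", True),
                           ("http://example.com/", False),
                           ("http://example.com/", False)]))

Whether Google does anything like this is pure speculation - the point is just that unequal splitting is trivial to implement once the links are classified.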

 

Marcia




msg:3708267
 7:25 pm on Jul 26, 2008 (gmt 0)

One, two and three are relevant to internal linking and PR distribution, where I've seen some changes for a while now. But before looking forward at what's been changed, it's probably helpful to go backward first and take another look at the original PageRank paper.

This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them. We compare PageRank to an idealized random Web surfer. We show how to efficiently compute PageRank for large numbers of pages. And, we show how to apply PageRank to search and to user navigation.

The PageRank Citation Ranking: Bringing Order to the Web. [dbpubs.stanford.edu]
(Choice of .txt or .pdf versions)

There are some points that could easily relate to changes we're seeing, like the basics of how ranking is divided among the links on a page. And an interesting thought:

Is the surfer's walk really as random as they originally stated, in view of years of accumulated statistics, and looking at eyetracking studies and hotspots for relative value of the location of links on pages?
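
For reference, the published calculation is easy enough to reproduce. A minimal Python sketch of the textbook power iteration (equal split per outlink, 0.85 damping) - everything Google may have changed since would sit on top of this:

# Minimal PageRank power iteration, the normalized form of the paper's formula:
# PR(p) = (1-d)/N + d * sum( PR(q)/outdegree(q) ) over pages q linking to p.
# Nothing Google-specific here - just the published model.

def pagerank(links, d=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        new_pr = {p: (1 - d) / n for p in pages}
        for page, targets in links.items():
            if not targets:
                continue  # dangling pages handled crudely in this sketch
            share = d * pr[page] / len(targets)   # equal split per outlink
            for t in targets:
                new_pr[t] += share
        pr = new_pr
    return pr

# Tiny example graph: A links to B and C, B links to C, C links back to A.
print(pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]}))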

Added - this patent application (published February, 2008) is worth a good look:

Systems and methods for analyzing boilerplate are described. In one described system, an indexer identifies a common element in a plurality of related articles. The indexer then classifies the common element as boilerplate. For example, the indexer may identify a copyright notice appearing in a plurality of related articles. The copyright notice in these articles is considered boilerplate.

Systems and methods for analyzing boilerplate [appft1.uspto.gov]
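
The mechanics described in that abstract are simple to sketch. Assuming the indexer really does just look for elements shared across a set of related articles (threshold and sample data below are made up), a toy version would be:

# Toy version of the boilerplate idea from the patent application:
# any text block that appears on most pages of a site is treated as
# boilerplate (copyright notices, nav menus, footers) and set aside.

from collections import Counter

def find_boilerplate(pages, threshold=0.8):
    """pages: list of pages, each a list of text blocks (e.g. divs/paragraphs)."""
    counts = Counter(block for page in pages for block in set(page))
    cutoff = threshold * len(pages)
    return {block for block, c in counts.items() if c >= cutoff}

site = [
    ["(c) 2008 Example Corp", "Home About Contact", "Article one text..."],
    ["(c) 2008 Example Corp", "Home About Contact", "Article two text..."],
    ["(c) 2008 Example Corp", "Home About Contact", "Article three text..."],
]
print(find_boilerplate(site))
# -> {'(c) 2008 Example Corp', 'Home About Contact'}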

[edited by: Marcia at 8:09 pm (utc) on July 26, 2008]

steveb




msg:3708287
 8:23 pm on Jul 26, 2008 (gmt 0)

"Surely that doesn't deserve an equal slice of the PR Pie."

Of course they do. It would be totally illogical, and disastrous in terms of logical rankings, to not give the same piece of the PR pie to links to subdomains or internal pages.

Likewise position on a page is just too stupid for Google to seriously make an alteration based on. (A ranking value consideration, fine, a PR consideration, no.)

The most probable place for a change is age of links and age of domain. The next most probable is how PR is distributed when links are discarded (duplicate/triplicate links, nofollow links...). Previously, nofollowing links probably wasted PageRank. After Google (strangely) started suggesting using nofollow to manipulate your PageRank, nofollowed links were probably ignored in PR calculations.

Marcia




msg:3708417
 12:35 am on Jul 27, 2008 (gmt 0)

Likewise position on a page is just too stupid for Google to seriously make an alteration based on.

What's stupid about it?

Philosopher




msg:3708420
 12:51 am on Jul 27, 2008 (gmt 0)

Nothing stupid about it at all.

If PR is supposed to be the probability of a random surfer clicking a link, don't you think that WHERE on the page that link is found would have a LOT to do with whether or not it's clicked?

IMHO the stupid thing would be Google NOT looking at where on the page a link is.
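
As a thought experiment only - these region weights are invented, not from any eyetracking data - here's how a position-biased split would differ from an even one:

# Thought experiment only - the click-probability weights are made up.
# If the "random" surfer is really biased by where a link sits in the
# template, the PR a page passes per link changes accordingly.

REGION_WEIGHT = {          # hypothetical relative click likelihood
    "main_content": 1.0,
    "header": 0.5,
    "sidebar": 0.3,
    "footer": 0.1,
}

def weighted_votes(page_pr, links):
    """links: list of (url, region) tuples."""
    total = sum(REGION_WEIGHT[region] for _, region in links)
    return {url: page_pr * REGION_WEIGHT[region] / total
            for url, region in links}

print(weighted_votes(1.0, [("a.html", "main_content"),
                           ("b.html", "sidebar"),
                           ("c.html", "footer")]))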

OnlyToday




msg:3708423
 1:25 am on Jul 27, 2008 (gmt 0)

Is the surfer's walk really as random as they originally stated, in view of years of accumulated statistics, and looking at eyetracking studies and hotspots for relative value of the location of links on pages?

Don't you think that WHERE on the page that link is found would have a LOT to do with whether or not it's clicked?

It has been Google itself, with its AdSense optimizations, that has been preaching that very gospel. Google would be very stupid indeed if it did not use the refinements it discovers and develops on one side to tweak the other. The search and advertising teams are separate as business entities, but how could they not be using each other's research to improve their products?

whitenight




msg:3708434
 2:08 am on Jul 27, 2008 (gmt 0)

Before this thread gets too out of control with conjecture and anecdotal evidence...

IT'S ENTIRELY TESTABLE... so TEST it!

Again, who cares "what makes sense"?!
TEST IT! and find out what works...

or use Google and find the multitude of tests already done on this subject....

And if you believe the past tests done on link placement are unreliable with the new algo update, test it! and show what you've learned.

[edited by: whitenight at 2:09 am (utc) on July 27, 2008]

classifieds




msg:3708435
 2:09 am on Jul 27, 2008 (gmt 0)

Nothing stupid about it at all.

If PR is supposed to be the probability of a random surfer clicking a link, don't you think that WHERE on the page that link is found would have a LOT to do with whether or not it's clicked?

IMHO the stupid thing would be Google NOT looking at where on the page a link is.

Unless googlebot is reading style sheets, there's no way they can tell where a link is on the page.

tedster




msg:3708441
 2:24 am on Jul 27, 2008 (gmt 0)

Isn't it pretty clear what a div contains without even seeing the CSS - a list of links, a lot of text, a call to an external ad server, and so on?
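
For what it's worth, even a crude link-density check gives a decent guess at what a block is, no CSS required. A rough sketch - my own heuristic thresholds, not anything Google has published:

# Rough sketch: guess what a block of HTML is without ever reading the CSS,
# just by looking at the ratio of link text to total text.

import re

def classify_block(html):
    links = re.findall(r"<a\b[^>]*>(.*?)</a>", html, flags=re.I | re.S)
    text = re.sub(r"<[^>]+>", " ", html)
    link_chars = sum(len(t) for t in links)
    total_chars = max(len(text.strip()), 1)
    ratio = link_chars / total_chars
    if ratio > 0.5:
        return "link list / navigation"
    if total_chars < 40:
        return "stub / widget"
    return "main content"

print(classify_block('<ul><li><a href="/">Home</a></li><li><a href="/faq">FAQ</a></li></ul>'))
print(classify_block("<p>This is a long editorial paragraph with no links at all, "
                     "just body copy that goes on for a while.</p>"))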

----

With regard to testing, I haven't done fully controlled experiments, but I do have two anecdotes. The idea of PR vote values changing between related domains came from analyzing two different groups of domains, from two completely independent businesses that are in very different niches.

Both groups are composed of openly interlinked websites, centered around a principal domain. Both groups have done well for several years, but in both cases it's the central domain that has the bulk of truly independent backlinks.

The secondary domains in both of these groups saw some ranking losses in early February, and the only factor I could see was that these outlying domains were being sustained mostly by the interlinking. In both cases, we undertook a campaign to publicize the secondary websites and give them some new, strong content - hoping to attract a stronger backlink profile.

In both cases, rankings began to climb again, although they're still not back to where they used to be. The central or core domains were never affected, so it doesn't fit the footprint of a penalty or having an entire "network" wiped out.

[edited by: tedster at 3:03 am (utc) on July 27, 2008]

Marcia




msg:3708453
 2:58 am on Jul 27, 2008 (gmt 0)

Didn't the first "experiment" mentioned in the original paper use just PR and page titles as a metric? What's so hard about trimming out the fat - parsing out boilerplate content - to get to the meat?

Marcia




msg:3708458
 3:20 am on Jul 27, 2008 (gmt 0)

>>With regard to testing, I haven't done fully controlled experiments, but I do have two anecdotes.

Speaking of anecdotal evidence, how about when there's consistent evidence over a period of time that when related sites hosted on the same C-block are linked to, only a couple of them have PR transmitted to them, while the others show PR0?

Is Yahoo! the only engine that takes a hard look at nepotistic linking, or is it possible that Google does also?

Adversarial Information Retrieval on the Web [sigir.org] (PDF file)

How about nepotistic linking and cross-linking in boilerplate portions of web pages - like sidebars or footers? Is it worth taking a look at that?
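
The C-block part, at least, is trivial to check for. A sketch - the domains and IPs below are made up for illustration:

# Sketch of the classic "same C-block" nepotism check: two IPv4 addresses
# share a C-block if their first three octets match. Hosts and IPs are
# invented for the example.

def c_block(ip):
    return ".".join(ip.split(".")[:3])

def nepotistic_pairs(site_ips, link_pairs):
    """site_ips: {domain: ip}; link_pairs: [(linking_domain, linked_domain)]."""
    return [(a, b) for a, b in link_pairs
            if c_block(site_ips[a]) == c_block(site_ips[b])]

ips = {"widgets-a.example": "203.0.113.10",
       "widgets-b.example": "203.0.113.57",
       "independent.example": "198.51.100.4"}
links = [("widgets-a.example", "widgets-b.example"),
         ("independent.example", "widgets-a.example")]
print(nepotistic_pairs(ips, links))
# -> [('widgets-a.example', 'widgets-b.example')]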

tedster




msg:3708460
 3:26 am on Jul 27, 2008 (gmt 0)

In both cases I reported above, the cross-linking from the satellite domains is run-of-site and in the template. The core domain uses only links from the content area of relevant pages.

So even though the cross-links are in the main content area of the powerhouse core domain (some even on the Home Page) they don't seem to be having much effect any more, not the way they once did. My gut feeling is I could drop the links and see only minimal effect on the satellite domains' PR or rankings. However, the links are there for users, so I won't do that experiment.

---

It seems to me, theoretically, that changing PR calculation would be a good place to hit some factors at the "starting gate". That way they don't even need anything more built into the query-dependent part of the ranking/relevance algo.

So when it comes to controlled testing going forward, I'd like to zero in on exactly what factors to test for. The five I mentioned in the opening post are what comes to mind, but I'm still looking for more ideas. It also seems like a good time to test, since we have PR data as fresh as we'll ever get, and we have maybe three months before the next export of PR data to the toolbar.

minnapple




msg:3708465
 4:38 am on Jul 27, 2008 (gmt 0)

[1. Internal links and external links on the same page may not be splitting the PageRank vote equally.]

- They naturally have a different weight because of all the variables associated with them.

[2. PR may be weighted differently according to where links appear in the page template - menu, footer, header, main content.]

Think about a new link, embedded in an article.
It smells of something newsworthy.

An outgoing link within a sidebar, with a few dozen other existing links, is a general endorsement of the site.

A sitewide link to an internal page is stating - "spider this page often and give it more weight" - because it is important.

[3. Multiple links to the same url from the same page may not each get the same piece of the PR vote. Does only one "count"? Or maybe the extra links are just devalued a bit?]

You really should think about how many sites are linking to the outbound linking site.
If there are numerous sites linking to it, a few links to the same site are natural, no matter what page it comes from.

[ 4.Run-of-site external links may have their PR vote damped down. I'm thinking particularly about blogrolls here, but other situations might be similar, don't you think? ]

Yes, if you link to everyone, you are just cheap and "easy" ;).

[ 5. Links between domains that Google sees as "related" may have their PR significantly damped down. And what about interlinking between a domain and its subdomains? Surely that doesn't deserve an equal slice of the PR Pie. ]

Related domains [ owned by the same owner ] have been recognized and adjusted for some time.
Subdomains seem to be handled well within logical reason.

steveb




msg:3708468
 5:00 am on Jul 27, 2008 (gmt 0)

"Nothing stupid about it at all."

See above. It's bizarre to first want to base PR on whether a link is two inches from the top of a page or four, or whether navigation is on the right or left of a page. And second, where something is rendered on a page can be unrelated to where it is in the code.

Whitey




msg:3708494
 6:49 am on Jul 27, 2008 (gmt 0)

Alika #:3708319 - Nine year old site went down from 6 to 5, and all pages across the board getting -1.

Google Webmaster Tools showed a significant jump in duplicate titles and in duplicate and short meta descriptions that we are still working to rectify. [webmasterworld.com...]

Maybe greater sensitivity with regard to duplicated pages is playing a part, along with the attempt to increase reporting levels in WMT [ which appears to have a bug ]. Interesting to note that this site is quite old.

tedster




msg:3708593
 1:29 pm on Jul 27, 2008 (gmt 0)

Interesting idea, Whitey. Identify two urls as being duplicate content and throw out the PR transferred by the links on one of them. That sounds like the right idea to me.
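
As a back-of-the-envelope sketch of that idea - a naive exact-match duplicate check, nothing like whatever Google actually uses - the mechanics would be simple:

# Naive sketch: if two URLs carry identical content, keep the link-graph
# edges from only one of them and drop the rest. Real duplicate detection
# would use shingling/fingerprints rather than exact hashes.

import hashlib

def dedupe_link_graph(pages):
    """pages: {url: (page_text, [outlinked_urls])} -> pruned {url: [outlinks]}."""
    seen_hashes = set()
    graph = {}
    for url, (text, outlinks) in pages.items():
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            graph[url] = []          # duplicate: its links pass no PR
        else:
            seen_hashes.add(digest)
            graph[url] = outlinks    # first copy keeps its votes
    return graph

pages = {
    "site-a/page": ("same article text", ["x.html", "y.html"]),
    "site-b/page": ("same article text", ["x.html", "z.html"]),  # duplicate
    "site-c/page": ("unique article text", ["y.html"]),
}
print(dedupe_link_graph(pages))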

Whitey




msg:3708875
 12:42 am on Jul 28, 2008 (gmt 0)

JS_Harris #:3708244 [webmasterworld.com...] A page of mine that had PR4 and drove solid traffic from search stopped sending any traffic two months ago. PR on the page is now grayed out. There is one affiliate link on the page, always has been. The page was manually penalized for its main term apparently.

I'd like to hear more from affiliate marketers on how their visible pagerank changed

Not sure if the TBPR update is sufficiently finished yet and enough reports are in, but it could be that content-related scoring on pages has increased in its sensitivity as well, irrespective of the links into the page.

If the body content is duplicated [ as in affiliate / syndicated content marketers ], those results may have further diminished in PR importance. My hunch is that if enough of those pages fall into that category, they may contribute to an overall site decline.

Externally generated duplicated content [ as in affiliate / syndicated content ] may not be the only consideration, as I'm seeing internal scripted pages' TBPR greyed out. This indicates to me that internal duplicate content considerations may also have been cranked up a notch.

I think these and other changes were likely in play over the last few months, as the effects seem to have come in and been reported in different ways.

tedster




msg:3710062
 8:38 am on Jul 29, 2008 (gmt 0)

In another thread Google now at a Trillion URLs [webmasterworld.com], Marcia zeroed in on one sentence in the Google blog post:

Google re-process[es] the entire web-link graph several times per day. This graph of one trillion URLs is similar to a map made up of one trillion intersections

Not all of the 1 trillion urls that Google has discovered are in the visible index - not by far. So does this sentence mean that links on those discovered-but-not-indexed urls are still part of the web-link graph, and that they are capable of influencing PageRank?

Right now I'd lean towards saying "no" - the article just misspoke in an attempt at shock and awe. But it's something worth considering. I'll be watching for clues whenever a site has conditions that make testing for this relatively easy.

Marcia




msg:3710083
 9:15 am on Jul 29, 2008 (gmt 0)

Not all of the 1 trillion urls that Google has discovered are in the visible index - not by far. So does this sentence mean that links on those discovered-but-not-indexed urls are still part of the web-link graph, and that they are capable of influencing PageRank?

Why wouldn't they be part of it? Assuming a random walk of the web as the fundamental basis of PR accretion and distribution, does it matter to the random surfer whether or not a page/URL is indexed by Google, if they're just somehow "randomly" coming across a page/URL?

"Today, Google downloads the web continuously, collecting updated page information and re-processing the entire web-link graph several times per day."

Not "the part of the web graph that Google has included in their index" but "the entire web graph" is what it says. So what are we to think or conclude, when we try to read between the lines? Either they're using the entire web graph to recalculate PR daily, or they aren't. It's probably just as simple as that.

[edited by: Marcia at 10:01 am (utc) on July 29, 2008]

Whitey




msg:3710190
 1:03 pm on Jul 29, 2008 (gmt 0)

Not all of the 1 trillion urls that Google has discovered are in the visible index - not by far. So does this sentence mean that links on those discovered-but-not-indexed urls are still part of the web-link graph, and that they are capable of influencing PageRank?

Sure, they see the "trillion". But they'd be continually motivated not to include the trashy, duplicate URLs in their published index. And that's where the PageRank calculations would be apportioned - wouldn't you think?

potentialgeek




msg:3710368
 4:10 pm on Jul 29, 2008 (gmt 0)

Tedster,

Assuming you're not a PR link seller--:-)--my question is: what SERP algo changes is Google integrating into its PR algo? In the past, have you seen any similarities around the same time each algo changed?

p/g

julinho




msg:3710667
 9:40 pm on Jul 29, 2008 (gmt 0)

This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them.

Maybe Google has devised better ways to measure human interest.

For example, if Google finds out that link A draws ten times more clicks than link B (regardless of where in the page they are located), maybe link A should transfer more PR than link B (because humans showed that that link is more useful/interesting to their needs).

What about the damping factor? In the original paper, the random-jump share it governs was distributed equally amongst all pages in the index.
But if Google knows that certain pages attract users starting new surfing paths (e.g., big portals or heavily bookmarked pages), maybe that share should also be non-equally distributed.
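
That second idea is essentially the "personalized" PageRank variant the original paper already mentions: the random jump doesn't have to land on every page with equal probability. A small sketch - the jump weights here are made up:

# Sketch of the second idea: instead of the (1-d)/N random-jump share being
# spread equally, bias it toward pages where surfers actually start new
# sessions (portals, heavily bookmarked pages). Jump weights are invented.

def pagerank_biased(links, jump_weights, d=0.85, iterations=50):
    pages = set(links) | {p for ts in links.values() for p in ts} | set(jump_weights)
    total_jump = sum(jump_weights.get(p, 0.0) for p in pages) or 1.0
    pr = {p: 1.0 / len(pages) for p in pages}

    for _ in range(iterations):
        new_pr = {p: (1 - d) * jump_weights.get(p, 0.0) / total_jump for p in pages}
        for page, targets in links.items():
            if targets:
                share = d * pr[page] / len(targets)
                for t in targets:
                    new_pr[t] += share
        pr = new_pr
    return pr

graph = {"portal": ["a", "b"], "a": ["b"], "b": ["portal"]}
# Assume surfers start at the portal far more often than elsewhere.
print(pagerank_biased(graph, {"portal": 10.0, "a": 1.0, "b": 1.0}))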

Marcia




msg:3710718
 11:31 pm on Jul 29, 2008 (gmt 0)

See above. It's bizarre to first want to base PR on whether a link is two inches from the top of a page or four, or whether navigation is on the right or left of a page. And second, where something is rendered on a page can be unrelated to where it is in the code.

It's bizarre to think that they would care whether a boilerplate section on a page has yellow text on a purple background or is 9 screen-scrolls down from the top.

Boilerplate is still boilerplate, and they do look at it for a few different reasons. It's bizarre to think that they can't possibly weight it differently or that they can't possibly apply a different damping factor to elements that are duplicated sitewide - including navigation links.

tedster, in the first post:
2. PR may be weighted differently according to where links appear in the page template - menu, footer, header, main content.

I'll buy that as a possibility, and I'd qualify part of the definition by whether or not it's boilerplate content - an element that's effectively a form of duplicate content when it's used sitewide across all pages - as opposed to section-specific second-level navigation rather than a fully meshed navigation structure.

BTW, here's another point to ponder about internal PR distribution:

When there are 5 or so *main* /subdirectory/ sections of a site, with links to the index pages of those subdirectories running sitewide, like in top or bottom or side boilerplate navigation, how come some of those subdirectories will have decent PR while others are greyed out on the toolbar, even though pages inside the greyed out /subdirectory/ do have PR showing?

[edited by: Marcia at 11:49 pm (utc) on July 29, 2008]

Marcia




msg:3710729
 11:54 pm on Jul 29, 2008 (gmt 0)

For example, if Google finds out that link A draws ten times more clicks than link B (regardless of where in the page they are located), maybe link A should transfer more PR than link B (because humans showed that that link is more useful/intesting to their needs).

Interesting thought, because someone mentioned that very factor to me about 6 years ago, having seen what looked like exactly that with certain prominent, much-clicked-on links of theirs. They didn't say it publicly, but if they had, then considering the source, no one would have argued the point - it would have been accepted as a real possibility.

tedster




msg:3710745
 12:17 am on Jul 30, 2008 (gmt 0)

Assuming you're not a PR link seller--:-)--my question is what SERP algo changes is Google integrating into its PR algo? In the past have you seen any similarities at around the same time each algo changed?

I've seen nothing that suggests PageRank has become query dependent.

damping factor

I always thought that factor was there so that PR calculations would converge - without it they just fly off to infinity.
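
More precisely, the damping factor bounds how fast the power iteration converges - the closer d is to 1, the more iterations it takes to settle. A quick check with a toy graph (same kind of iteration sketched earlier in the thread):

# Quick check of the convergence point: the per-iteration change in PR
# shrinks faster when d is smaller, so low d converges in fewer passes.

def iterations_to_converge(links, d, tol=1e-8, max_iter=5000):
    pages = set(links) | {p for ts in links.values() for p in ts}
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}
    for i in range(1, max_iter + 1):
        new_pr = {p: (1 - d) / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = d * pr[page] / len(targets)
                for t in targets:
                    new_pr[t] += share
        delta = sum(abs(new_pr[p] - pr[p]) for p in pages)
        pr = new_pr
        if delta < tol:
            return i
    return max_iter

graph = {"A": ["B"], "B": ["C"], "C": ["A", "B"]}
for d in (0.5, 0.85, 0.99):
    print(d, iterations_to_converge(graph, d))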

steveb




msg:3710746
 12:24 am on Jul 30, 2008 (gmt 0)

"discovered-but-not-indexed urls"

Unfortunately that's not a distinct phrasing.

Many supplementals are known, and "indexed" in the sense of being cached, but do not appear in the search results in any way. Are such pages able to send link juice of any kind? I'd say probably. Other URLs that have been seen by the bot but not yet crawled probably also DEDUCT PR from the other URLs on the same page, even before they are indexed.

minnapple




msg:3716262
 3:25 am on Aug 6, 2008 (gmt 0)

Without going into detail, I will state that un-indexed pages [gray bar pages] are useless for adding any significant link pop.

Marcia




msg:3716319
 6:17 am on Aug 6, 2008 (gmt 0)

6. How about freshness (or staleness) of the linking page?

Robert Charlton




msg:3719443
 9:13 pm on Aug 9, 2008 (gmt 0)

6. How about freshness (or staleness) of the linking page?

Just to kick this question up.

I've had one observation I feel I could attribute to the staleness of linking pages... some months back I saw that a client was slipping, and noticed that too many of the inbound links to the home page were coming from old, stale pages, some of which were grey-barred.

Re freshness... I've observed that inbound links can take some months to show full effect. It's hard, though, to separate age factors from other upstream effects that are acting on linking pages.
