Flattening Effect of Page Rank Iterations - explains the "sandbox"? - Google Search and SEO forum at WebmasterWorld - WebmasterWorld

Forum Moderators: Robert Charlton & goodroi




grant

4:51 am on Apr 27, 2006 (gmt 0)

10+ Year Member

I have had my new sites rank well initially, then drop.

Here is what I think is happening, which is what I call the flattening effect of PageRank iterations.

Note the PageRank equation (sans filters) is:

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)) .

The first observation about this equation is that it is recursive: each page's PR depends on the PR of the pages linking to it, so the values only settle after a number of iterations.

If you analyze a site with 5 pages that all link to each other (the homepage having an initial PageRank of roughly 3.5), what you see in the first iteration of PageRank is that the homepage is PR 3.5, and all other pages are PR .365 – the largest PR gap that will ever exist through multiple iterations in this example.

This homepage PR represents a surge in PR because Google has not yet calculated PR distribution, therefore the homepage has an artificial and temporary inflation of PR (which explains the sudden and transient PR surge and hence SERPs).

In the second iteration, the homepage goes down to PR 1.4 (a drop of over 50%!), and the secondary pages get lifted to .9, explaining the disappearing effect of “new” sites. Dramatic fluctuations continue until about the 12th iteration when the homepage equilibrates at about a lowly 2.2, with other pages at about .7.
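To make the iteration above easier to follow, here is a minimal sketch in Python. It rests on assumptions that are not stated in the post: a hub-and-spoke reading of the 5-page site (the homepage links to four inner pages, and each inner page links only back to the homepage), every page starting at PR 1.0, d = 0.85, and all pages updated together from the previous iteration's values. The function name `pagerank_iterations` and the page names are made up for illustration. It applies only the unfiltered equation quoted above and says nothing about how Google actually schedules its calculations.

```python
def pagerank_iterations(links, d=0.85, iterations=12, start=1.0):
    """Apply PR(A) = (1-d) + d * sum(PR(T)/C(T)) repeatedly.

    links maps each page to the list of pages it links to.
    Returns a list with one {page: PR} dict per iteration.
    """
    pr = {page: start for page in links}
    history = []
    for _ in range(iterations):
        new_pr = {}
        for page in links:
            # Sum PR(T)/C(T) over every page T that links to this page,
            # using the values from the previous iteration.
            inbound = sum(pr[t] / len(links[t]) for t in links if page in links[t])
            new_pr[page] = (1 - d) + d * inbound
        pr = new_pr
        history.append(pr)
    return history


# Hypothetical 5-page site: assumed hub-and-spoke structure.
site = {
    "home": ["a", "b", "c", "d"],
    "a": ["home"],
    "b": ["home"],
    "c": ["home"],
    "d": ["home"],
}

history = pagerank_iterations(site)
for i in (1, 2, 12):
    step = history[i - 1]
    print(f"iteration {i}: home={step['home']:.2f} inner={step['a']:.2f}")
```

Under those assumptions the homepage comes out at roughly 3.55, then 1.38, then about 2.18 at the 12th iteration, with the inner pages at roughly 0.36, 0.90 and 0.70, which is close to the figures quoted above. The values are still oscillating slightly at that point; if left running they drift toward about 2.38 and 0.66.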

I believe that the duration of the “sandbox” is the same amount of time it takes Google to iterate through its PageRank calculations.

Therefore, I think that the “sandbox” is nothing other than the time it takes Google to iterate through the number of calculations uniquely needed to equilibrate the volume of links for a given site.

The SEO cynic will ask “but my site withstood the ‘sandbox’, so it can’t exist!”.

Revisiting the equation, sites CAN withstand the flattening effect of the PR iteration if they have optimized internal link structures (ones that don’t bleed PR but rather conserve it) OR an active inbound PR feed to central distributions of PR.

Marcia

6:33 am on May 8, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

<sidebar>

This thread was a short time before the change-over to the new system of rolling updates rather than monthly index updates with monthly PR recalcs and updates & TB updates. It was posted May of 2003.

I would hold on to the idea of an update that brings in more data for a little while longer. In time, I do think things will be more gradual. However, we're still in the transition period for this system, so I wouldn't be surprised to see a traditional update for a little while longer.

Is Freshbot now Deepbot? [webmasterworld.com]

</sidebar>

tedster

4:48 pm on May 8, 2006 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

I feel the need to make a comment about this thread and why it is a Home Page thread.

Notice that there is a question mark after the title. We thought that the discussion was worth having, especially because it has been a while since there was any technical conversation here about the Google algorithm. That is, we did not feature the thread on the Home Page to somehow say "this is true".

The counterpoints being discussed here are an important part of why this thread has value and interest. Because of grant's proposed theory, I for one have re-examined the sweep of Google's many changes over the past 2+ years and it has helped me bring the bigger picture more into focus. If I get something a bit more concrete out of my current ruminations, I'll be sure to post about it.

neuron

8:32 pm on May 8, 2006 (gmt 0)

10+ Year Member

PageRank is calculated for all pages indexed by Google. Not just one or two iterations, but all of them.

How many is all of them? 5 to 7. Google is interested in ranking pages, not in calculating the exact values of all pages. It has been shown that after 5 to 7 iterations the rankings of pages no longer interchange. That is, the calculated value of the #1 result will not drop below the calculated value of the #2 result. Sites do not rank as #1.2704, #2.4370; they rank as integer values, 1, 2, 3, 4 and so on. So, as long as the ranking order stays consistent from one iteration to the next, the calculation is complete.
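As a rough way to see what "the rankings no longer interchange" means, here is a small Python sketch that keeps iterating the unfiltered PageRank equation and records the last iteration at which the ordering of pages changed. The function names, the example graph, the starting value of 1.0 and d = 0.85 are all assumptions for illustration. It does not confirm the 5 to 7 figure for a real web-scale index; the answer depends entirely on the link graph you feed it.

```python
def pagerank_step(links, pr, d=0.85):
    """One pass of PR(A) = (1-d) + d * sum(PR(T)/C(T)) over all pages."""
    return {
        page: (1 - d) + d * sum(pr[t] / len(links[t]) for t in links if page in links[t])
        for page in links
    }


def last_order_change(links, d=0.85, max_iterations=50):
    """Return (iteration at which the ranking order last changed, final order).

    Pages are ordered by PR descending; ties are broken by page name so
    that equal values do not count as a change in order.
    """
    def ranking(pr):
        return sorted(links, key=lambda p: (-round(pr[p], 9), p))

    pr = {page: 1.0 for page in links}
    order = ranking(pr)
    last_change = 0
    for i in range(1, max_iterations + 1):
        pr = pagerank_step(links, pr, d)
        new_order = ranking(pr)
        if new_order != order:
            last_change = i
            order = new_order
    return last_change, order


# Hypothetical link graph; every page has at least one outbound link.
example = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "post"],
    "post": ["blog"],
}

print(last_order_change(example))
```

The point of the sketch is only that the order can stop changing well before the PR values themselves stop moving, so a calculation that cares about order alone could stop early.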

You can find the original explanation of Page Rank at The Anatomy of a Search Engine [www-db.stanford.edu]. See Section 2 for explanation of PR.

The fact that Google Toolbar PR is calculated only once every 3 months or so should not be confused with the internal rate of PR calculation, which is now virtually continuous. The sandbox is not associated with toolbar PR.

tedster, thank you so much for that brief, refreshing history of google's churn these past few years.

You left out a couple of things that should perhaps be considered. For instance, when the sandbox began in late February 2004, the number of pages indexed, as shown on the Google home page, froze at 2^32, or about 4.3 billion pages, for about 9 months, until sites began to be released from the sandbox en masse and the number of pages indexed doubled.

It was about the same time that the Supplemental Index came into being, something else that happened in the aftermath of Florida update in November 2003.

We had evidence two years ago that there were multiple indexes. I believe that Google ran into problems in the scale of their index, and had to shunt some pages (it actually worked by domains) to a 2nd level of indexes, and that the "sandbox" is an emergent property of Google's dealing with an index limitation problem by creating multiple indexes and the algorithms used to delegate pages (domains) to those new indexes.

About February to March of last year, 2005, I was expecting an announcement from Google that they had created and released a new index, and the end of the sandbox as all indexes were re-integrated to the master index.

Are there not new datacenters that are being called "Big Daddy"? What is that about?

I've noticed in the past couple of days that a LOT of sites are coming out of the sandbox, though I do not see a huge "I'm out of the sandbox" thread, I would not be surprised to see one.

IMHO google is right now in the process of rolling out a new index, Big Daddy, and that the multiple indexes that have been created in the past 2.25 years are all being re-integrated, and that as a result the sandbox phenom will disappear, since it was an emergent property of the creation of the algorithms that tiered sites among the various indexes.

There is a related thread that interested members might want to read: Major Change in Supplemental Result Handling [webmasterworld.com]

There's also a major commotion going on with Pages Dropping Out of Big Daddy Index [webmasterworld.com]

If Google was in the process of integrating these cross-calculated indexes into a single index, then I would expect they would show symptoms similar to what is going on now. Also, as a result of all this, I would predict the demise of the sandbox, and I am seeing a lot of sites come out of the sandbox.

This 63 message thread spans 3 pages.
