Flattening Effect of Page Rank Iterations - explains the "sandbox"?
grant




msg:730490
 4:51 am on Apr 27, 2006 (gmt 0)

I have had my new sites rank well initially, then drop.

Here is what I think is happening, which is what I call the flattening effect of PageRank iterations.

Note the PageRank equation (sans filters) is:

PR(A) = (1-d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

The first observation about this equation is that PageRank cannot be computed in a single pass; it has to be iterated, and the values only settle after a sufficient number of iterations.

If you analyze a site with 5 pages that all link to each other (the homepage having an initial PageRank of roughly 3.5), what you see in the first iteration of PageRank is that the homepage is PR 3.5, and all other pages are PR .365 – the largest PR gap that will ever exist through multiple iterations in this example.

This homepage PR represents a surge in PR because Google has not yet calculated the PR distribution; therefore the homepage has an artificial and temporary inflation of PR (which explains the sudden and transient PR surge, and hence the temporary surge in the SERPs).

In the second iteration, the homepage goes down to PR 1.4 (a drop of over 50%!), and the secondary pages get lifted to .9, explaining the disappearing effect of “new” sites. Dramatic fluctuations continue until about the 12th iteration when the homepage equilibrates at about a lowly 2.2, with other pages at about .7.

I believe that the duration of the “sandbox” is the same amount of time it takes Google to iterate through its PageRank calculations.

Therefore, I think that the “sandbox” is nothing other than the time it takes Google to iterate through the number of calculations uniquely needed to equilibrate the volume of links for a given site.

The SEO cynic will ask, "But my site withstood the 'sandbox', so it can't exist!"

Revisiting the equation, sites CAN withstand the flattening effect of the PR iterations with optimized internal link structures (that don't bleed PR but rather conserve it), OR with an active inbound feed of PR into the pages that distribute PR internally.
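For readers who want to see the arithmetic, here is a minimal sketch in Python of the kind of iteration grant describes. The assumptions are mine, not his: a damping factor d = 0.85, a five-page site in which every page links to every other page, and his stated starting values of 3.5 for the homepage and 0.365 for each inner page. The exact numbers will not match his spreadsheet, but the run shows the mechanic he is pointing at: the homepage's initially inflated PR gets flattened toward the equilibrium over successive iterations.

# Classic PageRank update: PR(A) = (1-d) + d * sum(PR(T)/C(T))
d = 0.85                                   # damping factor (assumed)
pages = ["home", "p1", "p2", "p3", "p4"]
out_links = {p: [q for q in pages if q != p] for p in pages}   # fully meshed site

# starting values taken from grant's post (homepage inflated, inner pages low)
pr = {"home": 3.5, "p1": 0.365, "p2": 0.365, "p3": 0.365, "p4": 0.365}

for i in range(1, 13):
    new_pr = {}
    for p in pages:
        inbound = sum(pr[q] / len(out_links[q])
                      for q in pages if p in out_links[q])
        new_pr[p] = (1 - d) + d * inbound
    pr = new_pr
    print(f"iteration {i:2d}: home = {pr['home']:.3f}   inner = {pr['p1']:.3f}")

After a dozen iterations both values sit within a few thousandths of 1.0 for this symmetric structure; the point is simply that whatever head start the homepage begins with is not what it ends with.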

 

optimist




msg:730520
 2:43 am on Apr 29, 2006 (gmt 0)

Nice theory that could certainly be one possibility, BUT it misses two aspects of the G algorithm defined in Google's patent application:

"linkage" Time factors are placed on the age of the link, when the link was first placed and factors that add into how often the site linking to the target site is updated, if the site is updated often - bonus points, if the site is never updated, less PR.

"siteage" older sites do get more priority as they are less likely to be trying to artificially inflate PR. G loves age and all kinds of age factors on linking are used in the algorithm.

How do you mathematically factor in time, when time is not based solely on PR but on other factors such as whois data and the links pointing to the site?

What if it's simply a matter of a cycle: the same cycle it takes for life to be created is being emulated by the cycle a site has to go through to get out of the sandbox, which seems to be based on the womb. Maybe National Geographic can tell us how to make this premature.
:)

tedster




msg:730521
 4:22 am on Apr 29, 2006 (gmt 0)

To better inform this thread, I'm trying to piece together a history of major algorithm changes at Google -- please correct me if I have some of the timing or details wrong here.

Everflux showed up before June 2002 as Google had already begun integrating the results of "fresh crawl" into their index on a daily basis. Seems to me that some kind of PR-on-the-fly calculation must have been in place for this advance to be possible.

ref: Google Update FAQ [webmasterworld.com] June 2002
ref: GoogleGuy on Everflux [webmasterworld.com] Msg #5, Oct 2002

Still, up until that bombshell of an update called Florida in Nov 2003, the Google SERPs were relatively static for about a month at a time. Those roughly monthly updates were highly anticipated by webmasters, and they showed the results of whatever Google had eaten in the past few weeks, including I believe a major PR recalculation that replaced the everflux "estimated" PR.

ref: Google Update History [webmasterworld.com] through Florida, Nov 2003

So in those nearly pre-historic days, it seems to me that PR was being precisely crunched about once a month, with some form of estimated PR (quick PR) being folded in for fresh crawl pages.

But with the Florida update (which came after 3-4 months of preparation behind the scenes) Google started a very different method of ranking calculation. Then we no longer had the fun of monthly updates, but still we had everflux, plus lower level algorithm shifts created by "dial turning" of the existing algo. Major updates began to happen only when entirely new algorithm factors were installed, each new factor complete with its own set of "dials".

How long after Florida did we start noticing the sandbox effect - about 6 months, less? PR was already being re-calculated almost continually for over a year at that point, and during that period we did not see the sandbox effect. We did not see it immediately following Florida, either...or did we?

ref: Timeline for Google indexing a 'new' site over the last two months [webmasterworld.com] Jan 2004

At any rate, somewhere early in 2004, we were talking about it. A new factor or set of factors was folded into the ranking algo at that point, and new sites were really feeling the bite. This new factor combined its effect with the ongoing PR re-calculation and somehow the combination created the newly observed "sandbox effect".

At that point, particularly for more competitive searches, a new domain seemed squelched for an indefinite period of time. Even though it might be getting some traffic on long tail stuff, and it was getting spidered regularly, and URLs were showing up in the site: operator query -- still, no SERP placement on major keywords.

It seems clear that it's not "just" PR that's involved. The sandbox effect also includes whatever new factors were installed into the algo right before spring 2004, or more likely with the Florida update -- and those factors directly impacted high frequency search terms, or possibly just the major money terms.

PR calculation PLUS "something about Florida" seems to be the key to the sandbox effect, IMO. What part of Florida is it?

Remember how, at that time for a search on kw1 kw2 a page was not ranking, but for kw1 kw2 -asdf it was? Remember how even some sites with long standing #1 positions on single keywords took a dive? Especially sites with a keyword in the domain name? Exactly what kind of new filter or factor was that? Doesn't it sound a lot like the sandbox effect?

ref: Update Florida [webmasterworld.com] - warning! 3000 posts in one week

I'm off to find some old data!

Marcia




msg:730522
 4:59 am on Apr 29, 2006 (gmt 0)

Ted, there's a wrap-up with some references here dating to Fall 2002, including Brett's explanation:

Everflux and Google Updates [webmasterworld.com]

stevexyz




msg:730523
 5:38 am on Apr 29, 2006 (gmt 0)

Good post.

I managed for a very long period to avoid the sandbox on the domains we purchased. These domains were in actual fact sub-domains sold by a UK registrar. For example, they had ownership of the domain xx.com and were selling domain names like whatever.xx.com and yoursite.xx.com.

Within a week of buying such a domain (sub-domain) we would rank and be completely out of the sandbox for whatever site we produced. This continued for over 12 months. Now, this registrar had a number of 2-digit domains, e.g. xx.com, yy.com, zz.com, etc. All of a sudden ALL of our sites that were on xx.com disappeared - BANG! On investigation we discovered that a poker site had purchased xx.com sub-domains and had spammed and spammed from them - the impact hit ALL of the sub-domains on xx.com.

We moved our xx.com sites to yy.com sub-domains - they came right back within a week to where they were previously. The poker site also realised they could rank and SPAM for a good 3 weeks by doing the same thing - BANG, yy.com disappeared.

I bring this up because today the sandboxing appears to have moved from first-level URLs to pages?

Hope this info helps those of you trying to figure this lot out. I have tried DNS, age, nameservers, cache, etc.

tedster




msg:730524
 5:50 am on Apr 29, 2006 (gmt 0)

Thanks, Marcia. I was also remembering a post from GoogleGuy that explicitly mentioned the ability to calculate and update PR almost continually. I found one mention here:

[webmasterworld.com...] Msg #5

...but that's much too recent (June 2005) to confirm when this ability to do rolling PR calculation began. I'm sure there was something a lot earlier. At least this post confirms that the PR calculation is quite intensive.

We have a bank of machines that computes PageRank continuously
(and continually, I suppose; I wasn't an English major), but we only export
new visible PageRanks every 3-4 months or so.

Pico_Train




msg:730525
 6:16 am on Apr 29, 2006 (gmt 0)

Tedster,

I'm quite sure something major happened in late April / early May 2005 with regard to the algo and SERPs. I'm not sure what it was, or if anybody else can back me up on this one, but there was a shift in how things worked, at least for me. I remember quite a few people feeling the same way back then, so maybe they will come out of the woodwork now...

McMohan




msg:730526
 6:49 am on Apr 29, 2006 (gmt 0)

grant, thought provoking post.

However, apart from the doubts other members posted here, I would like to add one more point.

You took the example of a site homepage starting at PR 3.5. But Google will start with a guess figure (it could be zero or 10, no one knows) and iterate the PR calculation until the whole system comes to equilibrium, with the average PR of all pages equal to 1. The way the PR formula is designed, you can start with any number and that equilibrium is reached towards the end of the iterations. So whichever guess figure Google starts with, it isn't the actual PR and it will have no strength in ranking a site.

Assume the initial guess figure is zero. With each iteration the homepage PR increases until the average PR of the whole system is 1. If the iteration process were behind the sandbox, then after each iteration the site's rank should improve. But in actual practice we see the site lurk somewhere deep down for a long time and then, in one fell swoop, rank at the top.

Marcia
Bonus question: How many iterations (and how much time) does it take for Pagerank to converge?

I guess it depends on the complexity of a site's link network, both its own pages and the external links going out and coming in. A single-page site linking to another single-page site, and vice versa, is the simplest example and takes fewer iterations than a site with hundreds of pages linking to hundreds of pages and linked back from hundreds of pages.
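McMohan's point about the starting guess can be checked with a few lines of Python. This is a toy sketch under assumed conditions (the same five fully interlinked pages as in the earlier sketch, d = 0.85): whatever value the iteration starts from, it settles on the same PageRank, so the initial guess itself carries no ranking information.

d, n_pages = 0.85, 5          # damping factor and page count (assumed)

def converged_pr(start, rounds=100):
    # every page starts at `start`; the site is fully meshed, so each page
    # receives (sum of the others' PR) / (n_pages - 1) on each iteration
    pr = [start] * n_pages
    for _ in range(rounds):
        pr = [(1 - d) + d * (sum(pr) - v) / (n_pages - 1) for v in pr]
    return pr[0]

for guess in (0.0, 1.0, 10.0):
    print(f"initial guess {guess:4.1f} -> converged PR {converged_pr(guess):.4f}")

# All three runs end at the same value (1.0 here), and the average PR of the
# system is 1, which is the equilibrium McMohan describes.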

stevexyz




msg:730527
 8:09 am on Apr 29, 2006 (gmt 0)

Pico_Train

Yes, exactly - the sandboxing changed from the age of the entire URL (and other factors) to something different in May/June 2005. It actually started in January and progressed until, by June, it was fully implemented. It was at this point that the age factor may have changed - maybe it still exists, but with another filter applied on top of it. I think it was the removal of all keyword URLs from the SERPs on the top "money" keyword phrases. E.g., if your site was cheap-widgets.com and the keywords "cheap widgets" matched up with G's "money keywords", it was sandboxed regardless of whether or not the site was SPAM - this is still relevant today. I.e., try to find a site ranking on a "money" keyword that has a keyword-keyword.com URL - they don't exist in the SERPs, and if you do find one it's NOT a major "money" keyword according to Google.

stevexyz




msg:730528
 8:15 am on Apr 29, 2006 (gmt 0)

One note before this theory is blown away - this does NOT apply to single keywords, e.g. keyword.com; it applies only to keyword-keyword.com.

Strangely, keywordkeyword.com is still OK - i.e. not separated - but I just think it's a matter of time before a filter gets added to that also.

Marcia




msg:730529
 8:39 am on Apr 29, 2006 (gmt 0)

>>>if your site was cheap-widgets.com and the keywords "cheap widgets" matched up with G's "money keywords", it was sandboxed regardless of whether or not the site was SPAM - this is still relevant today. I.e., try to find a site ranking on a "money" keyword that has a keyword-keyword.com URL - they don't exist in the SERPs, and if you do find one it's NOT a major "money" keyword according to Google.<<<

Are you talking about sites that fall within what's considered "sandboxed" because of being new? Isn't the understanding that what the sandbox does is keep most NEW sites from ranking for around a year, give or take a few months?

Are you saying that keyword domains for money words that are around a year old (or less), and would be considered new sites, are being handled differently because of the money keywords in the domain, or is it older sites that have been around for a while being affected?

That's two different things.

stevexyz




msg:730530
 12:05 pm on Apr 29, 2006 (gmt 0)

Marcia - in the beginning it was possibly only one filter that caused the sandbox effect. I can reliably confirm it was based on factors connected to the URL: age, nameservers, DNS setup and maybe more - I could never completely figure out exactly what it was. At the time (2004) I suggested this on this forum but was blown away, as I was not prepared to explain how I was able to prove it - simply because the answer to avoiding the sandbox was to set the site up on a sub-domain. Provided the sub-domain's first-level URL xx.com met the criteria for NOT being sandboxed, then (pre January 2005) a site at keyword-keyword.xx.com was never sandboxed and was in the SERPs within a week. Had I explained this on the forum, G's attempts at reducing spam would have been in vain. Unfortunately, in about October 2005 a number of poker and other very obvious spam webmasters also figured this out. G obviously picked this up and, I think, added at least another two filters. I think I know of one, which is the keyword-keyword.com filter; there is another, however, because using sub-domains on unsandboxed URLs no longer works - that one filters by new pages and no longer by complete URLs.

The "quick fix" (for Google) at that point in time was to add the keyword-keyword.com filter - in a roundabout way this was confirmed to me at the Vegas pubcon by a comment which went like this: "by applying this filter we discovered that BANG, 70% of the SPAM disappeared - first suggested at the London pubcon".

What I am trying to say is that the sandbox effect, which was initially the result of only one filter, has today been expanded into a number of filters that do the same thing - one of them being the keyword-keyword.com filter.

I must say I dislike posting too much info on this forum as I know that the webmasters that spam are possibly the first ones to read these forums.

sugarrae




msg:730531
 2:01 pm on Apr 29, 2006 (gmt 0)

Ok, I'm not super technical, but I'll plug in some of my likely to be useless observations anyway...

>>>I have no clue but I had this fanciful thought that maybe the iteration gears are halted while some trustrank quotient is determined and then injected into the process.

Ditto.

>>>BUT this missed two aspects of the G algorithm defined in the G application for patent

Not at all. Age of domain and age of links = how long have I known you and how many of my friends and acquaintances say you're a decent person = TrustRank.

To a simple non-engineer mind like mine, it seems like G is a reporter checking its facts. It has all this information on a site, but needs time to verify, as best it can, whether or not the information is trustworthy. Getting the information is easy - verifying it is time consuming, and you need some time for the verification to hold any weight.

I guess because I'm not technical, I tend to need to equate it to something. Take posters at webmasterworld. When a new poster shows up and starts doling out advice, people don't believe them right away - they want to watch their posts and decide over time if the person is trustworthy and worth listening to. Or you may ask some other highly trusted friends or other trusted sources (who have built up trust with you over time) in your circle if the person is worth listening to and get a lot of "yes" replies from people that "vouch" for them to put any stock in the person.

Of course, I'm not super technical. But it has been my simplistic take thus far. But, it's also ten am and I've had five hours sleep. Pay any attention after that disclaimer. ;-)

doc_z




msg:730532
 2:45 pm on Apr 29, 2006 (gmt 0)

Nobody would develop an algorithm where the result depends on the iteration scheme. However, that would be the case if Grant's theory were true.

If there is a relationship between the sandbox effect and the PR calculation, this might be due to a different way of taking links into account (for the PageRank calculation). For example, if Google only counted external links that are older than one year, this would lead to dramatic effects, especially for new domains (but the results still wouldn't depend on the iteration scheme).

Bonus question: How many iterations (and how much time) does it take for Pagerank to converge?

This depends on the iteration scheme, the initial guess, the changes in the linking structure and the definition of 'stable' (the stopping criterion). Normally, one would take the results from the last iteration as the starting point. With the original iteration scheme one should get almost correct data after 10 iterations; after about 30 iterations the results should be stable. If the old data are not taken into account it might take 50-100 iterations. (How long does one iteration of PR calculation take? [webmasterworld.com])
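To illustrate the stopping-criterion point, here is a toy sketch in Python (assumptions: d = 0.85, a small random link graph of 200 pages, convergence declared when no page's PR changes by more than a chosen tolerance). On the real web graph the count depends on size, structure and the initial guess, exactly as doc_z says; this only shows how such a stopping rule behaves.

import random

random.seed(42)
d, n = 0.85, 200
# each page links to five random other pages (toy graph, not the real web)
out = {i: random.sample([j for j in range(n) if j != i], 5) for i in range(n)}

def iterations_to_converge(start, tol=1e-6):
    pr = [start] * n
    for it in range(1, 1000):
        new = [1 - d] * n
        for p, targets in out.items():
            share = d * pr[p] / len(targets)   # each target gets PR(p)/C(p), damped
            for t in targets:
                new[t] += share
        delta = max(abs(a - b) for a, b in zip(new, pr))
        pr = new
        if delta < tol:
            return it
    return None

for start in (0.0, 1.0, 10.0):
    print(f"initial guess {start:4.1f}: converged after "
          f"{iterations_to_converge(start)} iterations")

With a looser tolerance the loop stops much earlier, which is the sense in which "almost correct after 10, stable after 30" depends on the stopping criterion.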

doc_z




msg:730533
 3:52 pm on Apr 29, 2006 (gmt 0)

if Google only counted external links that are older than one year

Just to clarify (the example given above): internal links are counted immediately (as usual).

egurr




msg:730534
 6:54 pm on Apr 29, 2006 (gmt 0)

My math ain't what it used to be, but...
It's clear that the equation posted is linear in nature. When you research which sites show up for specific high-frequency keywords, it's clear there is a log function in there somewhere.
A little off topic, I guess, but in a way it does buttress the theory of PageRank explaining the sandbox.
What I've never understood is why the PR equation falls apart at PR7. Look at this site: clearly, in terms of importance to webmasters, it should be a PR8 or even a PR9. Tons of relevant inbound links, always updated, several years old and a clean site. Why only PR7? The PR calculation would clearly indicate this site should be higher.

Hanu




msg:730535
 1:12 am on Apr 30, 2006 (gmt 0)

I was never a math genius, but I just can't reproduce grant's numbers. Grant didn't tell us which value he/she was using for the damping factor d. But worse, whatever value I use for d, I can't reproduce the decrease in homepage PR that Grant describes. In my calculations the PR is always monotonically increasing and asymptotic, i.e. it grows from one iteration to the next until it stabilizes at a certain value. And this applies to the PR of all pages in the system. With higher values of d it takes more iterations before the value stabilizes.

Besides the fact that I can't confirm his numbers, Grant also refers to site structures that "bleed PR". PR bleeding or leakage is something completely different and has nothing to do with the (albeit unreproducible) phenomenon Grant is describing.

Last but not least, Grant's hypothesis is incomplete. It only tries to explain the initial disappearance of a page from the SERPs after it has ranked well for several days or weeks. However, it does not explain the resurrection of the page after months or years.

shri




msg:730536
 2:43 am on Apr 30, 2006 (gmt 0)

A couple of questions: this is the original PageRank calculation, right? Then how come the sandbox effect is new? The old freshbot effect can sort of be explained by this, but not the sandbox.

However, I'm afraid I've missed something elementary in this discussion, as no one else seems to be asking why the behaviour of an age-old algorithm would change without modification of the factors taken into account.

CainIV




msg:730537
 3:10 am on Apr 30, 2006 (gmt 0)

Actually, Grant's numbers and rationale in terms of iterations and changing PR values are correct. The inner pages do receive more PR initially, after one iteration of the algorithm, because the full extent of passing PageRank back to the homepage has not happened yet.

However, this does not explain the following observations in regard to PageRank and the infamous sandbox:

How does such a small difference in PageRank values between inner pages and the home page, after a short iteration, cause such a drastic result in the SERPs for Google?

Why do / can sites enter a filtered or sandboxed state after the iterations have been completed?

I personally believe Google filters sites based on more than one factor from the beginning. The two biggest factors IMHO initially are site/domain/page age and inbound links. If I were to evaluate based on trust, I would base it first on the age of the trust (as one person put it - how long one has known you to be reliable, etc.). The second criterion would be who it is that says you are trustworthy.

It all points to TrustRank-ish stuff to me.

JudgeJeffries




msg:730538
 10:27 am on Apr 30, 2006 (gmt 0)

Can't recall where - probably Matt Cutts' blog - but didn't he imply that the sandbox was the unexpected effect of a combination of two other planned changes to the algo, which they quite liked and so kept?

doc_z




msg:730539
 11:06 am on Apr 30, 2006 (gmt 0)

I guess you're referring to this thread: [webmasterworld.com...]

dataguy




msg:730540
 3:18 pm on Apr 30, 2006 (gmt 0)

Can't recall where - probably Matt Cutts' blog - but didn't he imply that the sandbox was the unexpected effect of a combination of two other planned changes to the algo, which they quite liked and so kept?

My personal experience leads me to believe that the two changes to the algo which lead to the sandbox effect are Linkage Rate and Traffic Patterns. Both of these factors have been discussed numerous times here, but I think the combination of the two can easily explain the sandbox effect.

My reasoning is this: Over the past year I've registered a few dozen domain names and put up temporary web sites with them, so that they could "ripen" while the real designs could be developed. 6 of those domains immediately started to receive type-in traffic, and all 6 of them appeared in Google after a few days to a few weeks. All the other domains either took a few months to appear or they are still sandboxed.

It seems to me that the fact that these domains were receiving traffic contributed to their indexing. If Google can determine traffic patterns to a particular web site, it seems obvious that with a web site which receives no traffic but has a rapid increase in backlinks, the linking would be recognized as unnatural linking, and the links would be ignored.

TypicalSurfer




msg:730541
 4:12 pm on Apr 30, 2006 (gmt 0)

These theories would make sense IF google were still an academic exercise.

Try this:

Find a KW combo that has no/little ad inventory, create a page for that KW, point a link. Presto! No sandbox.

From there, do the math.

activeco




msg:730542
 8:53 am on May 1, 2006 (gmt 0)

My personal experience leads me to believe that the two changes to the algo which lead to the sandbox effect are Linkage Rate and Traffic Patterns.

Totally agree, Dataguy.
The continuous iterations could nicely catch the natural link growth/decrease and compare the curve with a standard one.

julinho




msg:730543
 1:48 pm on May 1, 2006 (gmt 0)

If the OP's theory is correct, wouldn't that mean that Google is going backwards?
Google and everyone else know that, even if PR is fully calculated (in accordance with the original papers), that doesn't lead to perfect SERPs (it never did, even before webmasters manipulated links and PR).
So, what do they do? "Let's cut down on the number of iterations; that will lead to wilder, unpredictable SERPs, but let's hope that Y and msn don't do any better"?
I don't think so.

I side with those who defend that "sandbox" is a combination of effects.

Combining the recent works which have been made public, one may speculate that the score of a page today is something like:
SC = OPF x PR x LR x TR x SR
where
SC = total score, defines SERPs for a given keyword
OPF = on page factors (title, keyword density, proximity, anchor texts, etc)
PR = Page Rank
LR = Local Rank (links from the "right" neighbourhoods; I guess that Hilltop and LSI have a major role here)
TR = Trust Rank (links from sites which a human being checked out, or links which have been around longer, or otherwise gained trust)
SR = Site Rank (a score assigned to the site, instead of the pages; I guess it's something related to user engagement with the site, accumulated over time)

There are certainly other factors which were never heard of outside the Plex.

The factors are all multiplied. If your page scores badly on one single factor, the final score is lowered; e.g., if you pay thousands of $$$ for PageRank but don't get any TrustRank, your final score is low.

Each factor has a weight, which Google can adjust at will. Also, each factor has a multitude of variables in its composition, which also only Google knows.

That means that there are a multitude of variables which may cause one factor, and hence the total score of a page, to get lower (even if all the other factors indicate that it should be higher). Just do one thing Google doesn't like, and that factor may go near zero, and hence your score.

When we don't know which factor caused the low score, we can just call it "sandbox".
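A trivially small calculation shows the property julinho is pointing at with his speculative SC = OPF x PR x LR x TR x SR model. Every factor name and number below is hypothetical; the only point is that one near-zero factor collapses the whole product, however strong the rest are.

def score(opf, pr, lr, tr, sr):
    # julinho's speculative multiplicative score; all inputs are illustrative
    return opf * pr * lr * tr * sr

well_rounded = score(opf=0.8, pr=0.9, lr=0.7, tr=0.8, sr=0.6)    # ~0.242
no_trust     = score(opf=0.8, pr=0.9, lr=0.7, tr=0.05, sr=0.6)   # ~0.015

print(f"strong on every factor:      {well_rounded:.3f}")
print(f"high PR but almost no trust: {no_trust:.3f}")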

julinho




msg:730544
 2:03 pm on May 1, 2006 (gmt 0)

Just to clarify:
I agree that PR is not calculated the way it used to be.
I just don't think that this change in the PR calculating is the cause of the sandbox.
To me, it indicates that PR is not as important in the algo as it used to be, and so Google prefers not to spend so much time and resources calculating accurate PR.

grant




msg:730545
 2:31 am on May 8, 2006 (gmt 0)

I'm going to try to answer everyone's questions. Let me start by saying I used a PR calculation spreadsheet that I did not develop.

Marcia
Bonus question: How many iterations (and how much time) does it take for Pagerank to converge?

It depends on the site and the number of links. I based this on a site with 5 pages using the above mentioned PR spreadsheet.

Cain IV
How does such a small difference in PageRank values between inner pages and the home page, after a short iteration, cause such a drastic result in the SERPs for Google?

The difference is not small. PR is logarithmic.

Sweet Cognac
I think that the longer a visitor stays on your site, google sees it has value. If it has value, the site stays in the serps, if it has no value...bye bye

Maybe some day, but I would not agree that click metrics are (a) happening and (b) happening to the degree that they trump basic PageRank observations. Example: if I can't remember who the second man on the moon was and I do a search, the valuable site is the one that keeps me on it for the least amount of time. From SES's "Vertical Creep" sessions: Yahoo cited, as a reason for verticals, giving information about a site on the results page itself. Being on a site for a long time is not always a good thing.

mc4210
So, according to your formula, is PR directly related to the sandbox? That is, sites showing a "normal" PR of 3 and above (index page) should not be considered sandboxed?

My use of PR 3 was arbitrarily chosen as an example. Also, the PageRank that is "showing" is not the real PR.

I do believe PR calculations have a lot to do with a "sandbox" effect, but also note that in my post I said "sans filters". I think filters also have to be applied, and the filters AND the PR iterations together would define the amount of time a site appears to be in a sandbox.

JudgeJeffries
what do you think would be the effect on the index page of only having one link out to a site map of say 100 pages but every one of those pages linking back to the index page?

My example was simple to make the point. If I were to guess, I'd think the index page would have about 5 cycles that would drop its PR, then it would bounce up to a higher value.

Howard Wright
an initial PR value of 3.5 doesn't sound like a very big value. Can this really explain any observed surge in rankings?

The value doesn't really matter. This is an exercise in illustrating that PageRank takes many iterations to calculate.

Oliver Henniges
I doubt this, because this would mean that google proceeds each iteration-loop every 48 hours or so, since many people reported a duration of several weeks or months for "their" sandbox.

I know I've seen GG posts that state that PR is changing all the time. (If someone can help find a post, that'd be great.)

Kaled

Interesting, however, if I understand the theory correctly, it requires the existence of a designed "sandbox" policy (squash sites until PR stabilises). I believe GG has said that it's an effect not a policy.

I am saying it is an effect of the many iterations it takes for PR to stabilize, not a policy.

MinistryOfTruth
Why would this suddenly occur now? Google has been using PageRank since day one, but the sandbox is quite a recent phenomenon.

It has always taken Google many iterations to stabilize PR, but I think they have introduced several layers to the equation that require more analysis. There are many factors at play, such as many millions more Web pages as well as a certainly more complex PR calculation. Also, Google this: "Google's servers full - CEO says it's a crisis".

[edited by: tedster at 3:58 am (utc) on May 8, 2006]

Marcia




msg:730546
 4:02 am on May 8, 2006 (gmt 0)

Well, since this thread is on the homepage, I must be missing something or misunderstanding something very major here.

If you analyze a site with 5 pages that all link to each other (the homepage having an initial PageRank of roughly 3.5), what you see in the first iteration of PageRank is that the homepage is PR 3.5, and all other pages are PR .365 – the largest PR gap that will ever exist through multiple iterations in this example.

This homepage PR represents a surge in PR because Google has not yet calculated the PR distribution; therefore the homepage has an artificial and temporary inflation of PR (which explains the sudden and transient PR surge, and hence the temporary surge in the SERPs).

In the second iteration, the homepage goes down to PR 1.4 (a drop of over 50%!), and the secondary pages get lifted to .9, explaining the disappearing effect of “new” sites. Dramatic fluctuations continue until about the 12th iteration when the homepage equilibrates at about a lowly 2.2, with other pages at about .7.


OK, so let's say there's a very high PR5 link (or two) to the homepage of a new site. That homepage will be a fairly middle to high PR4 (because of the IBL).

Then, let's say that PR4 homepage links to 4 other (second-level) pages on the site which link to each other as well as back to the homepage. Traditionally, if the homepage was a high PR4 those second level pages one link away from the homepage would also be PR4, though a lower PR4.

If those second-level PR4 pages then link to some more pages (third-level, two clicks away from the homepage and linked to each other and back to the second-level page linking to them, as well as to the homepage), those third-level pages will be PR3.

Even when the Google Toolbar gave a "guesswork" PR to newer pages, say in inner subdirectories, it was generally one integer down from the next higher site level. So if the homepage was PR4, one level in would be PR4, and then the toolbar would give a guesstimated PR3 to a subdirectory two levels in.

That's what the old TBPR guesstimates used to show, and the Toolbar itself has always shown it to be so. And it was perfectly understandable when looking at the inlinks to the homepage (given that there were no IBLs to interior pages).

>>Homepage stayed PR4.
>>Second level pages one click in stayed PR4.
>>Third level pages two clicks in, if there are any down the road, stayed PR3.

That isn't something made up. It's exactly how it has always been with a "real" site, ever since it first showed TBPR after it went up last summer, and it still is.

Are we now saying that if the homepage of a fully meshed 5-page site (with a possible third level, not fully meshed but theme-pyramid style) links to interior pages, the PR of that homepage will go down? Are we saying that those links are "leaking" PR off the homepage, and that the homepage loses PR because of those links?

that don’t bleed PR

Linking out from a page has never in the past "bled" PR from the page. All it did was increase the number of OBLs on a given page, and reduce the amount of PR given to other (usually on-site) pages if the added links went off-site.

Are we now saying that outbound links leak PR from a page and lower its PageRank?

Marcia




msg:730547
 4:31 am on May 8, 2006 (gmt 0)

<----- Memory like an elephant. Forgets NOTHING. Can find ANYTHING. :)

Tedster, I think this is what you're looking for:

The untimely death of the Google dance [webmasterworld.com]

Brett Tabke
Dude - the dance is dead.

grant




msg:730548
 4:45 am on May 8, 2006 (gmt 0)

Marcia -- you can't refer to the toolbar when you are looking at this equation. First, the toolbar takes a snapshot in time; we're talking about how PageRank gets calculated through iterations. Second, the toolbar does not show fractions.

[edited by: tedster at 4:52 am (utc) on May 8, 2006]

Marcia




msg:730549
 5:38 am on May 8, 2006 (gmt 0)

>>Marcia -- you can't refer to the toolbar when you are looking at this equation.

But you can look at the toolbar when it's been the same for ages and ages, through multiple "updates" and the PR of pages is consistent with what would be expected, given their IBLs, which haven't changed in all that time.

>>First, the toolbar takes a snapshot in time

Oh, I know that; it always has been a snapshot in time. But when one site has been up (and the same link up) for a full year and the other site has been up for several years, and the two sites in question have been the same all along as well as the TBPR, the snapshots over time have pretty much been of the same picture.

>>Second, the toolbar does not show fractions.

Exactly - which is why I referred to it as "high PR5" and "medium to high PR4." When the IBLs are known, and have had the same PR for a *long* time, it isn't hard to estimate whether pages are at the bottom, middle or top of an integer range, which is all the toolbar shows, a rough integer figure. Something like $5.01 and $5.98 both being shown as over $5 and under $6.

>>we're talking about how PageRank gets calculated through iterations.

That's one of the main things I'm questioning: the number of iterations (even approximately) being taken into consideration, when PR converges according to the theory being presented, and the time element and time span involved over which the iterations are taking place. Those parts are missing from the post with the equation. The part about "iterations" is very unclear to me, so I need it clarified.

I know how it used to be (and/or can dig up posts explaining it very clearly) and how long it used to take. It's different since it's internally continuous, but a comparison is still valuable, if there are those factors figured into this equation.

I believe that the duration of the “sandbox” is the same amount of time it takes Google to iterate through its PageRank calculations

Therefore, I think that the “sandbox” is nothing other than the time it takes Google to iterate through the number of calculations uniquely needed to equilibrate the volume of links for a given site.


The PR calculations, with the number of iterations needed, used to take 2-3 days. Now, the sandbox seems to last about a year, give or take a few months. See where I'm coming from? That's a big difference. So you can see why some folks would need some clarification, given those two timeframes.

And I'm still also questioning the concept of "leaking" and losing PR via OBLs.

Marcia




msg:730550
 6:33 am on May 8, 2006 (gmt 0)

<sidebar>

This thread was a short time before the change-over to the new system of rolling updates rather than monthly index updates with monthly PR recalcs and updates & TB updates. It was posted May of 2003.

I would hold on to the idea of an update that brings in more data for a little while longer. In time, I do think things will be more gradual. However, we're still in the transition period for this system, so I wouldn't be surprised to see a traditional update for a little while longer.

Is Freshbot now Deepbot? [webmasterworld.com]

</sidebar>
