
The Propagation Process of Backlinks and Pagerank

The "New Google" has created a fishbowl for us to study


steveb

11:26 pm on Nov 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The beginning of the month update of the Google Directory has given us an excellent opportunity to observe on a large scale the Google process of crawling, assigning pagerank, and transferring that rank to other sites.

Since the Google Directory wasn't updated for eight months, there are now literally thousands of new categories with white or grey bars. At the same time, many of the sites listed in these new categories are getting the benefit (sometimes quite significant) of having a link from the Google Directory for the first time.

But that link won't instantly make a difference. First Google must crawl these new pages -- in some cases this means crawling a greybar for the first time; in other cases this means re-crawling a whitebar category that shows in its current cache a custom page not found error rather than the category. Second, Google must assign pagerank to these new categories, and show the backlinks *to* them. Third, Google has to show the Directory link as a link to the sites listed in that category. Fourth, Google must assign the correctly weighted PR transfer from the category to the listed sites. Fifth, the listed sites' index pages get the PR benefit themselves, with whatever benefits that might entail in terms of crawling and serps. Sixth, the index pages of the listed sites distribute the PR into the rest of their domains, with again whatever benefit there is in terms of getting crawled and serps.

How long will this take? How long will each step take? Will the process be faster for the whitebar categories or for the greybar ones?

Find yourself some of these new (what will probably be) PR6, PR5 and PR4 categories and observe their evolution, and the evolution of their influence. See the new Google at work.

plasma

12:22 am on Nov 8, 2003 (gmt 0)

10+ Year Member



Great insight, thx.

Kirby

6:01 am on Nov 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How can I find these new categories without looking one by one?

Marcia

7:58 am on Nov 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Steve:
Sixth, the index pages of the listed sites distribute the PR into the rest of their domains, with again whatever benefit there is in terms of getting crawled and serps.

It's exactly this, the time lag between the index page getting indexed and showing PR and the time the same process happens on interior pages, that a lot of people are getting concerned over, both members and some people's clients.

How long will this take? How long will each step take? Will the process be faster for the whitebar categories or for the greybar ones?

Not even looking at Google Directory categories, even the recently added ones, I've seen a difference in what happens to interior pages between different "average" sites with just about the same inbound links - close enough for our purposes, anyway.

It's not only inbound links conferring PR to the index page, which is then distributed to other pages on the site, that we can look at, but (and I hope I explain this right) the links between the interior pages of the site are also computed and factored in - and back to the homepage, so it's kind of a round-robin. And it doesn't necessarily happen all at the same time.

Couple of easy examples using small sites that are fairly new so it's easy to watch:

Site one is PR5; the interior pages were PR4. Until last week, when even though there were no new inbound links, all of a sudden most of the interior pages jumped up to PR5 except for two that are still PR4. Why? There are fewer internal links to those two within the site. The PR5 on the interior is probably low PR5 and the PR4 probably high PR4, so they're close to the borderline once the PR is rounded off. There are a couple fewer interior links pointing to the PR4 pages than to the PR5 ones. That was not all computed at the same time; it was incremental.
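The effect Marcia describes - interior pages with fewer internal links settling one notch lower after rounding - can be sketched with a toy power-iteration PageRank. The five-page graph, page names, and damping factor below are all illustrative assumptions, not Google's actual data or parameters:

```python
# Toy PageRank via power iteration on a hypothetical five-page site.
# Pages "c" and "d" receive internal links only from "home", while
# "a" and "b" are linked from every other page -- so "c" and "d"
# settle lower, like the two interior pages stuck at PR4 above.
links = {
    "home": ["a", "b", "c", "d"],
    "a": ["home", "b"],
    "b": ["home", "a"],
    "c": ["home", "a", "b"],
    "d": ["home", "a", "b"],
}
d = 0.85  # conventional damping factor; illustrative choice here
pages = list(links)
pr = {p: 1.0 / len(pages) for p in pages}
for _ in range(50):
    # each pass redistributes rank along the known links
    pr = {p: (1 - d) / len(pages)
          + d * sum(pr[q] / len(links[q]) for q in pages if p in links[q])
          for p in pages}
for page, score in sorted(pr.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

The toy scores come out with "home" highest, "a" and "b" in the middle, and "c" and "d" lowest; two pages whose raw scores straddle a rounding boundary would show different Toolbar bars even on the same site.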

Site two has been up for a while with one lousy little PR5 link to it. It's been PR4, and the interior pages are PR3.

Site three has only been up for a couple of months and should be called boo-boo.com, because a robots meta exclusion was accidentally left in. When that came off it got crawled (just the homepage) and went to PR2, then PR4 when the several links were credited, but the interior pages weren't crawled, which was Booboo #2 - the DW template had put the meta exclusion on the other pages. That came off too, and the site now has not only the same PR5 link that Site two has, but a few more to boot.

Whoops, the interior pages of Site three got crawled and factored in later than Site two's, so even with more inbound links than Site two, it's PR4 on the index page but still showing PR2 on the interior pages. The PR propagation between pages on the site, back to the homepage, and circulating in again has not yet been computed. It more than likely will be during this coming month.

This time factor is of interest to a good number of concerned members who are really worrying unnecessarily, and is also of interest to clients we may have who need to be spoon-fed on the basics and don't have a clue as to what they can expect.

So with that said, the more information we can collectively come up with the better, not to fool or trick search engines, but to ease the people's minds who we have some responsibility or concern for - and for ourselves as well.

steveb

8:31 am on Nov 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One interesting phenomenon I'm seeing is something that will definitely be observable with the Directory pages on a large scale, and it relates to the white bar versus the grey bar.

It is a common thing to throw up a page that isn't quite "done", just to get Google starting to crawl it and get the process moving faster, get the page in the index as soon as possible. It seems to me now that this actually *slows* the process in terms of links going off that new page, and for the content on that page. Googlebot hits it once, then ignores it for quite some time because it is already "in" the index.

In the Google Directory now, for categories that do in fact exist with sites listed in them, there are a lot with a whitebar and a custom 404/this-category-doesn't-exist page as their cache. I suspect these may take *longer* to be crawled a second time, so the sites that should be credited with being listed there will not be for some time.

In contrast, Google now does a good job of getting to new pages quickly, so those with a greybar (never crawled before) now will get crawled quickly, and the sites listed there will start to get credit for those links sooner than the whitebar ones.

To oversimplify it, suppose that it was a fact that Google would crawl a page within two days, but not crawl this page again for three more weeks. Clearly then you would want to wait a week (or whatever) to put up a page until it (and the pages linked below it) are fully "done".
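The arithmetic in that oversimplified scenario is worth making explicit. All the numbers below are the hypothetical ones from the post (first crawl within two days, no re-crawl for three more weeks), not observed Googlebot behavior:

```python
# Back-of-envelope version of the timing example above.
# Hypothetical schedule: first crawl 2 days after a page appears,
# then no re-crawl for 21 more days.
FIRST_CRAWL_DAYS = 2
RECRAWL_GAP_DAYS = 21

# Publish an unfinished page on day 0: Google caches the unfinished
# version on day 2 and doesn't see the finished one until the re-crawl.
rushed_credit_day = FIRST_CRAWL_DAYS + RECRAWL_GAP_DAYS

# Wait a week and publish it finished on day 7: the first crawl on
# day 9 already sees the finished page and its outgoing links.
finished_on_day = 7
patient_credit_day = finished_on_day + FIRST_CRAWL_DAYS

print(rushed_credit_day, patient_credit_day)
```

Under these assumed numbers, rushing the page out costs two weeks: day 23 versus day 9 before the links below the page get credit.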

Marcia

10:40 am on Nov 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



We probably have about 99 people a week wanting to know how and when Google now updates and worrying needlessly about why some of their pages are PR0 when there's really no need for them to be concerned.

It used to be fairly simple to predict or even approximate the update cycle. There would be a big crawl once a month and a few weeks later there would be a big update with backlinks tallied and rankings shifted.

Even with the Fresh crawling it was fairly predictable - which is why we had threads explaining Everflux which, incidentally, were linked to by some of the major blog sites out there. There were many questions being asked by many people even back then, just as there are now.

Not so any more, the regularity and simplicity. People have been having a lot of difficulty with understanding Google's timing and updating for the past several months. It'll likely take some observation over time to discern any regular patterns.

ciml

3:51 pm on Nov 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> [...] It seems to me now that this actually *slows* the process in terms of links going off that new page, and for the content on that page. Googlebot hits it once, then ignores it for quite some time because it is already "in" the index.

An important point Steve, and one I'm not sure I've read in public. It also brings me to disagree with part of your question.

> But that link won't instantly make a difference. First Google must crawl these new pages -- in some cases this means crawling a greybar for the first time; in other cases this means re-crawling a whitebar category that shows in its current cache a custom page not found error rather than the category.

Not quite. I agree with you for new pages, but since the freshdeepbot or whatever we call it, Google have been calculating PageRank using URLs last fetched three months ago or more (several PR updates).

Google can now collect a small sample of pages from a large and deep site, seemingly at random, and also keep the data for the other pages from a previous crawl if they haven't re-fetched them. According to my work, when PageRank is run it is run over all those pages - even those that don't appear in the SERPs.

Presumably, Google save significant bandwidth on crawling deep, unchanging sites without having to ignore recently uncrawled pages, or their PageRank.

Also, with freshdeepbot the benefits of getting the new links are likely to be reflected in the SERPS before they're reflected in the link: search or the Toolbar. This is an important factor for reverse engineering.

steveb

4:43 am on Nov 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It doesn't matter much when/how they recalculate pagerank if they do not know a link exists to a certain page. The point is, if you add a link to PageB from PageA, and Google doesn't crawl PageA, then PageB won't get the benefit, nor will the pages linked from PageB. Google will continue to assign the pagerank value from PageA, but it will be done incorrectly.

If Google doesn't know a page links to another page, it obviously can't accurately assign pagerank to that linked page.
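The point can be sketched numerically: run the same toy power-iteration PageRank twice, once with a link known and once with it uncrawled. The four-page graph and damping factor are hypothetical, and a real crawler would use the stale outlink set rather than none; this just shows that an unseen link passes no benefit downstream:

```python
# Compare PageRank with and without knowledge of one link (a -> b).
# Graph, page names, and damping factor are made up for illustration.
def pagerank(links, d=0.85, iters=50):
    pages = list(links)
    pr = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        pr = {p: (1 - d) / len(pages)
              + d * sum(pr[q] / len(links[q]) for q in pages if p in links[q])
              for p in pages}
    return pr

known  = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
unseen = {"a": [],    "b": ["c"], "c": ["a"], "d": ["a"]}  # a->b uncrawled

for name, graph in (("link known", known), ("link unseen", unseen)):
    scores = pagerank(graph)
    print(name, {p: round(v, 3) for p, v in scores.items()})
```

With the a->b link unseen, "b" drops to the minimum rank, and "c" (linked only from "b") drops with it: the pages linked *from* the uncredited page lose out too, as the post says.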

Right now there are a lot of Google Directory pages, for example, that are passing pagerank, but instead of passing it to the sites listed on the public pages, they are passing it to whatever higher level Directory page shows in the "not found" error message visible in the cache.

ciml

1:23 pm on Nov 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry Steve, I was thinking of page B as being indexed already, but getting a new link from Page C. As Google would already have the link from B to C, page B wouldn't need to be crawled again for C to benefit from A -> B -> C. This is one of the Google changes I was thinking of.

> How long will this take? How long will each step take? Will the process be faster for the whitebar categories or for the greybar ones?

For grey bar pages, now linked from a parent category already in Google with decent PR, I'd expect the ranking to be affected by the link shortly after the parent is crawled. Then one or two link updates later, the link: search and Toolbar PR should reflect the link.

For white bar pages, now linked from a parent category already in Google with decent PR, I'd expect the ranking normally to be affected by the link shortly after the whitebarred category is crawled (as the parent with decent PR is probably crawled first). As you point out, this is likely to take longer for whitebar pages now that Googlebot crawls some pages quicker than others. As with the grey bar cats, we might expect the link: search and Toolbar PR to reflect the link in one or two link updates.

mil2k

2:46 pm on Nov 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Deep crawling is a mystery these days. As far as PageRank, and the assigning of it to inner pages, goes, my observations (for a new site) are:

1) Google hits the index page of a site and goes away. If there are more links pointing to the site, it hits again and picks up a few of the first-level links. The pages get cached.

If during this time the PR and links update occurs, then the home page will get PR but the first-level pages most probably will not.

2) After some time Googlebot comes again and hits all the first-level pages, and they are cached. The second-level pages are next in the queue, but they too are crawled somewhat randomly.

If the PR and links update occurs at this point, the first-level pages will most probably get PR. Not so sure about the second-level pages.

Hence, in general, PR gets assigned one level deeper per crawl cycle. The deep crawl generally occurs about once a month for a site, and its results are observed within a week in the SERPS.
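mil2k's one-level-per-cycle observation reduces to a simple schedule. The site structure and three-cycle window below are hypothetical, just to make the pattern concrete:

```python
# Sketch of "PR reaches one level deeper per crawl cycle".
# Hypothetical site: home at level 0, two pages at each deeper level.
levels = {"home": 0, "about": 1, "products": 1,
          "widget-a": 2, "widget-b": 2}

crawled = set()
for cycle in (1, 2, 3):
    # each (roughly monthly) cycle reaches one level deeper
    crawled |= {page for page, lvl in levels.items() if lvl < cycle}
    print(f"cycle {cycle}: PR assignable to {sorted(crawled)}")
```

Only the home page is eligible after the first cycle, the first-level pages after the second, and so on; under this assumed pattern a two-level site needs three monthly cycles before its deepest pages can show PR.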

The PR and links updates are generally occurring at the same frequency as the earlier "Google dances", i.e. approximately 28 days apart. They are not relevant in changing the SERPS.

Googlebot is hitting the index pages very frequently (if the PR is right) but generally the inner pages are not getting updated so quickly. There are many exceptions to this (as reported by many WW members).

Would like to hear others' experiences and analysis. :)

michael heraghty

3:58 pm on Nov 9, 2003 (gmt 0)

10+ Year Member



People have been having a lot of difficulty with understanding Google's timing and updating for the past several months. It'll likely take some observation over time to discern any regular patterns.

That's assuming Google's timing and updating will ever again settle into regular patterns. Google may deliberately choose (or already have chosen) to maintain a certain unpredictability in its actions.