Since the Google Directory wasn't updated for eight months, there are now literally thousands of new categories with white or grey bars. At the same time, many of the sites listed in these new categories are getting the benefit (sometimes quite significant) of having a link from the Google Directory for the first time.
But that link won't instantly make a difference. First Google must crawl these new pages -- in some cases this means crawling a greybar for the first time; in other cases this means re-crawling a whitebar category that shows in its current cache a custom page not found error rather than the category. Second, Google must assign pagerank to these new categories, and show the backlinks *to* them. Third, Google has to show the Directory link as a link to the sites listed in that category. Fourth, Google must assign the correctly weighted PR transfer from the category to the listed sites. Fifth, the listed sites' index pages get the PR benefit themselves, with whatever benefits that might entail in terms of crawling and SERPs. Sixth, the index pages of the listed sites distribute the PR into the rest of their domains, with again whatever benefit there is in terms of getting crawled and SERPs.
How long will this take? How long will each step take? Will the process be faster for the whitebar categories or for the greybar ones?
Find yourself some of these new (what will probably be) PR6, PR5 and PR4 categories and observe their evolution, and the evolution of their influence. See the new Google at work.
> Sixth, the index pages of the listed sites distribute the PR into the rest of their domains, with again whatever benefit there is in terms of getting crawled and SERPs.
It's exactly this, the time lag between the index page getting indexed and showing PR and the same process happening on interior pages, that a lot of people are getting concerned about, both members and some people's clients.
> How long will this take? How long will each step take? Will the process be faster for the whitebar categories or for the greybar ones?
Not even looking at Google Directory categories, even the recently added ones, I've seen a difference in what happens to interior pages between different "average" sites with just about the same inbound links - close enough for our purposes, anyway.
It's not only inbound links conferring PR to the index page, which is then distributed to other pages on the site, that we can look at. The links between the interior pages of the site (and I hope I explain this right) are also computed and factored in, and so are the links back to the homepage, so it's kind of a round-robin. And it doesn't necessarily all happen at the same time.
Couple of easy examples using small sites that are fairly new so it's easy to watch:
Site one is PR5; the interior pages were PR4. That held until last week, when, even though there were no new inbound links, all of a sudden most of the interior pages jumped up to PR5, except for two that are still PR4. Why? There are fewer internal links to those two within the site. The PR5 on the interior is probably low PR5 and the PR4 are probably high PR4, so they're close to the borderline with rounding off the PR. There are a couple fewer interior links pointing to the PR4 pages than to the PR5 ones. That was not all computed at the same time; it was incremental.
Site two has been up for a while with one lousy little PR5 link to it. It's been PR4, and the interior pages are PR3.
Site three has only been up for a couple of months and should be called boo-boo.com because a robots meta exclusion was accidentally left in. When that came off it got crawled (just the homepage) and went to PR2, then PR4 when the several links were credited, but the interior pages weren't crawled, which was Booboo #2 - the DW template had put the meta exclusion on the other pages. That came off too, and the site has not only the PR5 link that Site two has, but a few more to boot.
Whoops, the interior pages of Site three got crawled and factored in later than those of Site two, so even with more inbound links than Site two, it's PR4 on the index page but still showing PR2 on the interior pages. The PR propagation between pages on the site, back to the homepage and circulating back in again, has not yet been computed. It more than likely will be during this coming month.
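To make the Site one observation a bit more concrete, here is a minimal sketch of the textbook PageRank iteration run over a made-up five-page site. Everything in it is an assumption for illustration: the page names, the link structure, the 0.85 damping factor and the iteration count are hypothetical, and the raw scores are not Toolbar PR values. It only shows that interior pages with fewer internal links pointing at them end up with lower scores, even with identical inbound links to the homepage.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank.

    links maps each page to the list of pages it links to.
    damping=0.85 and iterations=50 are assumed values for illustration.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its damped score evenly across its outlinks.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical site: "a" and "b" are linked from every other page,
# while "c" and "d" are linked only from the homepage.
site = {
    "home": ["a", "b", "c", "d"],
    "a": ["home", "b"],
    "b": ["home", "a"],
    "c": ["home", "a", "b"],
    "d": ["home", "a", "b"],
}

scores = pagerank(site)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

Because each round feeds the previous round's scores back in, the numbers settle gradually rather than in one pass, which is at least consistent with interior pages crossing a rounding boundary some time after the homepage does.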
This time factor is of interest to a good number of concerned members who are really worrying unnecessarily, and is also of interest to clients we may have who need to be spoon-fed on the basics and don't have a clue as to what they can expect.
So with that said, the more information we can collectively come up with the better, not to fool or trick search engines, but to ease the minds of the people we have some responsibility or concern for - and our own as well.
It is a common thing to throw up a page that isn't quite "done", just to get Google starting to crawl it and get the process moving faster, get the page in the index as soon as possible. It seems to me now that this actually *slows* the process in terms of links going off that new page, and for the content on that page. Googlebot hits it once, then ignores it for quite some time because it is already "in" the index.
In the Google Directory now, for categories that do in fact exist with sites listed in them, there are a lot of these with whitebar custom 404/this-category-doesn't-exist as their cache. I'm suspecting that these may take *longer* to be crawled a second time, so those sites that should be credited with being listed there will not be for some time.
In contrast, Google now does a good job of getting to new pages quickly, so those with a greybar (never crawled before) now will get crawled quickly, and the sites listed there will start to get credit for those links sooner than the whitebar ones.
To oversimplify it, suppose that it was a fact that Google would crawl a page within two days, but not crawl this page again for three more weeks. Clearly then you would want to wait a week (or whatever) to put up a page until it (and the pages linked below it) are fully "done".
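The arithmetic behind that oversimplified case can be sketched as follows. The function name and its parameters are hypothetical, and the two-day and three-week figures are just the made-up numbers from the paragraph above, not measured Googlebot behaviour.

```python
def day_finished_page_is_crawled(publish_day, done_day,
                                 first_crawl_delay=2, recrawl_gap=21):
    """Return the day on which the finished version of a page is first crawled.

    Assumed model: the first crawl comes first_crawl_delay days after
    publishing, and every later crawl comes recrawl_gap days after the
    previous one.
    """
    crawl_day = publish_day + first_crawl_delay
    # Any crawl before the page is done only picks up the unfinished version.
    while crawl_day < done_day:
        crawl_day += recrawl_gap
    return crawl_day


# The page will be finished on day 7 either way.
early = day_finished_page_is_crawled(publish_day=0, done_day=7)
late = day_finished_page_is_crawled(publish_day=7, done_day=7)
print("published unfinished on day 0: finished version crawled on day", early)
print("published finished on day 7: finished version crawled on day", late)
```

Under these assumed numbers, putting the page up a week early costs two weeks: the finished version is picked up on day 23 instead of day 9.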
It used to be fairly simple to predict or even approximate the update cycle. There would be a big crawl once a month and a few weeks later there would be a big update with backlinks tallied and rankings shifted.
Even with the Fresh crawling it was fairly predictable - which is why we had threads explaining Everflux which, incidentally, were linked to by some of the major blog sites out there. There were many questions being asked by many people even back then, just as there are now.
Not so any more; the regularity and simplicity are gone. People have been having a lot of difficulty with understanding Google's timing and updating for the past several months. It'll likely take some observation over time to discern any regular patterns.
An important point Steve, and one I'm not sure I've read in public. It also brings me to disagree with part of your question.
> But that link won't instantly make a difference. First Google must crawl these new pages -- in some cases this means crawling a greybar for the first time; in other cases this means re-crawling a whitebar category that shows in its current cache a custom page not found error rather than the category.
Not quite. I agree with you for new pages, but since the freshdeepbot or whatever we call it, Google have been calculating PageRank using URLs last fetched three months ago or more (several PR updates).
Google can now collect a small sample of pages from a large and deep site, seemingly at random, and also keep the data for the other pages from a previous crawl if they haven't re-fetched them. According to my work, when PageRank is run it is run over all those pages - even those that don't appear in the SERPs.
Presumably, Google save significant bandwidth on crawling deep, unchanging sites without having to ignore recently uncrawled pages, or their PageRank.
Also, with freshdeepbot the benefits of getting the new links are likely to be reflected in the SERPS before they're reflected in the link: search or the Toolbar. This is an important factor for reverse engineering.
If Google doesn't know a page links to another page, it obviously can't accurately assign pagerank to that linked page.
Right now there are a lot of Google Directory pages, for example, that are passing pagerank, but instead of passing it to the sites listed on the public pages, they are passing it to whatever higher level Directory page shows in the "not found" error message visible in the cache.
> How long will this take? How long will each step take? Will the process be faster for the whitebar categories or for the greybar ones?
For grey bar pages, now linked from a parent category already in Google with decent PR, I'd expect the ranking to be affected by the link shortly after the parent is crawled. Then one or two link updates later, the link: search and Toolbar PR should reflect the link.
For white bar pages, now linked from a parent category already in Google with decent PR, I'd expect the ranking normally to be affected by the link shortly after the whitebarred category is crawled (as the parent with decent PR is probably crawled first). As you point out, this is likely to take longer for whitebar pages now that Googlebot crawls some pages quicker than others. As with the grey bar categories, we might expect the link: search and Toolbar PR to reflect the link in one or two link updates.
1) Google hits the index page of a site and goes away. If there are more links pointing to the site then it hits again and picks up a few of the first-level links. The pages get cached.
If during this time the PR and links update occurs then the home page will get PR but the first level pages most probably will not get a PR.
2) After some time Googlebot comes again and hits all the first-level pages and they are cached. The second-level pages are next in the queue but they are also randomly crawled.
If the PR and links update occurs then the first level pages will most probably get PR. Not so sure about the second level pages.
Hence in general the PR gets assigned to one level per crawling cycle. The deep crawl generally occurs once a month for the sites and their results are observed within a week in the SERPS.
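As a rough illustration of the "one level per crawling cycle" idea, the sketch below measures how many clicks each page is from the index page and treats that as the number of extra cycles before the page would show PR. This is a deliberately simplified model, not a description of how Googlebot schedules anything; the function names and the example site are hypothetical, and the random crawling of deeper pages mentioned above is ignored.

```python
from collections import deque


def link_depths(links, root):
    """Breadth-first search: number of clicks from root to each reachable page."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


def first_cycle_with_pr(links, root):
    """Assumed model: the index page is covered in cycle 1 and each cycle
    reaches exactly one level deeper than the one before."""
    return {page: d + 1 for page, d in link_depths(links, root).items()}


# Hypothetical small site.
site = {
    "index": ["products", "about"],
    "products": ["widget"],
    "widget": ["spec"],
    "about": [],
    "spec": [],
}

cycles = first_cycle_with_pr(site, "index")
for page, cycle in sorted(cycles.items(), key=lambda item: item[1]):
    print("cycle", cycle, "->", page)
```

With roughly monthly cycles, a page three clicks from the index page would, under this model, wait about three cycles longer than the index page itself.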
The PR and links updates are generally occurring at the same frequency as the earlier "Google dances", i.e. approximately 28 days apart. They are not relevant in changing the SERPS.
Googlebot is hitting the index pages very frequently (if the PR is right) but generally the inner pages are not getting updated so quickly. There are many exceptions to this (as reported by many WW members).
Would like to hear others' experiences and analysis. :)
> People have been having a lot of difficulty with understanding Google's timing and updating for the past several months. It'll likely take some observation over time to discern any regular patterns.
That's assuming Google's timing and updating will ever again settle into regular patterns. Google may deliberately choose (or already have chosen) to maintain a certain unpredictability in its actions.