| This 34 message thread spans 2 pages: < < 34 ( 1  ) || |
Supplemental Page Count Formula?
When trying to get pages to move from the supplemental index to the main index, I get the impression that a formula is applied that ties the percentage of a site's pages in the main index to its PR. The boundary between pages that are in and out of the main index appears to drift back to the same percentage: in my case about 10% in and 90% out. When I make changes and get pages shifted into the main index, after a few days others disappear and go supplemental, and the in/out ratio shifts back to what it was before (90% out). I have a travel-related site, and I have found that many similar sites with similar PRs have similar in/out ratios. I have also noticed that for very large sites (one with 10,000 pages, the other with 300,000) there appears to be a limit of 1,000 pages in the main index: "site:" displays 1,000 pages and then the supplementals start.
My experience suggests that there is a formula, related to PR, for what is in and out of the main index, and that there is an upper limit of 1,000 pages or so. It seems that you can only make small changes to this ratio without an extreme amount of work. Has anyone else found this?
|Given "this many people" who link to you, we're willing to include "this many" pages in the main index.|
This sounds like a formula to me, something like this for a 300-page site:
IBLs | TBPR | In Main Index | In Supplemental | % Supp
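A minimal Python sketch of the quota idea being proposed here, assuming a linear "pages per IBL" rule plus the roughly 1,000-page cap reported above. The function names, `pages_per_ibl`, and the rule itself are hypothetical; nothing in this thread confirms Google uses anything like it. It only shows how such a formula would produce the In Main / In Supplemental / % Supp columns.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumption: the ~1,000-page ceiling reported in this thread.
OBSERVED_CAP = 1000


@dataclass
class IndexSplit:
    in_main: int
    in_supplemental: int

    @property
    def pct_supplemental(self) -> float:
        """Share of the site's pages sitting in the supplemental index (%)."""
        total = self.in_main + self.in_supplemental
        return 100.0 * self.in_supplemental / total if total else 0.0


def hypothetical_quota(ibls: int, pages_per_ibl: int = 5, cap: int = OBSERVED_CAP) -> int:
    """Hypothetical rule: main-index slots grow linearly with inbound links,
    up to a hard cap. pages_per_ibl is an invented parameter."""
    return min(ibls * pages_per_ibl, cap)


def split_site(total_pages: int, ibls: int, **quota_kwargs) -> IndexSplit:
    """Fill main-index slots up to the quota; everything else is supplemental."""
    in_main = min(total_pages, hypothetical_quota(ibls, **quota_kwargs))
    return IndexSplit(in_main, total_pages - in_main)
```

With these invented numbers, `split_site(300, 6)` gives `IndexSplit(in_main=30, in_supplemental=270)`, i.e. 90% supplemental, while a 300,000-page site stops at 1,000 main-index pages however many IBLs it has.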
The conclusion is that once all the issues have been dealt with, such as duplicate content etc., etc., you run into a brick wall which says: "Sorry you have reached the page number limit for this site with its x number of IBL's, all the rest of the pages will be dumped in the supplemental".
This is what I have observed with my site. If you add pages, change links, etc., things appear to change, with more pages added to the main index as they are re-crawled, but then other changes are made so that the number in the main index stays about the same. The impression I get is that the bot crawls a page and checks its formula for the number of pages that should be in the main index. If the page is in the main index, it checks whether the page should stay there based on the minimum criteria; if so, it leaves it there and updates the cache, and if not, the page goes supplemental. This leaves a vacancy in the main index, which is filled by the next page it recrawls that meets the minimum criteria. So what's in and out is not logical but a legacy of past decisions. Ultimately, once you have done all the 'right things' for every page to stop it going supplemental, you might as well give up, because you cannot change the formula that says how many pages will be in the main index for your PR and number of IBLs. If you want more pages in the main index, you need to lift the PR! The max is 1,000 pages per site.
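The slot-and-vacancy mechanism described in this post can be sketched as a small simulation. This is only the poster's hypothesis turned into code: `recrawl`, `quota` and `meets_criteria` are invented names, and Google has not documented any such behavior.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable


def recrawl(
    main_index: set[str],
    supplemental: set[str],
    crawl_order: Iterable[str],
    quota: int,
    meets_criteria: Callable[[str], bool],
) -> None:
    """Hypothetical slot model from the post above (not documented Google behavior).

    Mutates main_index and supplemental in place, visiting pages in crawl order:
    - a main-index page that fails the minimum criteria is demoted, freeing a slot;
    - a supplemental page that passes is promoted only if a slot is free.
    Promotions never push the main index past `quota`.
    """
    for url in crawl_order:
        if url in main_index:
            if not meets_criteria(url):
                main_index.remove(url)
                supplemental.add(url)
        elif url in supplemental and meets_criteria(url) and len(main_index) < quota:
            supplemental.remove(url)
            main_index.add(url)


# Two slots; "c" and "d" both qualify, but only "c" is recrawled right after
# "a" is demoted, so only "c" gets the freed slot.
main, supp = {"a", "b"}, {"c", "d"}
recrawl(main, supp, ["a", "c", "d", "b"], quota=2, meets_criteria=lambda u: u != "a")
print(sorted(main), sorted(supp))  # ['b', 'c'] ['a', 'd']
```

Pages "c" and "d" are equally qualified, yet crawl order alone decides which one is in, which is the "legacy of past decisions" effect the post describes.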
Hmmm. Does this also explain why a 175-page site can only get 135 pages into the index? I know of a site with very few incoming links, and that is what has happened there.
All of the canonical issues were fixed years ago, so there is only one Supplemental URL in the whole lot, and that is for a page that has returned a 404 for several months.
|The conclusion is that once all the issues have been dealt with, such as duplicate content etc., etc., you run into a brick wall which says: "Sorry you have reached the page number limit for this site with its x number of IBL's, all the rest of the pages will be dumped in the supplemental". |
Bear, that's about right, but it's not exactly a formula:
1. PageRank iteration. This phase decides how much PageRank is attributed to each url on your site. If your site isn't "important" enough, Google may guess the PageRanks of some of your pages to conserve CPU resources (see the "PageRank not yet assigned" message in Webmaster Tools). Google has to recalculate the PageRanks of billions of pages on the web on a daily basis, so not calculating the PageRanks of every url in its data centers saves Google time. PageRank calculation isn't straightforward anymore either: Google also has to determine the intent behind links to decide how much PageRank each link passes.
2. Once Google calculates/guesses the PageRanks of your urls, it compares them against the minimum PageRank threshold. If a url's PageRank is equal to or greater than the threshold, the url stays in the main index. If a url's PageRank is less than the threshold, the page is stored in the supplemental index (or not indexed at all).
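The two steps above can be sketched with textbook PageRank (power iteration, damping factor 0.85) followed by a cutoff. The threshold value, the function names and the example link graph are assumptions for illustration only; Google's actual calculation (guessed PageRanks, link-intent weighting) is not public and is not modelled here.

```python
from __future__ import annotations


def pagerank(links: dict[str, list[str]], damping: float = 0.85, iters: int = 50) -> dict[str, float]:
    """Textbook power-iteration PageRank on a small link graph.

    Pages with no outlinks ("dangling") spread their rank evenly over all
    pages, so the ranks always sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        rank = new
    return rank


def split_by_threshold(rank: dict[str, float], threshold: float) -> tuple[set[str], set[str]]:
    """Step 2 as described above: at or above the cutoff stays in the main index."""
    main = {p for p, r in rank.items() if r >= threshold}
    return main, set(rank) - main


# Hypothetical site: "c" exists but nothing links to it.
site = {"home": ["a", "b"], "a": ["home"], "b": ["home"], "c": []}
ranks = pagerank(site)
main, supplemental = split_by_threshold(ranks, threshold=0.1)  # threshold is invented
print(sorted(main), sorted(supplemental))  # ['a', 'b', 'home'] ['c']
```

The orphaned page "c" only receives the baseline share of rank, so it falls below the (made-up) cutoff, which is consistent with the advice that more internal and inbound links are what lift a url back over the line.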
It also seems like you want to believe there's nothing you can do to solve your supplemental problem. I don't buy that either, since I've gotten several sites back into the main index.
Keep in mind, too, that a site with obviously artificial linking patterns can have its IBLs devalued (see the sites cited in the recent Forbes article, where a site engaging in excessive reciprocal linking had its IBLs devalued and ended up in "Google Hell").
"It also seems like you want to believe there's nothing you can do to solve your supplemental problem."
I've been trying to change things for six weeks or so. I've done everything that various posts suggest could be the cause: added unique content to every page, fixed canonical issues, fixed minor HTML errors, got more IBLs (including deep ones), changed the link structure and reduced the links per page, etc. All to no avail. Perhaps most of what I've done was unnecessary? Another problem is the delay in response and the failure of the "site:" command to present accurate, up-to-date information. I know there are more pages in the main index than the "site:" command shows, because I can see them in searches as not supplemental. The frustrating things are the delays, not knowing what to fix or when (if ever) the changes will take effect, and above all seeing similar pages with identical PR, position in the link structure, unique content, etc., with some in and many out for no apparent reason. It's a nightmare! Nothing I have done has changed the ratio of what's in and out. More IBLs would appear to be the answer, along with hoping that all the changes I have made will eventually take effect one day. I don't know whether I've been penalised or this is just the way things are now: a fixed number of pages the bot will add to the main index for a given site, with a given number of IBLs, link structure, and PR (the formula!), no matter what you do. Ah well, back to the IBLs.