
Home / Forums Index / Google / Google SEO News and Discussion
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

This 34 message thread spans 2 pages: 34 ( [1] 2 > >     
Supplemental Page Count Formula?
bear




msg:3352117
 1:55 am on May 29, 2007 (gmt 0)

When trying to get pages to move from the supplemental index to the main index, I get the impression that a formula is applied relating the percentage of pages in the main index to PR. The boundary between pages that are in and out of the main index appears to move back to the same percentage - in my case about 10% in and 90% out. When I make changes and get pages shifted into the main index, after a few days others disappear and go supplemental, and the ratio of in and out shifts back to what it was before - 90% out.

I have a travel-related site, and I have found that many similar sites with similar PRs have similar ratios for what's in and out. I have also noticed that for very large sites - one with 10,000 pages and another with 300,000 pages - there appears to be a limit of 1,000 pages in the main index ["site:" displays 1,000 pages and then the supplementals start].

My experience suggests that there is a formula for what is in and out of the main index, related to PR, and that there is an upper limit of 1,000 or so. It seems that you can only make small changes to this ratio without an extreme amount of work. Has anyone else found this?

 

Halfdeck




msg:3352178
 4:27 am on May 29, 2007 (gmt 0)

If a url's PageRank falls too low, it will fall out of the main index. If your site is largely supplemental, it means 1) not enough quality inbound links to your site, 2) you have too many pages, 3) you link out too much, 4) Google may think your IBLs are artificial, or 5) canonical issues are causing PageRank to split.

bouncybunny




msg:3352185
 4:52 am on May 29, 2007 (gmt 0)

I've never heard of points 2 and 3 being relevant to pages falling into the supplementals.

tedster




msg:3352191
 5:09 am on May 29, 2007 (gmt 0)

I think you may be seeing a "formula" that applies only to how the site: report is created. I just checked a big site that I worked with in the past.

site:example.com
1,000 results (the limit for any search)
only #1000 is supplemental
total results about 300,000

site:example.com/directory-a
1,000 results
only #1000 is supplemental
total results about 19,000

site:example.com/directory-b
1,000 results
only #1000 is supplemental
total results about 13,000

See what I mean? The first set of results seems to say they've only got 999 results in the main index. But the results in the second and third searches alone seem to indicate 1998 main index results just from those two directories. If I do site: searches for all their directories and add up the number of non-supplementals, it becomes very large.

Halfdeck




msg:3352192
 5:17 am on May 29, 2007 (gmt 0)

"I've never heard of points 2 and 3 being relevant to pages falling into the supplementals."

As for 2) think of total PageRank X (sum of all inbound PageRank to your domain) split between Y number of pages. Roughly speaking, bigger page count = lower average PageRank per page (depending on your site structure). We know that a page with PageRank below minimum threshold "goes" supplemental. With excessively high page count, average falls too low, and you'll end up with many pages in the supplemental index. By reducing the number of pages, you slightly increase average PageRank per url. That can result in several supp pages popping back into the main index.

3) A minor point, but if a TBPR 5 page has 100 links and 90% of those are outbound, you are giving 90% of that page's juice to other sites instead of to your internal pages. Add more internal links to that page, lower your outbound percentage, and you have a little more juice to play with.
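The two points above can be put in rough numbers. This is a toy sketch, not Google's actual formula - the inputs are invented, and 0.85 is just the classic published damping factor:

```python
# Toy arithmetic for points 2 and 3 above (all figures invented).

def avg_pr_per_page(total_inbound_pr, page_count):
    """Point 2: the same inbound PageRank spread over more pages
    means a smaller average share per page."""
    return total_inbound_pr / page_count

print(avg_pr_per_page(10.0, 600))  # 600-page site
print(avg_pr_per_page(10.0, 300))  # same links, half the pages: 2x the share

def juice_kept_internal(page_pr, total_links, outbound_links, damping=0.85):
    """Point 3: of the PR a page passes on, only the share flowing
    through internal links stays on your own site."""
    internal_links = total_links - outbound_links
    return page_pr * damping * internal_links / total_links

print(juice_kept_internal(5.0, 100, 90))  # 90% outbound: little stays home
print(juice_kept_internal(5.0, 100, 10))  # 10% outbound: most stays home
```

Either lever only nudges the numbers, which is the point: both are minor tweaks next to gaining trusted IBLs.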

Both are minor tweaks compared to gaining trusted (non-paid, non-reciprocal) IBLs. No matter what you do, I don't see a domain with a TBPR 2 root getting 1,000 pages in the main index.

McMohan




msg:3352232
 6:55 am on May 29, 2007 (gmt 0)

Isn't Google's policy of putting pages with low pageranks in supplemental flawed?

Doesn't it force a webmaster to go after links rather than create pages that are of value with good content?

Shouldn't pages be evaluated on the uniqueness and useful content they provide rather than how many links they have?

What about those who advocate content is king?

bear




msg:3352290
 8:55 am on May 29, 2007 (gmt 0)

"If a url's PageRank falls too low, it will fall out of the main index. If your site is largely supplemental, it means 1) not enough quality inbound links to your site, 2) you have too many pages, 3) you link out too much, 4) Google may think your IBLs are artificial, or 5) canonical issues are causing PageRank to split."

This is not my experience. I have a PR 4 site with 600 pages. Most used to be in the main index - about two months ago it started to go supplemental and stopped at 10% of pages in the main index and 90% out. There is no consistent difference between the ones in and out in terms of PR, links, etc. I have good IBLs and have been adding links steadily. I have dealt with the canonical issues, and there is no duplication on any of the pages, each of which has 100+ unique words. The only issue remaining is deep linking to the individual pages with IBLs, which is hard to do.

Looking at similar sites with similar PR and 100+ pages, they all seem to have similar ratios of 10% in and 90% out. My impression is that these ratios are fixed and that it's very hard to shift a site out of its PR-related ratio. I agree that sites with a PR of 2 will mostly be supplemental, but it appears that even PR 4 and 5 sites will have a large percentage of pages supplemental in spite of all the listed causes being addressed. As I mentioned before, I can move some pages into the main index, but shortly thereafter other pages will go supplemental.

Is this Google's answer to the billions of websites - only index 50%, 30% or less of the pages for each URL? Is the era of many pages on a site over, despite their value and benefit to users? Is the era of specific information provided on many specific pages that pop up in specific searches now over, due to the spammers?

piatkow




msg:3352415
 12:26 pm on May 29, 2007 (gmt 0)

My own site is showing about a third supplemental. The pages that are out are all record reviews (the site is for a music magazine), which generally do not share many keywords with the pages that get the inbound links and are never updated once published. No idea if this is significant; I just noticed it a few minutes ago when this thread prompted me to check how many supplementals I have.

Halfdeck




msg:3352669
 5:06 pm on May 29, 2007 (gmt 0)

"There is no consistent difference between the ones in and out in terms of PR"

You cannot see the PageRank of individual urls. TBPR 4 is on the low side anyway, so it's not surprising to see fluctuations.

There is no correlation between TBPR and % supplemental. If you owned a TBPR 5 site with 10 pages, you'd always have 100% of your site in the main index.

As for Google basing indexing on PageRank, it's a flawed concept but I don't think they have a better alternative, since Googlebot doesn't understand what's written on a page.

[edited by: Halfdeck at 5:07 pm (utc) on May 29, 2007]

bear




msg:3352806
 7:19 pm on May 29, 2007 (gmt 0)

"There is no correlation between TBPR and % supplemental."

BUT

"No matter what you do, I don't see a domain with a TBPR 2 root getting 1,000 pages in the main index."

AND

"If you owned a TBPR 5 with 10 pages, you'll always have 100% of your site in the main index."

I don't want to labour the point, but I still think there is a correlation, and that the % supplemental is driven by a formula with terms for PageRank and number of pages. You can change things a little, but in general terms, if you have more than 10 pages or so, the % supplemental will be related to PR. If you have a PR 4 site with 100 pages, say 80% may be supplemental; if it's PR 5, say 50% may be supplemental; if it's PR 2, then 99% may be supplemental. Perhaps it's related to diluting the PR between pages, but it's hard to shift from the range driven by the formula.

In my case I have a travel site with 600 pages, currently PR 4 - one page for each city. My strategy was to have my keywords linked with the town name so that searches would find the individual pages in Google results. Obviously, having only 20% of the pages indexed makes a mess of this strategy. Most of my competitors are in a similar situation - only small to moderate percentages of their sites are indexed, and the limit is 1,000.

If I want 95% in the main index, what should I do? Work on deep linking to each of the pages? Try to get the PR lifted to 5 or 6? Register 100 URLs and keep the number of pages to 10? Develop my own travel search engine? Is it possible to have 95% of 100 pages indexed for a PR 4 or PR 5 site? Can you really control exactly what's in or out?

The supplemental index has changed things - perhaps Google is heading towards only indexing the index page for a URL. The times they are a-changin'! It's so hard working in the dark without knowing what will work and when.

tedster




msg:3352820
 7:33 pm on May 29, 2007 (gmt 0)

"Try to get the PR lifted to 5 or 6?"

Even a higher PR4 will help (IOW, don't worship the toolbar integers) - assuming you circulate that PR well through a sane link structure. Deep inbound links will help your neighboring deep pages even more.

bear




msg:3353000
 12:40 am on May 30, 2007 (gmt 0)

I did some research looking at travel sites and the relationship between PR and the % of pages in the main index. There is a lot of variation, and the page counts vary considerably. But it does show that overall the % in the main index is correlated with PR, though there are exceptional PR 4 sites with a high percentage indexed in the main index (the max was 94%). For a travel site, having anything less than 80% of your site's pages (one for each town) in the main index is a disaster in terms of hits. I still think it implies there is a max inclusion rate for moderate effort within each PR range - something like 40-50% for PR 3, 60-70% for PR 4, and 90% for PR 5.

There's too much variability in site contents, page counts, etc. to be conclusive. One thing it does show is that most people have a problem with supplementals. There's hope, but it's a lot of work to achieve high inclusion rates with more than 400 pages.

PR  Pages  Main  Main%
3     132    15     11
3     286   127     44
3      39    20     51
3     111    47     42
3     132    16     12
3      56    29     52
(PR 3: Mean 34%, Max 52%)
----------------------------------------------------------------
4     625   165     26
4      87    30     34
4     138   130     94
4     189    50     26
4     853   680     80
4     230    99     43
4     287   200     70
4     202   130     64
4     447   285     64
(PR 4: Mean 59% - 50% without the 94 - Max 94%)
----------------------------------------------------------------
5     409   350     86
5     574   230     40
5     711   370     52
5     523   445     85
(PR 5: Mean 66%, Max 86%)
----------------------------------------------------------------
Very large sites (max 1,000 in main?):
6   28400  1000
4    3350   542
4    3070  1000
4    4560  1000
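As a sanity check, the raw counts in the table can be re-averaged in a few lines (data copied from the table above; the recomputed PR 4 mean lands closer to 56% than the hand-rounded in-table figure):

```python
# Re-average bear's table: each entry is (toolbar PR, total pages, pages in main).
data = [
    (3, 132, 15), (3, 286, 127), (3, 39, 20),
    (3, 111, 47), (3, 132, 16), (3, 56, 29),
    (4, 625, 165), (4, 87, 30), (4, 138, 130),
    (4, 189, 50), (4, 853, 680), (4, 230, 99),
    (4, 287, 200), (4, 202, 130), (4, 447, 285),
    (5, 409, 350), (5, 574, 230), (5, 711, 370), (5, 523, 445),
]

for pr in (3, 4, 5):
    pcts = [100.0 * main / pages for p, pages, main in data if p == pr]
    print(f"PR {pr}: mean {sum(pcts) / len(pcts):.0f}%  max {max(pcts):.0f}%")
```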

s_clay




msg:3353018
 1:21 am on May 30, 2007 (gmt 0)

I have a 150-page eCommerce site, with about 10-15 pages supplemental.

These pages are supplemental due to duplicate content issues.

steveb




msg:3353025
 1:44 am on May 30, 2007 (gmt 0)

"You can change things a little but in general terms if you have more than 10 pages, or so, the % supplemental will be related to PR."

There isn't any relationship at all. There isn't anything consistent, certainly no formula.

s_clay




msg:3353047
 2:46 am on May 30, 2007 (gmt 0)

Should have noted the site is PR 5.

Halfdeck




msg:3353211
 9:02 am on May 30, 2007 (gmt 0)

Bear, those figures aren't surprising.

Say your TBPR 4 remains constant but you add more and more pages. As you do so, the % supplemental will tend to increase. As you remove pages, the % supplemental will decrease (it's like sharing a pizza with fewer people - each person gets a bigger slice). Similarly, if you pull links to your site and lower the home page to TBPR 1, % supplemental will increase. If you increase TBPR to 7, % supplemental will decrease (you're sharing your pizza with the same number of people, but you ordered 3 large pizzas instead of one small, so everyone gets a bigger share).

There is no arbitrary formula that says a site with PageRank X can have no more than Y% of its pages in the main index. Suppose for a second that a TBPR 2 site were allowed to have 2% of its pages in the main index. A spammer could then force Google to index 40,000 pages by creating a 2,000,000-page site.
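The pizza analogy can be turned into a toy threshold model: give each page an unequal slice of the site's total PageRank, and mark slices below some unknown cutoff supplemental. Every number here (the cutoff, the Zipf-style split) is invented for illustration - and note the model caps nothing by percentage:

```python
# Toy pizza model: pages whose PageRank slice falls below a cutoff
# go supplemental. All numbers invented; Google publishes no such figures.
CUTOFF = 0.01

def supplemental_pct(total_pr, page_count):
    # Zipf-like split: page i gets a share proportional to 1/(i+1),
    # a crude stand-in for "home page gets most, deep pages least".
    weights = [1.0 / (i + 1) for i in range(page_count)]
    scale = total_pr / sum(weights)
    supp = sum(1 for w in weights if w * scale < CUTOFF)
    return 100.0 * supp / page_count

# Same inbound PR, more pages -> more supplemental (more people, same pizza):
print(supplemental_pct(2.0, 100))
print(supplemental_pct(2.0, 600))

# Same pages, more inbound PR -> less supplemental (bigger pizza):
print(supplemental_pct(8.0, 600))
```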

tedster




msg:3353490
 2:00 pm on May 30, 2007 (gmt 0)

One thing to note is that PR is assigned to a URL - and not to a site. So when we say a "site" has PR4 or whatever, we are talking about the PR of the domain root or Home Page.

However, through deep inbound links, it is possible for an internal page to have a higher PR than the domain root. That simple fact is a good reason not to fear publishing new pages that are good and could attract links on their own merit. In other words, there is truth in Halfdeck's analysis -- if you only have links to your home page.

Halfdeck




msg:3353752
 6:26 pm on May 30, 2007 (gmt 0)

I agree completely, tedster. In fact, my blog directory is TBPR 4 while my root is TBPR 3. When I say a TBPR 2 site, I'm using that as shorthand for a site whose total IBL "power" (all inbound PageRank to the domain added together) is pretty weak. Googlers often resort to the same shorthand in their Webmaster Guidelines by attributing PageRank to a site rather than to a url, though technically they're also making a mistake.

The way PageRank flows through a site is unique to every site, and how PageRank flows into a site is also unique to every site (a blog might have multiple entry points, as people link to different posts, while a commercial site with thin product pages might have only one entry point - the home page). There is still a tendency for PageRank to gravitate upwards - e.g. toward blog category/archive pages, as they're often linked to from every page - though again, it depends on your blog setup.
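That "gravitates upwards" tendency shows up even in a textbook power iteration on a made-up four-page blog, where the category page is linked from every other page (standard PageRank with damping 0.85; the link graph is invented):

```python
# Tiny PageRank power iteration: the sitewide-linked category page wins.
links = {
    "home":     ["category", "post1", "post2"],
    "category": ["home", "post1", "post2"],
    "post1":    ["category"],   # posts link back to their category,
    "post2":    ["category"],   # as a blog sidebar/breadcrumb would
}
pages = list(links)
d = 0.85                        # classic damping factor
pr = {p: 1.0 / len(pages) for p in pages}

for _ in range(50):             # iterate to convergence
    new = {p: (1 - d) / len(pages) for p in pages}
    for p, outs in links.items():
        share = d * pr[p] / len(outs)
        for q in outs:
            new[q] += share
    pr = new

for p in sorted(pr, key=pr.get, reverse=True):
    print(f"{p}: {pr[p]:.3f}")
```

The category page ends up with the largest share, simply because every page feeds it.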

bear




msg:3353854
 8:06 pm on May 30, 2007 (gmt 0)

The conclusion from your comments would appear to be that the reason a page goes supplemental is that the individual TBPR for that page is too low - below some threshold, say <2? This is simple. But if you look at the reasons put forward for a site having a large percentage of pages supplemental, they often relate to the site as a whole - for example, too many duplicate pages, too many outward links, poor internal linking structure, etc.

Halfdeck says

“If your site is largely supplemental, it means 1) not enough quality inbound links to your site, 2) you have too many pages, 3) you link out too much, 4) Google may think your IBLs are artificial, or 5) canonical issues are causing PageRank to split.”

Sure, you can translate these ‘site issues’ into an impact on the TBPR of each page, but there is still reference to the ‘duplicate filter’, low trust for the site, internal link structure, and other site-wide penalties or issues.

The frustrating thing is that if I look at the pages in the main index and compare them with the ones that have gone supplemental, there is no apparent difference between them: they are in a similar position in the hierarchy, they have unique content, they have the same number of unique words, they have been cached recently, etc. Perhaps someone has linked to the individual pages that are in the main index - but I don’t think so. The decision of what’s in and out appears to be arbitrary, and if 50 are considered worthy of inclusion in the main index, why not include another 300 that should have essentially the same TBPR in terms of link structure? In my case it’s a travel site, and each town’s page has a town description, holiday activities, and lists of properties. Each page has unique meta info. I can’t see what the difference is, and so I’ve nothing to work on. This was the reason for my impression that there was an arbitrary allocation - why include some but not others, when their features are identical? There’s no common feature for the ‘in’ group that would identify why they are ‘in’.

Perhaps it’s a penalty thing - but I’ve addressed all the issues I am aware of that may be causing this - no links to link farms, etc. Perhaps it’s a timing issue - maybe I should wait a few months and see what happens. Maybe the delays in response from Google, combined with my fiddling with things, are making it impossible to work out what’s happening. Or else I should focus on getting inbound links, including deep links, to lift the PR of the pages above the threshold for inclusion in the main index.

It reminds me of trying to catch fish by ‘thinking like a fish’ - what bait, where, when. The Google system is a similar mystery: ‘thinking like the Googlebot’ is another of life’s frustrations! Particularly when the Google brain and system are forever changing - at least the fish brains and thoughts are constant!

schalk




msg:3353908
 9:26 pm on May 30, 2007 (gmt 0)

This is the first supplemental post I have read for a long while that seems to relate exactly to my position.

We have an ecommerce site with about 40k pages. We seem to have 10% of pages in the main index and 90% in supplemental, with no means of controlling what is in and what is out. Pages with loads of content can go supplemental as well as those with little.

We only have a home page PR of 4 and realise that we need to build relevant, quality inbound links. This is by no means easy, since we are asking ourselves why anyone would want to link to us. We are sorting this with fresh content, but realise it will take a long time to get the links. We have also been through and sorted any canonical issues that were present, although we didn't have much of a problem here.

My thoughts are to try to stabilize things by reducing the number of pages we have, and working out how to spread the PR evenly across the pages we want in the index.

My question is, has anyone had any success in altering their internal linking structure to improve their spread of PR?

I would feel much happier if we could keep the good pages in, whilst we continue to build the inbound links.

bear




msg:3353939
 10:09 pm on May 30, 2007 (gmt 0)

Schalk,
I'm no expert on this, but perhaps you could try rel="nofollow" on the links to the unimportant pages - I'm not sure how this will affect the distribution of PR.

see

[searchenginejournal.com...]

but someone else may have experience with this

Jakpot




msg:3353941
 10:10 pm on May 30, 2007 (gmt 0)

Isn't Google's policy of putting pages with low pageranks in supplemental flawed?

Doesn't it force a webmaster to go after links rather than create pages that are of value with good content?
Shouldn't pages be evaluated on the uniqueness and useful content they provide rather than how many links they have?

What about those who advocate content is king?


Yes
Yes
Yes
They are in supplemental

steveb




msg:3353973
 10:47 pm on May 30, 2007 (gmt 0)

"The decision of what’s in and out appears to be arbitrary"

It is, but only at the margins.
In general you just need to get higher PageRank and more links, without duplicate content.

Halfdeck




msg:3353990
 11:19 pm on May 30, 2007 (gmt 0)

"The conclusion from your comments would appear to be that the reason a page goes supplemental is that the individual TBPR for that page is too low, below a threshold? say <2"

Not exactly. Assuming all the PageRanks of webpages on the web add up to 1, the average PageRank is minuscule - something like 0.000000100204023934. Two pages (12+ months old) might both display 0 in the toolbar, but their internal PageRanks may be dramatically different.
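Rough numbers behind that (the web-size figure is a guess for 2007; the point is only orders of magnitude):

```python
# If all PageRank on the web sums to 1, the mean share per url is tiny.
TOTAL_URLS = 10 ** 10      # rough guess at the 2007 web; illustrative only
print(1.0 / TOTAL_URLS)    # an average in the 1e-10 ballpark

# Two pages can round to the same toolbar 0 yet differ hugely internally:
page_a = 2e-10             # invented internal PageRank values
page_b = 2e-8
print(page_b / page_a)     # two orders of magnitude apart, same toolbar display
```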

bear




msg:3354002
 11:38 pm on May 30, 2007 (gmt 0)

Sure - but what you are saying is that there is a height bar - a threshold of PR that decides whether a page is in or out.

In TBPR terms it might be 0.5, say - if it's 0.4 the page goes to hell; if it's 0.6 it goes to heaven.

Given all the issues and exceptions (including the delays with the toolbar, spidering, etc.), what you are saying is that no page with a TBPR above 2 or 3 should be in hell, that the average TBPR in hell should be well below that in heaven, and that for a given site there should be a clear break in PR values at the boundary.
It may be time to do some research before I go fishing!

bear




msg:3355306
 2:20 am on Jun 1, 2007 (gmt 0)

Hey

Some of my sites are returning from hell.

It's hard to pin down what has succeeded, but the most recent thing I did was add a link to the pages in hell from one of my other sites - there is minuscule PR in this, but perhaps just having one or two individual links to each page is enough?

suggy




msg:3355423
 7:00 am on Jun 1, 2007 (gmt 0)

schalk

I am in the same position as you. My aim is to keep at least the first page in each category in Google's main index. But I, too, have been wrestling with the linking structure.

I also notice, having verified the site as mine, that Google's count of my internal links is incorrect and very out of date - so don't expect any quick results!

suggy




msg:3355429
 7:06 am on Jun 1, 2007 (gmt 0)

Also, according to Google's link analysis in their webmaster tools, links from the supplemental index do not exist. My homepage is missing about 9,000 internals!

bear




msg:3355493
 8:28 am on Jun 1, 2007 (gmt 0)

The link numbers (internal and external) are way out of date - mine haven't changed for two months. There have been reports that links are only presented for pages of PR 4 and above. The only way to get a reliable list of inbound links is to search for "mysite.com" in a general search; the results are very different from Google's via link:. I suspect the information posted via link: is only updated every month - the crawl stats are also out of date, updated once per month. Google's time frame for externally posted stats is monthly, though their internal numbers are up to date. The omission of links from supplementals is a real worry, though Google claims that supplemental hell does not affect PR.

Halfdeck




msg:3360357
 5:55 pm on Jun 6, 2007 (gmt 0)

Andy Beal just posted a short video of Matt Cutts talking about supplemental results at SMX:

If you got 60,000 pages, and you only got "this much" PageRank, and you divide it [...he mumbles], some of them are going to be in the supplemental index. Given "this many people" who link to you, we're willing to include "this many" pages in the main index.

Basically what I posted before.
