Google will not really confirm or deny the existence of multiple indexes. Long, long ago, there was an official supplemental index, but they supposedly got rid of it.
When I was still consulting, I saw plenty of clues that there were secondary indexes, and to be honest, the idea of secondary indexes just makes sense from a computing standpoint. I don't think there is any official Google position on this anymore, but it has been a few years since I consulted, so that may have changed.
There are pages that Google never shows in its indexed results.
There are pages Google will never show in a search with a date range (I call these "pages without dates").
There are pages Google will never show when a reading level is selected (I call these "pages without reading levels").
You can mark a page "noindex" and Google clearly honors this, but Google still crawls these pages and certainly keeps track of their content. These pages may also have links that Google will follow, even though the pages themselves are not in Google's primary index. So there is probably a "noindex" index! That is an index I really wonder about!
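For reference, the "noindex" marking discussed here is done with the standard robots meta tag (or the equivalent X-Robots-Tag HTTP header); the snippet below is just a generic illustration:

```html
<!-- In the page's <head>: ask crawlers not to index this page,
     while still allowing them to follow its links -->
<meta name="robots" content="noindex, follow">
```

The same directive can be sent for non-HTML files (PDFs and other downloads) as an `X-Robots-Tag: noindex` HTTP response header.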
Google clearly knows about all the pages it excludes from these conditional search results, so it is likely that Google does have multiple "supplemental" indexes. The old "supplemental" results are now simply hidden or restricted, as I mentioned above.
I think Google is designed to crawl everything it gets fed a link to. The supplemental index, I think, consists of rarely searched pages and possibly censored results. It cannot be accessed any more through a normal search, but yes, I believe it exists.
You may find this recent video from Google's John Mueller (recorded 13th January 2014) of interest; see from about 35m 30s onwards.
The question was:
"We believe that one of our sites has had 90% of its pages fall into the supplemental index. Is there a way of confirming this and what pages are there? How do we recover from that?"
The first part of John's answer was:
"We don't have a supplemental index any more in the sense that these pages will be treated differently in the search results. So that's not something you'd need to worry about.
We do have different index tiers, depending on how we categorise your pages, how we need to crawl them, but it's not something you'd see specific changes in the search results."
Well, that was a timely answer from Google, now wasn't it...
Anyhoo, regarding the OP's question on whether it would affect their ranking: I think that if you are worried about pages being in a secondary index, your pages have bigger problems than the secondary index itself.
In today's Panda world, regardless of how or where Google is putting a page, if the page is weak/thin/duplicative/rubbish it will not compete (at least not without some short-lived tricks that anyone with a legitimate business would not touch with a ten-foot pole).
Panda is kind of like the next generation of what was once the Main and Supplemental indexes.
Sorry, I have to disagree.
If it were Thanksgiving, Google's efforts over the last couple of years would be equivalent to taking a chainsaw to a dry turkey:
|There'd be stuffing everywhere and no meat to be found |
If you run a 'site:' query, sometimes you'll find that something like this happens...
1) Let's say Google says it has 200 urls for your site, which normally would mean 20 pages of results. But if you look at the pagination at the bottom of the SERP you might see only 6 pages, which would mean that there are 120 urls tops.
2) As you click to the last paginated page, the number of results Google says it has suddenly drops from 200 to 115.
3) Google says that there are more urls to see if you want, and provides a link to show all of them. Sometimes this will then reveal the 200 it originally said it had, sometimes the number will be higher than 115 but still lower than 200.
This has been a feature of the 'site:' query since there was a Supplemental index.
Now John Mu says they have "different index tiers". You say tomato...
He also says that all pages, no matter the tier, are treated equally. I cannot reconcile my own recent observations on one site with that statement.
I think this is the prompt:
|Google says that there are more urls to see if you want, and provides a link to show all of them. Sometimes this will then reveal the 200 it originally said it had, sometimes the number will be higher than 115 but still lower than 200. |
I think that when you get this prompt from a Google "site:" query, it usually means Google thinks some of the content on the site is duplicate. If you canonicalize a site very well, and truly don't have duplicate content, this message should go away.
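Canonicalizing in this sense usually means adding a rel="canonical" link element to every URL variant that serves the same content; the domain and path below are placeholders:

```html
<!-- In the <head> of each duplicate/variant URL:
     point search engines at the single preferred URL -->
<link rel="canonical" href="https://example.com/widgets/blue-widget">
```

Parameterized URLs (sort orders, session IDs, print views) are the usual variants worth canonicalizing.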
|In order to show you the most relevant results, we have omitted some entries very similar to the 54 already displayed. If you like, you can repeat the search with the omitted results included. |
In some cases you can use the site: command on subdirectories or subdomains to localize the duplicate content (Webmaster Tools can help here too).
And I agree, I think in this case this content was attributed to the "supplemental" directory status.
Can you explain how to use the site: command to find the pages Google considered duplicate? I am not sure how to do that. I guess that would be the reason why my ranking is still not back to where it was...
I have tried a few things with the site command and it doesn't work.
|Google thinks that some of the content on the site is duplicate |
That's not always what I've seen. Sometimes it IS duplicate content, but it's also been:
- PDFs, Word and other attachments/downloads
- Flash and other 'include' type files
And most recently it was a load of pages that had all been returned in the first instance on a different domain, but were then stuck into this 'second indexing tier' when the domain was rebranded.
The fact that the same content appeared to have a lesser index status based on a domain switch is what makes me doubt what John Mu says about these pages not being ranked differently. They were: drastically.
You can use 'site:domain.xtn'
Or even partial urls for wildcard match 'site:domain.xtn/dynamic.xtn?variable='
Try 'inurl:' too.
The important thing is no space in the query.
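Putting that together, a few example queries (example.com and the URL pattern are made up for illustration; note there is no space after the colon):

```
site:example.com
site:example.com/shop/product.php?id=
inurl:product.php
```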
Let me make sure I got that right.
When John Mueller says "all pages, no matter the tier, are treated equally", does that mean Google doesn't see the duplicate ones, and won't "penalize" my website for them?
If it were this site, your query would be 'site:webmasterworld.com'
I usually query the domain without the 'www' because it'll show both 'www' results and 'non-www' results that way.
Re: John Mueller, as I said above I'm not sure I believe him. If I understand your question correctly, you're asking if Google could be penalising your site based on the content of pages that aren't part of their public index.
I'm really not sure: it could depend on the type of penalty that was issued based on what was on those pages when they WERE in the index. What sort of penalty are you worried about?
If duplicate pages are what you're worried about, it's widely accepted that there is NO penalty for duplicate content. It can cause problems because it wastes resources and your site suffers, but that's not a 'penalty' in the sense that hidden text or sneaky redirects cause penalties. Certainly I've never heard of or seen any issues caused by duplicate pages that didn't vanish as soon as those pages were cleaned out of the index.