|Wrong number of indexed pages with site: operator|
I have a small website with 43 pages... it used to be over 100 pages prior to the October 13 Panda update. I deleted pages that I thought were duplicate or thin and submitted a removal request through Google Webmaster Tools. Google processed my request within a few days, but the indexed page count remained the same.
Now when I do a search using the site: operator, I still see 116 pages indexed. But when I reach the end of the SERP, it says the following:
In order to show you the most relevant results, we have omitted some entries very similar to the 43 already displayed.
If you like, you can repeat the search with the omitted results included.
Do you think there is some problem with my website? Why is Google showing me the wrong indexed page count? Do you think this will affect my rankings?
The count you see is an estimate, and depending on the individual site, it may be much greater or less than the number of pages you believe exist.
Much depends on how "relevant" Google believes URLs are to the particular search query. Site: searches are awkward, since there is no specific keyword relevance at all.
That's why even searches that are essentially identical can return different results:
site:www.webmasterworld.com [google.co.uk] (I see 497,000 results)
site:www.webmasterworld.com inurl:www [google.co.uk] (I see 591,000 results)
So that's nearly 100k URLs from nowhere! The reason the estimated count goes up is that Google sees the second query as "deeper" and thus retrieves results from a broader part of its databases - including those that contain "low quality" results, which might be very old, errors, or even deleted pages that hang around.
For small sites, you tend to see the opposite effect - the numbers are low enough for Google just to retrieve everything, so you get all those "low quality" results included in the count straight away.
Overall, though, you are better relying on what you know you've done, rather than worrying too much about the count. Although Google does have a very long memory! ;)
Click on the "show omitted results" link, and then click through to the last page of those results.
It's much easier if the SERPs are showing 100 results per page.
You'll see days where the figures are all over the place and other days where they are more in alignment. You'll see that changes happen in batches.
Google hides pages from showing in the SERPs but the count remains high for a few days after.
Something to try:
Do your site: search.
Now go to the address bar and edit the string that Google produced to actually do the search.
After one of the "&parameter=" strings, add &filter=0. This may show more results.
The duplicate directory filter is quite a telltale!
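The URL-editing step above can be sketched in code. This is a hypothetical helper, not anything Google provides - it just adds (or replaces) a filter=0 parameter in whatever results URL you paste in from the address bar. The example URL is made up:

```python
# Sketch: append filter=0 to a Google results URL taken from the address bar.
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

def add_filter_param(url, value="0"):
    """Return the URL with filter=<value> added or replaced in the query string."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["filter"] = value  # filter=0 disables both duplicate filters
    return urlunsplit(parts._replace(query=urlencode(query)))

# Example (hypothetical URL - use the one Google actually produced for you):
print(add_filter_param("https://www.google.com/search?q=site:example.com"))
```

Note that urlencode re-escapes the query, so the colon in site: comes back as %3A - Google accepts either form.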
Here's one reference that documents the myriad parameters Google may use in a search
Another with more on filter=
|Google search uses two types of automatic filters: |
Duplicate Snippet Filter - If multiple documents contain identical titles as well as the same information in their snippets in response to a query, only the most relevant document of that set is displayed in the results.
Duplicate Directory Filter - If there are many results in a single web directory, then only the two most relevant results for that directory are displayed. An output flag indicates that more results are available from that directory.
By default, both of these filters are enabled. You can disable or enable the filters by using the filter parameter settings as shown in the table.
Filter value   Duplicate Snippet Filter   Duplicate Directory Filter
------------   ------------------------   --------------------------
filter=1       Enabled (ON)               Enabled (ON)
filter=0       Disabled (OFF)             Disabled (OFF)
filter=s       Disabled (OFF)             Enabled (ON)
filter=p       Enabled (ON)               Disabled (OFF)
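For anyone scripting checks against their own site, the table above can be encoded directly. This is just a sketch that restates the documented filter= values - the names are mine, not Google's:

```python
# Which duplicate filters each filter= value enables, per the table above.
FILTER_MODES = {
    "1": {"duplicate_snippet": True,  "duplicate_directory": True},   # default
    "0": {"duplicate_snippet": False, "duplicate_directory": False},
    "s": {"duplicate_snippet": False, "duplicate_directory": True},
    "p": {"duplicate_snippet": True,  "duplicate_directory": False},
}

def describe(value):
    """Human-readable summary of a filter= value."""
    mode = FILTER_MODES[value]
    return ", ".join(f"{name}={'ON' if on else 'OFF'}" for name, on in mode.items())

print("filter=0 ->", describe("0"))
```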
How well do these documents map to what you can actually do with the address bar? I'm afraid I have no idea.
Now the message is gone. But the indexed page count still shows the wrong number, 116. I don't think there is much more I can do after this.
I guess there is some problem with their system or something. I was afraid of being labeled as having duplicate content on my website.
Is there any chance you were using:
Mod_PageSpeed on the server side to speed up your site?
I added a link once to a robots.txt file as a training aid. Google immediately indexed robots.txt! And with Panda, especially the Oct 13th tweak, this could look like "poor quality content"!
Oh no, Mr. Bill!