
Google SEO News and Discussion Forum

    
Wrong number of indexed pages with site: operator
dHex



 
Msg#: 4437039 posted 12:52 pm on Apr 4, 2012 (gmt 0)

I have a small website with 43 pages... it used to be over 100 pages prior to the October 13 Panda update. I deleted pages that I thought were duplicate or thin and submitted a removal request through Google Webmaster Tools to get them out of Google's index. Google processed my request within a few days, but the indexed page count remained the same.

Now when I do a search using the site: operator, I still see 116 pages indexed. But when I reach the end of the SERP, it says the following:


In order to show you the most relevant results, we have omitted some entries very similar to the 43 already displayed.
If you like, you can repeat the search with the omitted results included.


When I rerun the query with "omitted results included", all I get at the end of the SERP is links to JavaScript files on my site.

Do you think I have some problem on my website? Why is Google showing me the wrong indexed page count? Do you think this will affect my rankings?

Thanks!

 

Andy Langton

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 4437039 posted 8:43 pm on Apr 4, 2012 (gmt 0)

The count you see is an estimate and, depending on the individual site, may be much greater or lower than the number of pages you believe exist.

Much depends on how "relevant" Google believes URLs are to the particular search query. Site: searches are awkward, since there is no specific keyword relevance at all.

That's why even searches that are essentially identical can return different results:

site:www.webmasterworld.com [google.co.uk] (I see 497,000 results)

site:www.webmasterworld.com inurl:www [google.co.uk] (I see 591,000 results)

So that's nearly 100k URLs from nowhere! The reason the estimated count goes up is that the second query is seen by Google as "deeper" and thus retrieves results from a broader part of Google's databases - including those that contain "low quality" results, which might be very old pages, errors, even deleted pages that hang around.

For small sites, you tend to see the opposite effect - the numbers are low enough for Google just to retrieve everything, so you get all those "low quality" results included in the count straight away.

Overall, though, you are better off relying on what you know you've done rather than worrying too much about the count. Although Google does have a very long memory! ;)

g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 4437039 posted 8:49 pm on Apr 4, 2012 (gmt 0)

Click on the "show omitted results" link, and then click through to the last page of those results.

It's much easier if the SERPs are showing 100 results per page.
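
You can switch to 100 results per page in the search settings or, if memory serves, by appending the old num= parameter to the search URL, with example.com standing in for your own domain:

http://www.google.com/search?q=site:example.com&num=100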

You'll see days where the figures are all over the place and other days where they are more in alignment. You'll see that changes happen in batches.

Google hides pages from the SERPs, but the count remains high for a few days afterwards.

bumpski

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4437039 posted 8:40 pm on Apr 6, 2012 (gmt 0)

Something to try:
Do your site: search.
Now go to the address bar and edit the URL that Google produced to actually run the search.
After one of the "&parameter=" strings, add &filter=0. This may show more results.
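
For example, before and after the edit (example.com is just a placeholder, and the other parameters will vary with your locale and settings):

Before: http://www.google.com/search?q=site:example.com&hl=en
After: http://www.google.com/search?q=site:example.com&hl=en&filter=0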

The duplicate directory filter is quite a telltale!

Here's one reference that documents the myriad parameters Google may use in a search:
https://developers.google.com/custom-search/docs/xml_results

Another with more on filter=
[code.google.com...]

Google search uses two types of automatic filters:

Duplicate Snippet Filter - If multiple documents contain identical titles as well as the same information in their snippets in response to a query, only the most relevant document of that set is displayed in the results.
Duplicate Directory Filter - If there are many results in a single web directory, then only the two most relevant results for that directory are displayed. An output flag indicates that more results are available from that directory.

By default, both of these filters are enabled. You can disable or enable the filters by using the filter parameter settings as shown in the table.

Filter value   Duplicate Snippet Filter   Duplicate Directory Filter
filter=1       Enabled (ON)               Enabled (ON)
filter=0       Disabled (OFF)             Disabled (OFF)
filter=s       Disabled (OFF)             Enabled (ON)
filter=p       Enabled (ON)               Disabled (OFF)
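
So, going by that table, appending &filter=s to the search URL should switch off only the duplicate snippet filter, and &filter=p only the duplicate directory filter.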



How well these documents map to what you can actually do from the address bar, I'm afraid I have no idea.

dHex



 
Msg#: 4437039 posted 10:40 pm on Apr 6, 2012 (gmt 0)

Thanks for the help, guys. I think the whole thing is not worth worrying about after all. I submitted a URL removal request through GWT for the JavaScript and .swf files that were triggering the message "...we have omitted some entries very similar to..."

Now the message is gone, but the indexed page count still shows the wrong number, 116. I don't think there is much I can do beyond this.
I guess there is some problem with their system or something. I was afraid of being labeled as having duplicate content on my website.

bumpski

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4437039 posted 8:33 pm on Apr 9, 2012 (gmt 0)

Is there any chance you were using Mod_PageSpeed on the server side to speed up your site?
[developers.google.com...]

If you have any kind of link to a .js file, Google may index the file. Mod_PageSpeed plays a trick, "hiding" JavaScript from the browser until after the onload event. Google may be picking up links to these files and, not understanding them, adding them to its index.
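
Very roughly, the trick is that an ordinary script reference such as

<script src="main.js"></script>

gets rewritten so the browser no longer executes it as a normal script; Mod_PageSpeed's own loader runs it after the onload event instead. (The exact rewritten markup is Mod_PageSpeed's internal business - this is just the general shape of it.) The .js URL is still sitting in the HTML for Googlebot to find.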

I once added a link to a robots.txt file as a training aid. Google immediately indexed robots.txt! And with Panda, especially the Oct 13th tweak, this could look like "poor quality content"!

Oh no, Mr. Bill!
