Forum Moderators: Robert Charlton & goodroi
When I search for "site:mysite.com" with Google set to return 10 results per page, I get "146" results.
Doing the same search when Google is set to 100 results per page yields only 96 results.
In the corresponding sitemap (which I generated with xml-sitemaps.com) there are 94 pages.
Your thoughts and comments will be appreciated.
And by the way, how do you generate your xml sitemaps?
Thanks!
Assaf
Additionally, at the point of search, Google determines how many "relevant" results there are for your query - and even a slight change in the method used to search can change Google's judgement of what is relevant and what isn't. You might also notice that the results are ordered very differently when you view 100 results instead of 10 - it isn't just the first 10 pages put one after the other.
In the list of urls for the directory queries you can see indexed pages that were not returned for the site:example.com query - and they are indexed. The best way I know of to verify a page as being indexed is to type that url directly into the search box. And even then there can be variation at different data centers.
For the webmaster, this can be a game of just getting a close estimate for the total number, and then focusing on individual pages that are important as needed.
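One way to do that page-by-page follow-up is to start from the sitemap rather than from the site: total. Here is a minimal Python sketch, assuming a standard sitemaps.org XML file and a list of urls you have already confirmed as indexed by hand. The function names and the `verified_urls` list are hypothetical, made up for illustration; nothing here queries Google.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> url listed in a sitemap, in document order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def not_yet_verified(xml_text, verified_urls):
    """List sitemap urls that are missing from a hand-checked 'indexed' list.

    verified_urls is an assumption: urls you have confirmed yourself,
    for example by typing each one into the search box.
    """
    verified = set(verified_urls)
    return [url for url in sitemap_urls(xml_text) if url not in verified]
```

`len(sitemap_urls(xml_text))` gives the sitemap's page count to compare against the site: estimate, and `not_yet_verified` leaves you with the short list of urls still worth checking one at a time.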
the results are ordered very differently when you view 100 results instead of 10 - it isn't just the first 10 pages put one after the other
What I see happening is Google's clustering filter. Using 100 results, there are ten times as many chances to get two urls from the same domain - so a #2 result can be followed by an indented #3 result - which was actually #99 when you used only 10 results per page.
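To make that reordering concrete, here is a toy Python model of the effect. It is only an illustration of the behavior described above, not Google's actual algorithm: it assumes that within one page of results, the second url from a host is pulled up and shown indented under that host's first url, and that any further urls from the same host are dropped from the page. The function name and the sample urls are hypothetical.

```python
from urllib.parse import urlsplit


def cluster_page(ranked_urls, per_page):
    """Toy model of host clustering on a single results page.

    Returns a list of (url, indented) pairs for the first page.
    Assumption: at most two urls per host are shown, the second one
    indented directly beneath the first.
    """
    window = ranked_urls[:per_page]

    # Group the urls in this page window by host, keeping rank order.
    by_host = {}
    for url in window:
        by_host.setdefault(urlsplit(url).hostname, []).append(url)

    page = []
    seen_hosts = set()
    for url in window:
        host = urlsplit(url).hostname
        if host in seen_hosts:
            continue  # already placed (or dropped) with its host's first url
        seen_hosts.add(host)
        page.append((url, False))
        if len(by_host[host]) > 1:
            # Pull the host's next-best url up as an indented result.
            page.append((by_host[host][1], True))
    return page
```

With a ranking where example.com holds positions #2 and #99, a 10-result page shows only the #2 url, while a 100-result page shows the old #99 url indented at #3, which is the jump described above.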
[edited by: tedster at 9:45 pm (utc) on June 18, 2009]
And by the way, how do you generate your xml sitemaps?
I strongly encourage you to pick your sitemap generator from this list at Google Code:
[code.google.com...]
tedster, you wrote:
The best way I know of to verify a page as being indexed is to type that url directly into the search box. And even then there can be variation at different data centers.
I'm not sure I fully understood: searching for www.mysite.com or http://www.mysite.com yields not only the pages indexed, but also many other pages that contain these phrases.
Can you please elaborate?
Thanks in advance!
Assaf
[edited by: tedster at 6:33 pm (utc) on June 20, 2009]
[edit reason] de-link the example [/edit]
Note that I'm not just talking about the domain name "example.com". It works for any url - even deep internal urls.
You'll notice that at 10 results per page, the total changes when you click through to page 2 and changes again when you reach the 'last' page.
I always measure by using 100 results per page, and click through to the last page. I measure twice: with and without &filter=0 appended.
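For anyone who wants to set those two measurements up by hand, here is a small Python sketch that only builds the search urls to open in a browser. The num and filter parameters are the ones mentioned in this thread; the `q` and `start` parameters, the `https://www.google.com/search` base, and the function name are my assumptions, and the sketch does not fetch or scrape anything.

```python
from urllib.parse import urlencode

# Assumed base address for a Google web search.
SEARCH_BASE = "https://www.google.com/search?"


def site_query_urls(domain, pages=1, per_page=100, unfiltered=False):
    """Build site: query urls, one per results page.

    unfiltered=True appends filter=0, as described in this thread.
    The start offset for later pages is an assumption.
    """
    urls = []
    for page in range(pages):
        params = {
            "q": f"site:{domain}",
            "num": per_page,
            "start": page * per_page,
        }
        if unfiltered:
            params["filter"] = 0
        urls.append(SEARCH_BASE + urlencode(params))
    return urls
```

Calling it once with `unfiltered=False` and once with `unfiltered=True` gives the two sets of urls to click through, so the totals on the last page of each can be compared.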
I've seen some unusual variations, including several instances of truncated results, when clicking through to the final page of results and seeing something like
"Results 501 - 522 of about 1,380" with num=100 and filter=0 set
[edited by: Adam_C at 1:36 pm (utc) on June 23, 2009]
[webmasterworld.com...]
I have a site that previously had 200,000 pages listed in Google's index via site:example.com. These pages were in Google's index for around 5 years. Within a couple of weeks they dropped down to 2000 pages and have remained steady since. The pages still in the index, however, are ranking very well.
Were these pages penalized in some way? The pages were basically manuals/product lists for a manufacturer whose products we carry. As such they were never really updated, but they were very relevant for someone searching for a particular product.
[edited by: tedster at 7:15 pm (utc) on June 24, 2009]
Last night, I noticed that the site: operator on images.google.com stopped reporting the number of images found on the site. I tried several sites, as well as google.co.uk; it was not working then and it is still not working now.
Does anyone know if that's part of an update or a bug?
[edited by: Robert_Charlton at 6:29 pm (utc) on June 26, 2009]