| 12:58 am on Aug 18, 2005 (gmt 0)|
(Bump) Am I the only one?
| 1:18 am on Aug 18, 2005 (gmt 0)|
I think my largest site has an inflated count of around 3X the actual number of pages indexed. However, no site under 100 pages seems to have an inaccurate count, as far as I can tell.
Google has some trouble with old pages in the index, but that doesn't account for this anomaly.
| 7:48 am on Aug 18, 2005 (gmt 0)|
The site: command counts all URLs associated with the site, which is not the same as all pages of that site, or all indexed URLs. Some examples of URLs which are counted in the site: command:
- URLs temporarily deleted with the URL removal tool
- URLs from other sites doing a 302 hijack of your site (should be fixed by now)
- Obsolete URLs which still have links to them from other sites and which Google visits now and then just to see if they are active
- Links to your site with typos in them, e.g. www.yourdomain.com/fiel.html instead of www.yourdomain.com/file.html. At one time I had many copies of my sitemap in the SERPs because I used the sitemap as my 404 page. Except for the original sitemap they have all gone supplemental now, but Google still counts them.
- URLs that have been marked with "noindex,follow".
Google keeps track of many more URLs of your site, but I don't know if these are counted in the site: result. For example, if you have a 301 redirect from domain.com to www.domain.com, then Google must know that domain.com/file.html exists, but is equivalent to www.domain.com/file.html. So there has to be some database record or field somewhere with information about domain.com/file.html, but I don't know if this one inflates the number in the site: command.
| 1:40 am on Aug 19, 2005 (gmt 0)|
Most of the inflated sites that I have seen have been serving both www and non-www, but without a redirect between them. This is duplicate content.
Add a 301 redirect to fix that problem.
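For anyone who wants the redirect logic spelled out, here is a minimal sketch in Python. It only illustrates the decision (non-www request gets a 301 to the same path on www); in practice this usually lives in the web server configuration. The host www.example.com and the function name canonical_redirect are placeholders I made up, and the sketch ignores ports.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical canonical host; substitute your own domain.
CANONICAL_HOST = "www.example.com"


def canonical_redirect(url, canonical_host=CANONICAL_HOST):
    """Return (301, target_url) when url is on the non-www variant
    of canonical_host, or None when no redirect is needed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()

    # The bare domain is the canonical host without its "www." prefix.
    if canonical_host.startswith("www."):
        bare_host = canonical_host[4:]
    else:
        bare_host = canonical_host

    if host == canonical_host or host != bare_host:
        # Already canonical, or a host this sketch does not handle.
        return None

    # Keep scheme, path and query; swap only the host.
    target = urlunsplit(
        (parts.scheme, canonical_host, parts.path or "/", parts.query, "")
    )
    return 301, target


print(canonical_redirect("http://example.com/file.html"))
print(canonical_redirect("http://www.example.com/file.html"))
```

The point is that every non-www URL maps to exactly one www URL with a permanent (301) status, so the two hostnames stop looking like two separate copies of the site.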
| 1:48 am on Aug 19, 2005 (gmt 0)|
The 301 redirect would be the logical way to get things back in line. However, what happens when Google grabbed the pages in 2004 and has never updated them since? If Googlebot doesn't revisit those URLs, it will never see the 301, so the old entries stay in the index.
| 6:05 am on Aug 19, 2005 (gmt 0)|
Also included: pages crawled only by the Mozilla Googlebot. I can verify this on one of my domains.
| 4:58 pm on Aug 21, 2005 (gmt 0)|
My client's site is also affected by this. The listings show only the URL, with no title or description as there was before. When will it recover?
| 7:19 am on Aug 22, 2005 (gmt 0)|
site: is fine for me, but link: is screwed up completely