Msg#: 3756609 posted 2:03 pm on Oct 1, 2008 (gmt 0)
I've noticed that the "site:" command on Google returns only 586 results for one of our websites, whereas last week it returned 5,060. This is a stable, three-year-old website. That's a drop of nearly 90% (about 88%) in indexed pages.
I searched the web for info on Google site count fluctuations and corpus rotations, but didn't find much.
What do you guys think? Is Google rotating corpuses?
Msg#: 3756609 posted 3:03 pm on Oct 1, 2008 (gmt 0)
The first question is which count is closer to the number of URLs you intend to have. The site: count is a particularly hard number to get right because of the way the data gets sharded across many servers during the indexing process. So a big drop like the one you are reporting might be just a glitch.
It might also mean that Google untangled a duplicate-URL/canonical-URL issue for your domain, or that pages were actually dropped, or that... well, it can also be a mystery, because accuracy in that report is not a top priority at the 'plex. The general user is rarely concerned with it, and the general user is their #1 priority.
Msg#: 3756609 posted 5:57 pm on Oct 1, 2008 (gmt 0)
At times I see the numbers very badly misestimated. Some of that is due to the way the estimates are calculated, especially when data comes from various sources (Supplemental, etc.), while some of it hints at underlying problems with the website itself. It is often very difficult to tell one from the other, and sometimes not possible at all.
Even for sites with no major problems, sites that have never had any sort of canonical problem and never returned a 404 error, I see site counts cycling between 80% and 98% of the true number of URLs.
Msg#: 3756609 posted 7:40 pm on Oct 1, 2008 (gmt 0)
Over the last few weeks I have watched the number of pages the site: command says Google has indexed for one of our sites drop by around 50%.
Traffic remains stable and has actually gone up, which is the exact opposite of what should happen when you drop 60,000 pages out of the index. So IMHO the site: command is just more fuzz, like toolbar PageRank (TBPR) and the link: command.