In WMT I see 950 URLs listed for one site. The site: search lists between 260 and 320 depending on the day.
It certainly doesn't give much away like it used to. The website is a bit more than six months old.
While there were always some oddities in the site: results, the current situation is quite frustrating to many webmasters. Some who depend on the site: operator to understand how deeply Google is indexing their site are becoming concerned that they now have some kind of penalty, or at least a technical problem with their website or server.
Is this change an artifact of the new Caffeine infrastructure? That is, will the site: results eventually become more accurate again? Or is this a new and intentional situation, a limit on the site: operator, something like what Google has always done with the link: operator?
In past years it often happened that Google would make back-end changes to upgrade their core search results, and various special operator reports would be disrupted for a short period. So I currently lean toward the idea of an unintended Caffeine side effect.
But these newly uninformative site: results have now been with us for many months, and in the last few weeks the distortion seems to be intensifying. It is heartening that Webmaster Tools reports higher numbers in many cases - but does this mean Google won't show accurate numbers to anyone except those verified as responsible for the website?
The site: operator seems intended to be used in combination with a keyword - and sometimes that does seem to improve the results. For example, one site I've been working with for fourteen years currently shows:
site:example.com - 329 results
site:example.com keyword - 816 results
In the absence of any official word from Google, we can only guess what's happening. I'm hoping that it's a temporary disruption, but I wonder how others see this.
But it's affecting all types of sites, including one-of-a-kind major corporate sites that are not showing duplicated or scraped results in Google.
I see it most with sites that have an incomplete sitemap.xml. About 60% of the time, creating a complete sitemap pulls the URLs out of what I suspect is a sub-supplemental index.
They may have no sitemap.xml, a sitemap.xml that only lists "important" pages, or a sitemap.xml that only lists higher-level pages, relying on the bots to find the deeper pages from there - e.g. an ecommerce site whose sitemap lists sub-category pages but not product pages, because the links to all product pages are on those sitemap-listed sub-category pages.
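For anyone building out that "complete" sitemap, here's a minimal sketch of generating one that lists every product page directly, instead of relying on the bot to crawl down from the sub-category pages. It's Python against the standard sitemap protocol; the product_urls list and filenames are hypothetical placeholders - in practice you'd pull every canonical URL from your own database or CMS.

import datetime
import xml.etree.ElementTree as ET

# Hypothetical list of deep product URLs - replace with a dump of every
# canonical URL from your own catalog.
product_urls = [
    "http://www.example.com/widgets/blue-widget",
    "http://www.example.com/widgets/red-widget",
]

# Standard sitemap protocol namespace
urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
today = datetime.date.today().isoformat()

for loc in product_urls:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = loc
    ET.SubElement(url, "lastmod").text = today

# Write sitemap.xml with an XML declaration, ready to reference in robots.txt
ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)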
I'm also seeing recently changed pages appearing in SERPs for keywords that do not appear in the cached version - not sure if that's related.
Links are a part of it... but the smaller part. It is, and always has been, the content.
Google keeps old copies of pages going back months, maybe as long as two years, and the URLs will continue to appear in SERPs for any words currently or previously on those pages.
That is not just my opinion, it's a statement based on my experience and observation.
The page has been changed and new words have been added that were not there before. The cache still shows the old version of the page, but the page shows up for the new words.
There is a "cached" link below the result. However, if you click the "cached" link, you get this message