1script - 7:31 pm on Apr 26, 2010 (gmt 0)
I see it most with sites that have an incomplete sitemap.xml. About 60% of the time, creating a complete sitemap pulls the URLs out of what I suspect is a sub-supplemental index.
Would you elaborate on what you call an "incomplete" sitemap.xml? I am very curious whether your experience has to do with pagination of content pages and the inclusion (or not) of pages other than page #1 in the sitemap.xml.
Also, are you talking about including 100% of the possible URLs that Gbot can come across on a site - tag pages, category pages, navigational pages, etc., that lead up to content but are not the content itself - all of that in the sitemap.xml?
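For anyone following along, a "complete" sitemap in this sense would just mean more <url> entries in the standard sitemaps.org format - one per URL you want indexed. A minimal sketch (the example.com paths are placeholders, not from any real site):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- one <url> entry per page; only <loc> is required by the protocol -->
  <url>
    <loc>http://example.com/widgets/page-2</loc>
    <lastmod>2010-04-26</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
```

The open question is which URLs belong in there - just the content pages, or the tag/category/navigation pages too.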
As far as the site: operator goes, I find that site:example.com example.com brings the count closer to what might actually be indexed. I do have sites that haven't done well in Google lately - mostly long-tail issues - and those sites show the biggest difference between just site:example.com and site:example.com example.com, so you may be onto something with a (good old or brand new) supplemental index.