I'm having a bit of an issue. I'm trying to get a handle on how many of our web pages Google has indexed, and I'm using the 'site:domain.com' operator to do this. Rather than relying on one generic query, which gives an inaccurate count, I'm running a collection of queries to get a more accurate page index count.
But what I've found is that when I use 'site:*.mydomain.com' versus 'site:mydomain.com', the non-wildcard query returns a drastically larger page count (800,000+) than the wildcard search (60,000). Based on our internal figures, the wildcard query is more accurate.
My question is: why does the non-wildcard query return a much higher page count in the search results?
[edited by: caveman at 11:24 pm (utc) on Oct. 21, 2008] [edit reason] Removed commercial mention per TOS [/edit]
The difference is that the non-wildcard query (site:domain.com) also returns results for URLs without a subdomain (domainXYZ.com/filename...). For example, the non-wildcard query returned results for "domainXYZ.com/articles.php" and "domainXYZ.com/directorylist.php" as well as "www.domainXYZ.com/articles.php" and "www.domainXYZ.com/directorylist.php".
The wildcard query only returns results that have a subdomain in the URL.
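To make the two matching rules concrete, here is a minimal sketch that filters a list of URLs the way this post describes the two queries behaving. It models the behavior described above, not Google's actual implementation (which isn't documented). `matches_site` and `matches_wildcard_site` are hypothetical helper names, and the URLs are the example ones from this post.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lowercase hostname of a URL, e.g. 'www.domainxyz.com'."""
    return (urlsplit(url).hostname or "").lower()


def matches_site(url: str, domain: str) -> bool:
    """Model of site:domain.com as described here: the bare host OR any subdomain."""
    host = host_of(url)
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def matches_wildcard_site(url: str, domain: str) -> bool:
    """Model of site:*.domain.com as described here: only hosts that have a subdomain."""
    return host_of(url).endswith("." + domain.lower())


urls = [
    "http://domainXYZ.com/articles.php",
    "http://domainXYZ.com/directorylist.php",
    "http://www.domainXYZ.com/articles.php",
    "http://www.domainXYZ.com/directorylist.php",
]

plain = [u for u in urls if matches_site(u, "domainXYZ.com")]
wild = [u for u in urls if matches_wildcard_site(u, "domainXYZ.com")]
print(len(plain), len(wild))  # 4 2
```

Under these assumed rules, the bare-host URLs are matched only by the non-wildcard form, which is why it can never return fewer pages than the wildcard form.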
The page count is so much larger because the non-wildcard query also returns duplicates (the www and non-www versions of the same page) as well as pages on other subdomains.
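If you collect result URLs yourself, a rough way to see how much of the count is www/non-www duplication is to collapse both forms into one key and count unique keys. This is a minimal sketch assuming the www and non-www hosts serve identical content; `canonical_key` is a hypothetical helper, not part of any Google tool.

```python
from urllib.parse import urlsplit


def canonical_key(url: str) -> str:
    """Collapse the www and non-www forms of a URL into one key (path and query kept)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]  # assumption: www.X and X are the same site
    key = host + (parts.path or "/")
    if parts.query:
        key += "?" + parts.query
    return key


results = [
    "http://domainXYZ.com/articles.php",
    "http://domainXYZ.com/directorylist.php",
    "http://www.domainXYZ.com/articles.php",
    "http://www.domainXYZ.com/directorylist.php",
]

unique = {canonical_key(u) for u in results}
print(len(results), len(unique))  # 4 2
```

Other subdomains keep their own keys here, since they may genuinely host different pages.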