


Advanced Search Operators (site:) Giving Wonky Results

Drastic differences when not using wildcard

   
10:55 pm on Oct 21, 2008 (gmt 0)

5+ Year Member



Hi All!

I'm having a bit of an issue. I'm trying to get a handle on how many of our web pages Google has indexed, using the 'site:domain.com' operator. Rather than rely on one generic query, which is inaccurate, I'm using a collection of queries to get a more accurate count of indexed pages.

What I've found is that 'site:mydomain.com' (no wildcard) returns a drastically larger page count (800,000+) than 'site:*.mydomain.com' (60,000). Based on our internal figures, the wildcard count is the more accurate one.

My question is: why does the non-wildcard query yield a much higher page count in the search results?

Any thoughts?

[edited by: caveman at 11:24 pm (utc) on Oct. 21, 2008]
[edit reason] Removed commercial mention per TOS [/edit]

7:57 am on Oct 27, 2008 (gmt 0)

5+ Year Member



Hello,

Are you sure it returns results from your site only?

I tried the non-wildcard query with some of my sites, and the results contained links from other sites.

8:17 am on Oct 27, 2008 (gmt 0)

WebmasterWorld Senior Member marcia is a WebmasterWorld Top Contributor of All Time 10+ Year Member



How does it compare with what you get if you use the following query?

inurl:example.com

[edited by: Marcia at 8:19 am (utc) on Oct. 27, 2008]

6:17 pm on Oct 27, 2008 (gmt 0)

5+ Year Member



Aha! Figured it out.

The difference is that the non-wildcard query (site:domain.com) also returns results for URLs without a subdomain (domainXYZ.com/filename...). For example, the non-wildcard query returned "domainXYZ.com/articles.php" and "domainXYZ.com/directorylist.php" as well as "www.domainXYZ.com/articles.php" and "www.domainXYZ.com/directorylist.php".

The wildcard query only returns results with a subdomain in the URL.

The reason the page count is so huge is that the non-wildcard query returns duplicates (the www and non-www versions of the same page) as well as other subdomains.
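
To make the difference concrete, here is a rough Python sketch of the matching behavior as I observed it above. It is only a model of what the results looked like, not how Google actually implements the operator. The function names and the "blog." and "otherdomain.com" URLs are made up for illustration.

```python
from urllib.parse import urlparse

def host_of(url):
    """Return the lowercase hostname of a URL, with or without a scheme."""
    # urlparse only fills in the host when the URL has a scheme or starts with //
    if "//" not in url:
        url = "//" + url
    return urlparse(url).hostname or ""

def matches_site(url, operand):
    """Model of the observed behavior (an assumption, not Google's spec).

    'domain.com'   matches the bare domain and any subdomain of it.
    '*.domain.com' matches only hosts that have a subdomain.
    """
    host = host_of(url)
    if operand.startswith("*."):
        base = operand[2:].lower()
        return host.endswith("." + base)
    base = operand.lower()
    return host == base or host.endswith("." + base)

# Sample URLs; the blog and otherdomain entries are hypothetical.
urls = [
    "domainXYZ.com/articles.php",
    "www.domainXYZ.com/articles.php",
    "blog.domainXYZ.com/post.php",
    "otherdomain.com/page.php",
]

bare = [u for u in urls if matches_site(u, "domainXYZ.com")]
wild = [u for u in urls if matches_site(u, "*.domainXYZ.com")]
print(len(bare), len(wild))
```

In this model the non-wildcard list contains both the www and non-www copy of articles.php, which is the kind of duplication that inflates the non-wildcard count.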