|Implications of Google Ignoring Minus Operator on Site Queries?|
I find that when I combine the site: and minus operators, certain parts of certain pages appear to be ignored, and I wonder whether this has anything to do with how Google selectively indexes, searches and/or ranks content. Consider the query [site:webmasterworld.com -Welcome].
The word "Welcome" appears in the header of practically every page here, yet the query still returns plenty of pages. Many of them are what you might call 'thin', or pages you'd expect to be visited rarely, if at all (on top of the tens of thousands of pages of printer-friendly content, one for every single post; why isn't this blocked via robots.txt?).
One of my sites isn't doing too well in Google at the moment. When I do a site query on the domain and use the minus operator to exclude a word from the footer, about 70% of its pages are returned. That's a significant number. On another site, which is doing better, a similar query returns zero results.
Boilerplate bits of pages, such as "Welcome" here, can I suppose be safely ignored, which probably shrinks the index a little. But what might the selection of these ignored elements, and the pages returned for a query such as the one above, tell us about rankings post-Panda?
Without getting into the specifics of SEO work here, some of which are not even known to me, I agree that conventional Boolean search operators have been getting increasingly inconsistent support at Google recently, and that can be frustrating.
I suppose we can take refuge in the fact that very few "average" search users even think about using a minus operator - but the main implication is a less precise tool for webmaster analysis.
It's not so much frustrating as it is confusing. Why is it that mostly thin pages are returned by a query such as the one I suggested above, and what might this tell us about results for "average" queries? What I was suggesting, or considering, is this: if others see a similar pattern in the SERPs for such queries on their own sites, as I have, could this be a way to uncover what Google considers to be thin content?
I've checked a few sites, running the same query with the minus operator and with just the keyword of interest. The two always seem to add up to roughly the total given by the plain site: query, and for me the minus-operator results alone do seem to be in the ballpark.
So you may be noticing a bug that does tell you something about your particular site. Do those two queries seem to add up for you?
[site:example.com -keyword] + [site:example.com keyword] vs. [site:example.com]
I'd also suggest checking the same with "www".
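A minimal sketch of that add-up check, assuming you copy the three result counts off the SERPs by hand. Google's reported counts are estimates, so a small gap of a few pages is probably noise. The function name `add_up_check` and the example numbers are hypothetical, for illustration only.

```python
def add_up_check(excluded, included, total):
    """Check whether [site:X -kw] + [site:X kw] roughly equals [site:X].

    All three arguments are result counts copied by hand from the SERPs
    (hypothetical helper, not a Google tool).
    """
    combined = excluded + included
    return {
        "combined": combined,
        # Positive gap: the site: count exceeds the two split counts combined.
        "gap": total - combined,
        # Share of the site: total returned by each half of the split.
        "excluded_share": excluded / total if total else 0.0,
        "included_share": included / total if total else 0.0,
    }


# Hypothetical counts, for illustration only.
print(add_up_check(excluded=650, included=345, total=1000))
```

Checking with and without "www" is just a matter of calling it twice, once with each set of counts.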
They nearly add up, there's a 5-page difference (no change with or without www).
I guess you could reverse the query like that, searching [site:example.com keyword] instead of [site:example.com -keyword]. It's not the number of results I'm concerned with, though; it's the type of pages returned. Whereas I suspected that [site:example.com -keyword] returned pages that Google might consider thin, the reverse query, [site:example.com keyword], actually seems to bring up most of the pages with richer content (mostly, more text).
It's important to note that the keyword in this instance isn't really a word but a year (2012), which appears only in the footer of every page on this site. The point, then, is that while "2012" appears on every single page of the site, a [site:example.com -2012] query still returns about 65% of all indexed pages, and the reverse query, [site:example.com 2012], returns roughly the other 35%. The pages in that 35% tend to have "fuller" content (more text) or to sit higher up in the site's architecture, while the 65% consists mostly of relatively thin pages. Similarly, a [site:webmasterworld.com -Welcome] query returns lots of thinner pages from this forum, even though they have the word "Welcome" in the same spot (unfortunately you can't reverse that query; too many posts and topics contain a welcome).
I guess it's not so much that Google ignores the minus operator as that it ignores certain (repetitive) words or sections of specific pages. The question remains: why are some pages included by a [site:example.com -2012] query but excluded by a [site:example.com 2012] query, when they all have "2012" on the page?
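If you want to test the thin-content hypothesis rather than eyeball it, one rough approach is to paste the URLs returned by each query into two lists and compare some measure of page size between them. The sketch below assumes you already have a word count per URL from your own crawl; `compare_groups`, the URLs and the counts are all hypothetical, and word count is only one possible proxy for "thin". It also reports any URL that turns up in both result sets, which shouldn't happen if the two queries really split the index.

```python
from statistics import median


def compare_groups(excluded_urls, included_urls, word_counts):
    """Compare pages returned by [site:X -kw] with pages from [site:X kw].

    word_counts is a hypothetical {url: word_count} mapping from your own
    crawl; URLs missing from it are skipped.
    """
    excluded = [word_counts[u] for u in excluded_urls if u in word_counts]
    included = [word_counts[u] for u in included_urls if u in word_counts]
    return {
        # Median word count per group; None if a group has no measured pages.
        "excluded_median": median(excluded) if excluded else None,
        "included_median": median(included) if included else None,
        # URLs that show up for both queries (expected to be empty).
        "overlap": sorted(set(excluded_urls) & set(included_urls)),
    }


# Hypothetical example data.
counts = {"/a": 120, "/b": 90, "/c": 800, "/d": 1500}
print(compare_groups(["/a", "/b"], ["/c", "/d"], counts))
```

The median is used instead of the mean so that one very long page doesn't make a whole group look "rich".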