Forum Moderators: Robert Charlton & goodroi
Interesting that you point this out; I've experienced similar frustration of late.
I wonder, probably naively, if users are now sufficiently discerning/sophisticated in their use of common operators when searching that Google feels it worthwhile or necessary to return results accordingly...
Syzygy
I'm with Borat!
(I know adding the question mark doesn't make it a question, but it has made you Borat for a while.)
In general, I've found that Google likes exact matches on the page... just not too many of them... but there are several hundred other variables. I can imagine various off-page/on-page scenarios that might cause a page containing the three words separated on the page to rank higher... probably less likely to happen as the three-word phrase becomes more competitive and more purposefully targeted by others.
Please paint a fuller picture of what you have in mind, and we can work on the title as the question becomes clearer. ;)
[edited by: Robert_Charlton at 1:14 am (utc) on July 31, 2007]
About a year or so ago (hmm, what happened then?), you could put in "fussy yellow blue widgets" and come up with the 4-6 pages directly addressing the subject.
Of course, they were low PR, since few people actually link to those types of pages.
Nowadays you find those pages in the supplemental or only with quotes.
--- and you dare not type "widgets yellow fussy blue" and expect to find it.
Now, higher PR or Trusted pages get credit for having those 3-4 words anywhere on the page.
Assuming Goog did this on purpose, one can only guess that most users still don't search with more than two words, or that Google doesn't mind users clicking on the AdWords ads to find a similar result rather than digging to page 60.
I suppose a low "exact phrase" match could be a counter-spam measure; I did see a recent comment by Matt C that they'd lifted it when a topical phrase brought bad results. It just seems it's normally set a bit too low.
This is more true on Google than on the other engines. (You might say that MSN, eg, still doesn't get the Apathy Club joke).
The longer the phrase, I feel, the more unnatural it is for it to occur very many times as an exact match on the same page. And some of us still don't like to use a keyword more than once in a title. But I have found that an occasional exact match on the page can be very helpful.
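To make the "occasional exact match" idea concrete, here's a toy Python sketch (purely illustrative, not anything Google has published) that counts how many times an exact phrase occurs on a page. The sample text is made up; the point is just that a long phrase repeated many times as an exact match looks unnatural, while one or two occurrences don't:

```python
import re

def exact_phrase_count(page_text: str, phrase: str) -> int:
    """Count exact (case-insensitive) occurrences of a phrase in page text."""
    pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
    return len(re.findall(pattern, page_text.lower()))

# Hypothetical page copy for illustration
text = "Fussy yellow blue widgets are rare. We sell fussy yellow blue widgets."
print(exact_phrase_count(text, "fussy yellow blue widgets"))  # → 2
```

A counter like this could feed a threshold check (too many exact repeats of a long phrase on one page reads as over-optimization), but where any real engine sets that threshold is anyone's guess.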
I'll try to un-Boratize the title. Please let me know if I get it right.
Many thanks to Bill Slawski for his insightful commentary on this Google patent [seobythesea.com]. His article is worthwhile reading for those who are following this kind of thing.
There are other related Google technologies in this area. For example, see our earlier thread about the six phrase based indexing patents [webmasterworld.com]. But one key take-away is to recognize that all search "phrases" are not created equal. Google continues to work at understanding when groups of query words make up a true semantic unit, and when they are just multiple query words.
From the patent's "Description" section:
Assume that a user enters the search terms "baldur's gate download." The user intends for this query to return web pages that are relevant to the user's intention of downloading the computer game called "baldur's gate." Although "baldur's gate" includes two words, the two words together form a single semantically meaningful unit. If the search engine is able to recognize "baldur's gate" as a single semantic unit, called a compound herein, the search engine is more likely to return the web pages desired by the user.
Now this patent was applied for seven years ago, and there's no reason to assume it's in use today, exactly as described. But it can help us appreciate what Google considers important, and at least one methodology they have considered seriously enough to apply for a patent.
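One common way to detect that two query words form a compound like "baldur's gate" is to compare how often they appear together versus apart in a corpus, e.g. via pointwise mutual information (PMI). This is a standard technique, not necessarily what the patent describes; the corpus below is a tiny hypothetical stand-in for real query or document data:

```python
import math
from collections import Counter

# Toy corpus standing in for real query/document data (hypothetical)
corpus = [
    "baldur's gate download",
    "baldur's gate walkthrough",
    "download free games",
    "gate repair service",
    "baldur's gate review",
]

unigrams = Counter()
bigrams = Counter()
total_bigrams = 0
for doc in corpus:
    words = doc.split()
    unigrams.update(words)
    for a, b in zip(words, words[1:]):
        bigrams[(a, b)] += 1
        total_bigrams += 1
total_words = sum(unigrams.values())

def pmi(a: str, b: str) -> float:
    """Pointwise mutual information of the bigram (a, b):
    high values mean the words co-occur far more than chance predicts."""
    p_ab = bigrams[(a, b)] / total_bigrams
    p_a = unigrams[a] / total_words
    p_b = unigrams[b] / total_words
    return math.log2(p_ab / (p_a * p_b))

print(pmi("baldur's", "gate"))  # strongly positive: a likely compound
```

A real system would work from massive query logs and add smoothing, frequency cutoffs, and longer n-grams, but the underlying signal is the same: words that travel together are probably a single semantic unit.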
One of my pages relates to a perfectly valid search term FOO BAR ROO (say). Once upon a time it was #1 in the serps for a search on FOO BAR ROO, and it used to get traffic.
However, the page is now supplemental. The same search now shows a Wikipedia page on that subject as #1, followed by a mishmash of unrelated pages with the keywords on the page, and then my page well down in the serps.
But searching in quotes "FOO BAR ROO" brings up my page as #1.
As the page is technically no different from my other pages (links, format, likely PR, etc.), I suspect that one of the reasons it is supplemental is that Google does not recognize FOO BAR ROO as a semantic term, and sees it as a poor performer compared to other FOO or BAR or ROO pages.
It seems therefore that this page is probably never going to get out of supplemental status. And as only a very small proportion of searchers use (or even know about) sophisticated search methods, the page will hardly ever get any traffic.
Across all my sites about 50% of pages are supplemental. So if 'semantic unit' technology is one of the reasons, it gives me a major headache.
Consider that Google also offers its translation tool. I haven't used it very much, but I think it gives quite reasonable results, which suggests Google's software is capable of identifying far deeper structures:
"Syntactic structures" (Chomsky, 1957) are so fundamental to (computer-) grammars, and offer a lot of analytical potentials: I'd suspect many of the examples of OP's type are easily explained by e.g. some of the basic rules or constraints in transformational grammar [en.wikipedia.org]. However, the devil lies in the details here, and it is hard to discuss such assumptions without violating the TOS.
Yet I have nowhere found any hints that Google relies on any such syntactic analysis in its ranking algos, though I'd really be surprised if it didn't.