Google Says It Ignores Common Words, But...

... if so, why do the SERPs differ?

         

michael heraghty

2:00 pm on Jun 27, 2004 (gmt 0)

10+ Year Member



I'm not the first person to bring this up, but no-one replied to it the last time:

[webmasterworld.com...]

Yet, I think the subject is important. Here's the issue: Google claims that it ignores common words in searches. It certainly *used* to. But it doesn't seem to anymore.

For example, try a search on "widgets in placename" (without the quotation marks). Now try a search for "widget placename" (again, without the quotation marks).

Google will say: "in" is a very common word and was not included in your search.

If that's the case, then why are the SERPs so different for each of the searches?!

If the common word was ignored, the SERPs should be the same for both. As far as I recall, this used to be the case. Not anymore, however.

Is Google misleading searchers?

j4mes

2:17 pm on Jun 27, 2004 (gmt 0)

10+ Year Member



Yes, I've often wondered about that. Searching for +the gives several billion results for an ignored word :)

Hennatron

2:19 pm on Jun 27, 2004 (gmt 0)

10+ Year Member



Could it have something to do with weighting applied to the position of each k/w in the search string?

e.g. position one (widget) = 0.5, position two (in) = 0.3, position three (placename) = 0.2 etc.

So although "in" has been excluded from the matching process, its position still had some weighting.
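
A minimal sketch of that idea, purely as an illustration. The weights are the example figures from the post; the stopword list and the function name are assumptions, and nothing here is confirmed Google behaviour.

```python
# Hypothetical position weights, taken from the example figures in the post.
POSITION_WEIGHTS = [0.5, 0.3, 0.2]
# Assumed stopword list, for illustration only.
STOPWORDS = {"in", "of", "the"}


def term_weights(query):
    """Map each non-stopword term to the weight of its original position."""
    weights = {}
    for position, term in enumerate(query.lower().split()):
        if term in STOPWORDS:
            continue  # dropped from matching, but it still occupied a slot
        # Positions beyond the table reuse the last weight (an assumption).
        slot = min(position, len(POSITION_WEIGHTS) - 1)
        weights[term] = POSITION_WEIGHTS[slot]
    return weights


print(term_weights("widgets in placename"))
print(term_weights("widgets placename"))
```

Under these assumptions "placename" carries a weight of 0.2 in the three-word query but 0.3 in the two-word query, so the two searches could rank pages differently even though "in" itself is never matched.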

gethan

2:37 pm on Jun 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've seen a difference in SERPs for "widgets in placename" vs. "widgets placename" since way before stemming. In fact (I wish I had the stats), I think this has been the case since 2001, when I started optimising for a "widgets in placename" search phrase. So I don't think this is anything new, but it is something that is often overlooked.

carneddau

2:53 pm on Jun 27, 2004 (gmt 0)

10+ Year Member



I've also wondered about this. The results with and without "in" are really quite different. I've seen examples where sites are on page 5 without "in" and position 1 with. That's quite a big difference considering the word "in" was apparently not used in the search.

jimbeetle

2:56 pm on Jun 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



For example, try a search on "widgets in placename" (without the quotation marks). Now try a search for "widget placename" (again, without the quotation marks).

Google will say: "in" is a very common word and was not included in your search.

If that's the case, then why are the SERPs so different for each of the searches?!

I must be missing something here. What you describe seems to be normal Google behaviour.

In searches with very common words, yes, the common words are ignored, but the word pattern is not.

Thus, the search:

widgets in placename

Becomes:

widgets someword placename

Which is not the same as:

widgets placename

Different search phrase, different results, normal Google behaviour.
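
The "someword" idea can be sketched as a toy pattern matcher. This is only an illustration of the hypothesis: the stopword list and both function names are assumptions, and it makes no claim about how Google actually processes queries.

```python
import re

# Assumed stopword list, for illustration only.
STOPWORDS = {"in", "of", "to", "the"}


def query_to_pattern(query):
    """Build a regex in which every stopword becomes an any-word slot."""
    parts = []
    for term in query.lower().split():
        parts.append(r"\w+" if term in STOPWORDS else re.escape(term))
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b")


def matches(query, text):
    """True if the text contains the query's word pattern."""
    return query_to_pattern(query).search(text.lower()) is not None


print(matches("widgets in placename", "Cheap widgets of Placename"))
print(matches("widgets placename", "Cheap widgets of Placename"))
```

In this sketch "widgets in placename" matches any text with exactly one word between the two terms, while "widgets placename" requires them to be adjacent, so the two queries select different pages.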

moishe

3:11 pm on Jun 27, 2004 (gmt 0)

10+ Year Member



In the case of one of my main sites, my KWs have punctuation and Google definitely indexes that differently, i.e.:
widget street
widget st
widget st.
widget's street
widgets street
widget's st
widgets st.
All bring different results, which makes it darn hard to optimize for.

John_Caius

3:12 pm on Jun 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well I thought that the "widget someword placename" hypothesis was correct - however, compare the results for

widgets in placename [google.com]

and

widgets of placename [google.com]

- seems to me that the 3rd, 4th and 5th results are swapped in the two searches. Most mysterious...

mattdwells

3:56 pm on Jun 27, 2004 (gmt 0)

10+ Year Member



it's probably a phrasing thing.

'widgets placename' says to boost pages that have widgets right next to placename, but 'widgets in placename' says to boost pages that have widgets within one word of placename.

dougmcc1

4:47 pm on Jun 27, 2004 (gmt 0)

10+ Year Member



I'm seeing slightly different results for "widgets in placename", "widgets of placename", "widgets * placename", and "widgets placename".

Robert Charlton

5:00 pm on Jun 27, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



If the "someword" theory fully explained it, "of" and "in" would be interchangeable. I've tried a bunch of real world searches, though, with both "of" and "in". Not only do they give different results, but there's another interesting difference as well... "in" triggers Local results at the top of the serps, whereas "of" doesn't.

I assume that Google doesn't index the stopwords, but it may use some of them in queries. I tried using "to" instead of "in", just in case "in" is special. Got different results yet... but, as with "of", no Local results at the top.

[edited by: Robert_Charlton at 5:01 pm (utc) on June 27, 2004]

BReflection

5:00 pm on Jun 27, 2004 (gmt 0)

10+ Year Member



I don't see this huge difference in SERPs, personally.

widgets in placename 233 results
widgets placename 234
widgets of placename 233
widgets from placename 233
widgets where placename 233
widgets why placename 233
widgets on placename 233

I think Hennatron's explanation probably sums it up.

placename widgets in 227
placename widgets 233
placename widgets of 229
widgets of on in from where why how for placename 234
widgets of OR on OR in OR from placename 233
widgets of AND on AND in AND from placename 235
widgets NOT of NOT on NOT in NOT from placename 217

Well, maybe my mind is changed. Here's some evidence which you can gather using the boolean NOT. If Google isn't searching for these words anyway why should this make a difference?

microsoft 102,000,000
microsoft NOT of NOT on NOT in NOT from 10,200,000
microsoft -of -on -in -from 5,450,000
microsoft +of +on +in +from 10,700,000

This is starting to look really funny. I thought NOT and - had the same effect? What's going on here?

paybacksa

5:08 pm on Jun 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Perhaps SERPs are a combination of exact matching, broad matching (without common words), and phrase matching (perhaps with common words used as placeholders)?

I have always had success optimizing for "Widgets in Cityname" separately from "Widgets Cityname" and separate still from "Cityname Widgets". I don't ever remember having success for all three variants with one page.

jimbeetle

5:23 pm on Jun 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"in" triggers Local results at the top of the serps

That could be a good explanation for many of the discrepancies.

creative craig

6:09 pm on Jun 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Could it have something to do with weighting applied to the position of each k/w in the search string?

There should be no difference, they should return the same results.

Namaste

6:26 pm on Jun 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have seen different results for as far back as I can remember... but I think the results are just different and don't include the "stop words".

ciml

6:52 pm on Jun 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hennatron has it, proximity is in the original Backrub papers.
[webmasterworld.com...]
[webmasterworld.com...]

I'm intrigued by Robert Charlton's suggestion of localisation for "in".

Adam_C

7:19 pm on Jun 27, 2004 (gmt 0)

10+ Year Member



I noticed this a couple of weeks ago...

it used to be that

widgets in placename

widgets to placename

and

widgets * placename

would all give the same results. Now we're seeing differences. Good to see.

steveb

9:40 pm on Jun 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Somewhat different, but maybe related observation...

For literally years I've been trying to get my "how to perform widgeting" page to rank for a how to perform widgeting search (the four words, no quotes). Unfortunately google has always ignored "how to", and thus ranked a stronger (but less relevant) page higher, based only on the "perform widgeting" part of the search.

Several days before the recent backlink update my "how to" page started outranking the more generic page for that "how to" search. Google still says it is ignoring "how to" (and not highlighting it in the ransom notes) but perhaps there is some common word change being implemented or experimented with.

<The "in" thing is a fascinating idea.>

goodroi

11:10 pm on Jun 27, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Good find, very interesting.

As a side note, the term "in" appears to be only part of the local formula. For local results to appear, "in" and a geographic term need to be part of the search phrase. I've also seen local results without including "in", and sometimes I've seen no local results with just the geographic term and not including "in".

GrantNZ

3:29 am on Jun 28, 2004 (gmt 0)

10+ Year Member



Here's what I noticed the other day, though it (Google) may have been doing it this way for ages...

Google says: "The following words are very common and were not included in your search: where is"

I'm seeing a different result, even when Google says they ignored the words, compared to me leaving the 'where is' out.

edit_g

3:39 am on Jun 28, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've had the same problem as steveb: trying to rank for something that includes stopwords. This has been the case for at least the past 6 months.

Do a search for:
'Something is' -
[google.com...]

Google says that it takes out 'is' because it is a common word. So the results should be the same as a search for 'something':

[google.com...]

But they're completely different. So I was stuck for a while, not knowing which phrase to focus on (the above are examples only). In the end I gave up trying to figure Google out and optimised for both... :)

webnewton

6:31 am on Jun 28, 2004 (gmt 0)

10+ Year Member



Is Google misleading searchers?

Can't say anything about this. But yes, you're right, Michael: Google does indeed consider common words. For example, the results are different for

"widgets of usa"
"widgets usa"
although Google says that it ignores "of".
This is nothing new either. I've been noticing it for almost a year now.

michael heraghty

10:40 am on Jun 28, 2004 (gmt 0)

10+ Year Member



Thanks for the good insights and some interesting theories guys.

So Google ignores the stopwords, but not their effects on proximity? (Has this always been the case? I'm interested that, like me, many people have only noticed this within the last year or so -- maybe since Florida?)


Michael, stopwords like "in" and "the" are not used to filter the results (at least not when they're treated as stopwords) but the proximity of the terms in the search is counted.
hotel in mars, mars hotel and hotel mars
0........9..., 0....5.... and 0.....5...

The proximity from hotel to mars might be indicated as +9, -5, and +5. If the text matches the query then it's a much better match. For phrases where the top listings are quite close in other respects, this proximity can affect rankings considerably.

Ciml, I'm quoting your message from one of the threads you linked to, as I think it's a neat explanation of how "proximity" works.
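
A rough sketch of how a proximity comparison like that might be computed. It counts word positions rather than the character offsets shown in the quoted example (an assumption made to keep it short), and the function names and penalty formula are hypothetical, not anything Google has published.

```python
def signed_offset(words, first, second):
    """Word-position offset from `first` to `second` (negative if reversed)."""
    return words.index(second) - words.index(first)


def proximity_penalty(query, document, first, second):
    """Zero when the document spaces the two terms exactly as the query does."""
    q = signed_offset(query.lower().split(), first, second)
    d = signed_offset(document.lower().split(), first, second)
    return abs(q - d)


page = "best hotel on mars"
for query in ("hotel in mars", "hotel mars", "mars hotel"):
    print(query, proximity_penalty(query, page, "hotel", "mars"))
```

Here the page text has one word between "hotel" and "mars", so the query containing a stopword in that slot gets the smallest penalty, which is the sense in which a dropped stopword could still influence ranking.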

And yes, some initial searches I've done produce identical results when two-letter stopwords are interchanged (widgets in placename = widgets of placename), while different but, again, consistent SERPs appear for three-letter stopwords (widgets for placename = widgets the placename).

Robert -- interesting point about the "in" producing local searches, and being different to "of", which would indicate an anomaly. I'm not seeing it in my own tests, but it may be worthy of further investigation...

Leosghost

10:56 am on Jun 28, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



And it does even weirder things in French, where to be grammatically correct you have to use le, la, l', de, du, de la, en etc etc ..this makes SEO really weird, as Google says that it ignores them (and other words) and then when all other "on page" considerations are equal..

Goes sometimes with the "good grammar" version as higher placed in serps ..and sometimes not ( seems to depend on the number of letters in a word ..some common words have 4 or 5 letters and in some cases with phrases the words can include punctuation symbols making them longer )!

I know this isn't down to "offpage" cos I have run the experiments myself using my own pages and "optimising" separately for each scenario pages which are otherwise identical linkwise etc ...

Question is is this filtering behaviour geo specific or language specific ..any one else seen evidence in non English serps?

Wail

11:15 am on Jun 28, 2004 (gmt 0)

10+ Year Member



Then there are hyphens. Oh, damn their eyes.

A search for: a state
Returns 266,000,000 results and "a" is a very common word. You'll not see a bold a in the first page of snippets. It's ignored.

A search for: a-state
Returns 6,940,00 results. There's no mention of common words. In the snippets you'll see "a state" in bold. That's "a" space "state". You get separate dictionary links to "a" and "state".

In the a-state search you're getting "a" as a separate word and counting it - but only, it seems, if it's right next to "state".

Perhaps Google should say, "a is a conditionally common word"?

Hmm. Contextually common word?

Leosghost

11:17 am on Jun 28, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"depends on how we feel about it ..today... word" ..?

troels nybo nielsen

12:32 pm on Jun 28, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I just did a search in Danish for the historically highly questionable but grammatically fully correct Danish phrase "Anders And drak the" (without quotation marks).

I hope, for Google's sake, that they do not deliver many results like this. If you were to search for "Donald Duck drank tea" in English, would you be satisfied with a SERP that did not count "Duck" and "tea"? And would you like being given an explanation that showed complete ignorance of the English language?

At the present quality level Google's common-word-filter in Danish is a nuisance.

Robert Charlton

5:29 pm on Jun 28, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



...interesting point about the "in" producing local searches, and being different to "of", which would indicate an anomaly. I'm not seeing it in my own tests, but it may be worthy of further investigation...

I hope I can post this without violating TOS... it's one of those generic type searches that is nothing more than an example...

hotels in Omaha
hotels of Omaha
hotels to Omaha

The same thing happens with other city names. Not sure what might happen with other widgets.

MichaelBluejay

3:47 am on Jun 29, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If Google is filtering out certain words (big IF, I know), shouldn't they be called "filter words" instead of "stop words"? Doesn't seem helpful for us to use terms that have a completely different meaning than what we're trying to describe.