

Google News Archive Forum

    
so... what 'of' it?
'of' is a very common word and was not included in your search
Fiver

10+ Year Member



 
Msg#: 6120 posted 7:56 pm on Oct 15, 2002 (gmt 0)

Why is it when I search for ... oh I don't know...

<keyword of keyword>

I get a different serp than if I search for

<keyword keyword>

even though google claims: "of" is a very common word and was not included in your search

not included doesn't mean it has no effect.

it's not a side effect of <'keyword of keyword'> being a commonly used specific phrase; I get different serps every time I include/exclude the word 'of', on any similar query

[edited by: Marcia at 9:25 pm (utc) on Oct. 15, 2002]
[edit reason] TOS fine point - specifics not really necessary [/edit]

 

deejay

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 6120 posted 8:00 pm on Oct 15, 2002 (gmt 0)

Yep... I think they're called 'stop words' - words that are essentially ignored - of, by, the, etc.

Put your phrase in quotes to search for the phrase including stop word - <"keyword of keyword">
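To make the stop-word idea concrete, here is a minimal Python sketch. It is not Google's actual behaviour: the `STOP_WORDS` list and the `searched_terms` function are made up for illustration. The only things taken from this thread are that words like of, by, the are ignored, and that quoting the phrase keeps them.

```python
# Hypothetical stop-word list; the real list is not given anywhere in this thread.
STOP_WORDS = {"of", "by", "the", "and", "or", "in"}


def searched_terms(query):
    """Return the words that would actually be matched for a query."""
    query = query.strip()
    # A fully quoted query is treated as an exact phrase: every word is kept.
    if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
        return query[1:-1].lower().split()
    # An unquoted query has its stop words dropped.
    return [word for word in query.lower().split() if word not in STOP_WORDS]


print(searched_terms('time of day'))    # ['time', 'day']
print(searched_terms('"time of day"'))  # ['time', 'of', 'day']
```

Note that this simple model would predict identical results for 'time of day' and 'time day', which is exactly what the rest of the thread shows is not the case.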

[edited by: Marcia at 9:26 pm (utc) on Oct. 15, 2002]

Slade

10+ Year Member



 
Msg#: 6120 posted 8:07 pm on Oct 15, 2002 (gmt 0)

I think Fiver was questioning this:

Why, if 'of' was not included in his 'keyword1 of keyword2' search, doesn't 'keyword1 keyword2' return the same results?

Try also 'time of day' and 'time day' without the quotes. One yields time.gov, the other time.com (strange, but true).

Fiver

10+ Year Member



 
Msg#: 6120 posted 8:14 pm on Oct 15, 2002 (gmt 0)

you got it slade.

makes it a tad hard to optimize for 'time of day' ... grrr.

deejay

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 6120 posted 8:20 pm on Oct 15, 2002 (gmt 0)

oops.. you're right Slade.. sorry, not thinking straight here.

The 'why' of it would, I think, come down to proximity of the words, and more importantly the pattern of the words indicating their relationship.

I didn't do much English in high school, so bear with me and forgive the terminology.. but here goes:

blue shirt - the subject of the phrase is actually "shirt" and "blue" is just a modifier/delineator.

blue of shirt = the subject or focus of the phrase is "blue" and "shirt" is a modifier/delineator.

Now, if I can get that far with that phrase and one stop word... imagine what a bunch of guys with phds could do. :)

Fiver

10+ Year Member



 
Msg#: 6120 posted 8:43 pm on Oct 15, 2002 (gmt 0)

the 30 phds aren't making a lot of sense to this BSc right at the moment.

so although 'of' is a very common word, it changes the meaning of a phrase, and therefore should be included in the query... is what google should say instead of "is a very common word and was not included"

Slade

10+ Year Member



 
Msg#: 6120 posted 9:07 pm on Oct 15, 2002 (gmt 0)

Did anyone try 'time and day'?

The "AND" operator is unnecessary -- we include all search terms by default.

This returns the same as 'time of day'(without quotes) when I would expect it to return the same as 'time day'.

Marcia

WebmasterWorld Senior Member WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 6120 posted 9:31 pm on Oct 15, 2002 (gmt 0)

This gives a challenge with doing good page titles and sometimes writing the text. It comes down to use of "exact phrases" even though it involves stop words and also comes into play with reversing keyword1 and keyword2. To top it off, there's also a decision of what to put into the link text for internal navigation.

We've had some great discussions that relate, and a search for word order and proximity should bring some of them up. Some of us around here are *really* into this, and ciml has done some very insightful posts.

We've just had a related discussion, just in the last day or so, that also involves the "Fresh" date, which appears for some searches and not others, even though the searches are very close.

TallTroll

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 6120 posted 9:40 pm on Oct 15, 2002 (gmt 0)

Try a comparison of "time day", "time of day" and "time of the day" (no quotes). This shows pretty clearly that although the words "of" and "the" aren't included in the query, they are acting as delimiters and affecting the results.

For instance, in the "time day" results, I see greater prominence given to sites with "time/day" or "part-time day school" in the titles, whereas a skim down the SERP for "time of the day" shows lots of sites with the words time and day separated by one or two other words. Fire up a side by side comparison in Google, results set to 100, and you'll see what I mean.

bird

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 6120 posted 11:19 pm on Oct 15, 2002 (gmt 0)

As TallTroll correctly notes, stopwords act as wildcards. The following should all bring the same results (didn't actually check right now, but that's the expected behaviour):

"time of day"
"time and day"
"time or day"
"time * day"

If you want to include the stop word in your search explicitly, precede it with a plus sign:

"time +of day"
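The wildcard idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `STOP_WORDS`, `query_pattern` and `phrase_matches` are hypothetical names, the stop-word list is invented, and the quotes in the list above are treated as plain delimiters rather than as exact-phrase markers.

```python
STOP_WORDS = {"of", "and", "or", "the", "in", "by"}  # assumed list, for illustration


def query_pattern(query):
    """Turn a query into slots: a string must match, None matches any word."""
    slots = []
    for word in query.lower().split():
        if word.startswith("+") and len(word) > 1:
            slots.append(word[1:])   # +of: the stop word itself is required
        elif word == "*" or word in STOP_WORDS:
            slots.append(None)       # stop word or *: wildcard slot
        else:
            slots.append(word)
    return slots


def phrase_matches(slots, text):
    """True if the slots match some run of consecutive words in text."""
    words = text.lower().split()
    for start in range(len(words) - len(slots) + 1):
        window = words[start:start + len(slots)]
        if all(slot is None or slot == word for slot, word in zip(slots, window)):
            return True
    return False


print(query_pattern("time of day"))   # ['time', None, 'day']
print(query_pattern("time * day"))    # ['time', None, 'day']
print(query_pattern("time +of day"))  # ['time', 'of', 'day']
```

In this model "time of day", "time and day", "time or day" and "time * day" all reduce to the same pattern, while "time day" has no gap and "time +of day" demands the literal word.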

ciml

WebmasterWorld Senior Member WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 6120 posted 10:28 am on Oct 16, 2002 (gmt 0)

> Some of us around here are *really* into this...

That's right Marcia, in most contexts an obsession like that is seen as unhealthy.

Part 2.3 of "The Anatomy of a Large-Scale Hypertextual Web Search Engine" mentions that Google has "location information for all hits and so it makes extensive use of proximity in search".

If you type {keyword1 keyword2} into Google, then it's a better match for {keyword1 keyword2} on the page than it is for {keyword1 in keyword2}, {keyword1 of keyword2}, etc. Also it's a better match than {keyword2 keyword1}. We tested the idea of proximity on a character level in a thread a while ago, but found that it looks much more like a word level match.
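A toy word-level proximity score shows the shape of this claim. Everything here is an assumption made for illustration: the function names, the use of the first occurrence of each word, and especially the `1 / (1 + difference)` formula, which is invented and not taken from the paper or from any test in this thread.

```python
def position_gap(words, first, second):
    """Signed distance in words from `first` to `second`, or None if either is missing."""
    if first not in words or second not in words:
        return None
    return words.index(second) - words.index(first)


def proximity_score(query_gap, page_text, first, second):
    """Toy score: 1.0 when the page gap equals the query gap, 0.0 if a keyword is missing."""
    gap = position_gap(page_text.lower().split(), first, second)
    if gap is None:
        return 0.0
    return 1.0 / (1 + abs(gap - query_gap))


# The query {keyword1 keyword2} has a gap of 1 word between the two keywords.
adjacent = proximity_score(1, "keyword1 keyword2", "keyword1", "keyword2")
with_of = proximity_score(1, "keyword1 of keyword2", "keyword1", "keyword2")
flipped = proximity_score(1, "keyword2 keyword1", "keyword1", "keyword2")
print(adjacent > with_of, adjacent > flipped)  # True True
```

The sketch only reproduces the two comparisons stated above (adjacent beats separated, adjacent beats reversed); how the separated and reversed cases rank against each other is not something this thread establishes.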

Some of us have the feeling that proximity matters a little less since the last update. It's not easy to find examples for quantitative analysis though.

Marcia

WebmasterWorld Senior Member WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 6120 posted 11:02 am on Oct 16, 2002 (gmt 0)

We had a thread where we got into the various combinations with page titles, if I remember correctly, but you've got me here, on this one:

> We tested the idea of proximity on a character level in a thread a while ago, but found that it looks much more like a word level match.

What's the difference between word level matches and character level?

And in a page title, what would be the difference between keyword1 keyword2 or keyword1 in keyword2 or keyword1, keyword2 or keyword1 - keyword2?

4eyes

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 6120 posted 11:09 am on Oct 16, 2002 (gmt 0)

'keyword1 5-letterword keyword2' gives the same results as 'keyword1 10-letterword keyword2'

In other words, it is not taking into account the number of characters in the separation, just the number of words.

Not tested this though
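The word-level versus character-level distinction can be shown with a small Python sketch. The `gaps` function and the filler words "apple" and "strawberry" are hypothetical stand-ins for the 5-letter and 10-letter words; nothing here is a claim about how Google measures distance.

```python
def gaps(text, first, second):
    """Return (words_between, letters_between) for `first` followed later by `second`.

    Returns None if the two words do not occur in that order.
    """
    words = text.lower().split()
    try:
        i = words.index(first)
        j = words.index(second, i + 1)
    except ValueError:
        return None
    between = words[i + 1:j]
    return len(between), sum(len(word) for word in between)


print(gaps("keyword1 apple keyword2", "keyword1", "keyword2"))       # (1, 5)
print(gaps("keyword1 strawberry keyword2", "keyword1", "keyword2"))  # (1, 10)
```

Both strings have a word gap of 1 but different letter gaps, so a word-level proximity match would treat them the same while a character-level one would not.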

Fiver

10+ Year Member



 
Msg#: 6120 posted 3:30 pm on Oct 16, 2002 (gmt 0)

ok, so are we saying that to optimize for

"keyword stopword keyword" as a searched phrase, we should optimize for "keyword anystopword keyword", but not "keyword anystopword anotherstopword keyword", and not "keyword nonstopword keyword" as it produces a different serp.

"keyword nonstopword keyword" produces something different for each nonstopkeyword i use in fact...

but that's contrary to
'keyword1 5-letterword keyword2' gives the same results as 'keyword1 10-letterword keyword2'

isn't it? aren't those completely different three term phrases if the 5 or 10 letter words aren't stop words (they couldn't be, at that length(?))

ciml

WebmasterWorld Senior Member WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 6120 posted 6:27 pm on Oct 16, 2002 (gmt 0)

Yep, 4eyes describes what I'm getting at. Fiver, it works fine if 5-letterword and 10-letterword are both stopwords.

"keyword1 keyword2", "keyword1, keyword2" and "keyword1 - keyword2" are equivalent.

"keyword1 in keyword2", "keyword1 and keyword2" and "keyword1 * keyword2" are equivalent.

I'm describing the words typed into the search box. I can't remember if a comma or hyphen in the title reduces the proximity match (I'll have to go test that).

Slade

10+ Year Member



 
Msg#: 6120 posted 6:38 pm on Oct 16, 2002 (gmt 0)

The reason 'keyword1 nonstopword1 keyword2' has a different result than 'keyword1 nonstopword2 keyword2' is that the nonstopwords are effectively more keywords. So:

keyword1 nonstopword1 keyword2 = keyword1 keyword3 keyword2
keyword1 nonstopword2 keyword2 = keyword1 keyword4 keyword2

where

nonstopword1 = keyword3
nonstopword2 = keyword4

Hmm... What I'm trying to say is just that if you use anything other than a stopword or stop-symbol(?) then you're getting the results for a different set of keywords. (Does this make any sense?)

Fiver

10+ Year Member



 
Msg#: 6120 posted 7:02 pm on Oct 16, 2002 (gmt 0)

obviously the reason I'm asking is because of the limited success I've had with both 'keyword stopword keyword' and 'stopword stopword keyphrase'(which often give me similar but not identical search results as the keyphrase alone) and other stopword combos.

I've tried many combinations with no consistent results... often I find myself ranking for the phrase moderately well (low end of the top ten, or on the second page) - but the serp lists my index page and doesn't double it up with my keyword specific pages... as though it's ignoring them and my index is just well related.

oh well, just hoping to find out how to really target those phrases built around stopwords.

JonB

10+ Year Member



 
Msg#: 6120 posted 7:46 pm on Oct 16, 2002 (gmt 0)

Slade, I'm not quite sure I understand your explanation. If the stop word "of" is NOT included in the search, then why the different results? There are no MORE keywords, since the stopword is not included in the search, right? If only keyword keyword2 goes to the search variable, then why different results? I mean, stopwords are stripped before launching the search in the database. Looking at the different results, I guess this is not the case. And if it is not the case, and a stopword CAN change the order of the results and even produce some different results (that is, affect the results), then isn't it a bad idea to "delete" this word from the search by default?

Fiver

10+ Year Member



 
Msg#: 6120 posted 8:10 pm on Oct 16, 2002 (gmt 0)

so does this beg the question... what role do linguistics play in google's algo? if they treat all stop words as meaningless (rather, not meaningless, but all of equal meaning) there would seem to be no semantic influence... wouldn't the easiest implementation of something like that look at the difference between the effect of 'or' and 'and' (in english sentences, not as boolean expressions)?

I'm sure I'm generalizing too much, google can't 'understand' the sites they index... but I bet they could understand some queries they get every day. is Chomsky one of those phds? ;)

ciml

WebmasterWorld Senior Member WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 6120 posted 11:09 am on Oct 17, 2002 (gmt 0)

JonB:
> ...if only keyword keyword2 goes to search variable then why different results

Because even though the stopword isn't matched against the document, the surrounding words are now further apart. The proximity of the words changes.

Note for people researching this: A stopword in one phrase can become a non-stopword in another phrase.

Monkscuba

10+ Year Member



 
Msg#: 6120 posted 12:47 pm on Oct 17, 2002 (gmt 0)

Just thought I'd throw in some example numbers. After reading this thread I went and checked just how different it can be.

Our site's main keywords are "keyword" and "location". The title is "keyword in location". Here are Google's search results for a mixture of searches:

"keyword in location" rank 7
keyword in location rank 52
keyword location rank 38
location keyword rank 52 (coincidence)

Funny that, although the title has the "in", we rank lower if you include it in the search. Naturally, add the ""s and we come out much higher.

Y'all have a nice day

Fiver

10+ Year Member



 
Msg#: 6120 posted 2:28 pm on Oct 17, 2002 (gmt 0)

monkscuba's example describes the problems i've had trying to target stopword phrases.

> A stopword in one phrase can become a non-stopword in another phrase.

although examples are not appreciated (i won't be cuckoo's mcmurphy and question that) i was wondering if you've noticed any pattern to when stop words become non-stopwords... you don't simply mean within a quoted query?

is it a certain word combo that changes it? number of words in the query, or the relationship of them?

this may be a complete aside but,
I can search for 'howto do something' and google will ask me if i meant to search for 'how to do something' and then when i do it tells me 'how to' was not included in my query.

kinda reminds me of old command line jokes.. you know
% ^How did the sex change^ operation go?
Modifier failed.
or
% make love
Make: Don't know how to make love. Stop.

anyway.

Lance

10+ Year Member



 
Msg#: 6120 posted 8:51 am on Oct 18, 2002 (gmt 0)

May have missed this point in the discussion - if so, I apologise. It does, I think, explain the oddities.

You know that Google searches on stopwords in the following circumstances: a) if you stick a + sign in front. b) if it's part of a phrase.

Well, it's a phrase if you stick the words in quotes, right? Yes, BUT Google also has what they call implicit phrase searching.

Which means, if you put more than one word in the query box but not using quotes, Google will still try to match the whole lot as a single phrase. These results come first. THEN it matches the words separately, at which point it does not search for the stopword. So it tells you that it has not searched for <stopword> because it hasn't.

But it HAS searched for the phrase, and those results come top, so you do get different results.
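The two-stage behaviour described above can be sketched as follows. This is a rough model only: the `search` and `contains_phrase` functions, the `STOP_WORDS` list and the sample pages are all invented for illustration, and real ranking obviously involves far more than these two buckets.

```python
STOP_WORDS = {"of", "the", "in", "and", "or", "by"}  # assumed list, for illustration


def contains_phrase(words, phrase):
    """True if `phrase` (a list of words) appears as a consecutive run in `words`."""
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


def search(query, pages):
    """Return page names: whole-phrase matches first, then keyword-only matches."""
    phrase = query.lower().split()
    keywords = [word for word in phrase if word not in STOP_WORDS]
    phrase_hits, keyword_hits = [], []
    for name, text in pages.items():
        words = text.lower().split()
        if contains_phrase(words, phrase):
            phrase_hits.append(name)    # stage 1: implicit phrase match, stop word included
        elif all(keyword in words for keyword in keywords):
            keyword_hits.append(name)   # stage 2: separate words, stop word not searched
    return phrase_hits + keyword_hits


pages = {"a": "day time tv", "b": "the time of day"}  # made-up sample pages
print(search("time of day", pages))  # ['b', 'a']
print(search("time day", pages))     # ['a', 'b']
```

Both queries end up with the same two keywords in the second stage, yet the ordering differs because only 'time of day' gets a whole-phrase hit on page b.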

Hope this helps.

Lance


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved