Why Do Stop Words Affect Results?

A very common word "was not included in your search".

mifi601

3:50 pm on Dec 21, 2004 (gmt 0)

10+ Year Member



If that is the case, why are the search results different for "xyz 'common word' abc" and "xyz abc"?

I have been wondering about that for a while now!

siteseo

8:17 pm on Dec 21, 2004 (gmt 0)

10+ Year Member



Excellent question. My site name has an & in it, and comes up first with a search for "keyword & keyword" but not for "keyword keyword," even though G states it isn't including the & in the search.

nuevojefe

8:39 pm on Dec 21, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Proximity

your_store

8:44 pm on Dec 21, 2004 (gmt 0)

10+ Year Member



Word proximity is your answer.

Try this thread Google Say It Ignores Stop Words.. [webmasterworld.com]

Added: Nuevojefe beat me to the punch

mifi601

10:20 pm on Dec 21, 2004 (gmt 0)

10+ Year Member



Thank you guys! I did search but didn't find it in past threads - I knew it should have been covered!

Now I know :)

nuevojefe

11:54 pm on Dec 21, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



your_store,

That's because you weren't being lazy and actually contributed a resource to back it up. ;-)

thx BTW.

Critter

8:32 pm on Dec 25, 2004 (gmt 0)

10+ Year Member



Indeed, proximity is the answer. The search words you enter are separated by the stop word in one search and not in the other. That makes Google look for the words with one word between them in one case, and right next to each other in the other.

To prove that Google still ignores the stop word, you can enter another stop word in its place. For example, if you entered "search1 for search2", try "search1 to search2" and you'll see the results are the same.
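A minimal Python sketch of the proximity explanation above, assuming a toy engine that drops stop words from matching but keeps each word's position in the query. The stop-word list and the function names (`indexed_terms`, `term_gaps`) are hypothetical and say nothing about Google's internals; the point is only that "search1 for search2" and "search1 to search2" leave the same two terms one slot apart, while "search1 search2" leaves them adjacent.

```python
# Hypothetical stop-word list for illustration; real engines' lists are not public.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "&"}


def indexed_terms(query: str) -> list[tuple[str, int]]:
    """Return (term, position) pairs for the non-stop words in a query.

    Stop words are skipped for matching, but they still consume a
    position, so the spacing between the remaining terms is preserved.
    """
    terms = []
    for position, word in enumerate(query.lower().split()):
        if word not in STOP_WORDS:
            terms.append((word, position))
    return terms


def term_gaps(query: str) -> list[int]:
    """Distance between consecutive indexed terms (1 means adjacent)."""
    positions = [pos for _, pos in indexed_terms(query)]
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


for q in ["search1 for search2", "search1 to search2", "search1 search2"]:
    words = [term for term, _ in indexed_terms(q)]
    print(f"{q!r}: terms={words}, gaps={term_gaps(q)}")
```

Both stop-word queries match the same two terms with a gap of 2, while the query without a stop word has a gap of 1, which is enough for a proximity-aware ranking to order the results differently.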

soapystar

9:26 pm on Dec 25, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



different results for me with different stop words.......

Critter

10:16 pm on Dec 25, 2004 (gmt 0)

10+ Year Member



Your computer's broken. :)

Seriously, you're probably just hitting different data centers for each query.

soapystar

8:32 am on Dec 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Critter......

or maybe Google's broke?..... :)
...actually you're right..this morning i get the same results for different stop words.....

Oliver Henniges

9:47 pm on Dec 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I hope this isn't too off-topic (OT), but let me add that stop words play a significant role in semantic analysis (e.g. google for LSI = Latent Semantic Indexing, or phrases like 'cosine value of lexical similarity'), and there are a number of interesting threads on that within WebmasterWorld.
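For readers unfamiliar with the term, here is a sketch of cosine similarity between plain term-frequency vectors, computed with and without stop words. This is ordinary bag-of-words cosine, not LSI (which additionally applies a singular value decomposition to the term-document matrix); the stop-word list and example sentences are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}


def term_vector(text: str, drop_stop_words: bool) -> Counter:
    """Term-frequency vector of a text, optionally without stop words."""
    words = text.lower().split()
    if drop_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    return Counter(words)


def cosine(u: Counter, v: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


doc_a = "the capital of illinois is springfield"
doc_b = "venture capital firms in illinois"

for drop in (False, True):
    score = cosine(term_vector(doc_a, drop), term_vector(doc_b, drop))
    print(f"drop_stop_words={drop}: cosine={score:.3f}")
```

With stop words kept, the two sentences score about 0.365; dropping them raises the score to about 0.577, so how stop words are handled changes the measured similarity even though they carry little meaning on their own.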

Although the basic papers also say that stop words are ignored in the calculation, they do of course influence factors like proximity, number of words, and so on.

We should also expect that lexical analysis of websites has meanwhile been driven much further, and that stop words may have been incorporated into the algos in ways that go far beyond mere statistical analysis.

But wherever today's state of research may be, it should be quite clear that an algorithmic analysis of stop words offers an excellent means of distinguishing natural-language text from spam. If you contribute a very helpful and interesting page on a given topic to the web, your major keywords are quite likely to appear in all sorts of combinations with stop words and secondary keywords. This means that, whatever the differences between individual search queries, your interesting page can be expected to come out on top in most cases.
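A sketch of the kind of stop-word statistic this post describes, assuming a hypothetical rule: text whose stop-word ratio falls below a threshold is flagged as possible keyword stuffing. The word list and the 0.2 threshold are arbitrary illustrations; nothing here reflects how any search engine actually scores pages.

```python
import re

# Hypothetical stop-word list and threshold, chosen only for illustration.
STOP_WORDS = {"a", "an", "and", "for", "if", "in", "is", "of", "on",
              "the", "to", "will", "with", "you"}
STUFFING_THRESHOLD = 0.2


def stop_word_ratio(text: str) -> float:
    """Fraction of the words in a text that are stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(word in STOP_WORDS for word in words) / len(words)


def looks_like_keyword_stuffing(text: str) -> bool:
    """Flag text whose stop-word ratio is suspiciously low."""
    return stop_word_ratio(text) < STUFFING_THRESHOLD


natural = ("If you write a helpful page on a topic, the main keywords will "
           "appear in many combinations with the stop words of ordinary sentences.")
stuffed = "cheap widgets best widgets buy widgets cheap blue widgets widgets sale"

for label, text in (("natural", natural), ("stuffed", stuffed)):
    print(f"{label}: ratio={stop_word_ratio(text):.2f}, "
          f"flagged={looks_like_keyword_stuffing(text)}")
```

The natural sentence has 11 stop words out of 24 (ratio 0.46) and is not flagged; the stuffed string has none and is. A single ratio is easy to game, so treat this as an illustration of the idea rather than a usable filter.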

Whoa

4:23 pm on Dec 27, 2004 (gmt 0)

10+ Year Member




I think Google needs to use stop words more intelligently; I'm sure it's just a matter of time. Or maybe they will just come out with a better language - Googlish perhaps.

I will be impressed when my site that lists venture capital firms in Illinois stops showing up when people query Google to find the "capital of Illinois" (it's Springfield by the way).

That "of" in front of Illinois should lead Google not to show my pages, but it doesn't do that yet. Understandably, with a query like "Illinois capital" it would be tough to discern the searcher's true intent.
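A minimal sketch of the behavior asked for here, assuming a toy matcher: a bag-of-words match that ignores "of" accepts both pages, while an exact-phrase check that keeps "of" in place accepts only the page that actually answers the question. The page texts, the stop-word list, and the function names are hypothetical.

```python
import re

# Hypothetical stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def tokens(text: str) -> list[str]:
    """Lowercase word tokens with punctuation stripped."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words_match(doc: str, query: str) -> bool:
    """Match if every non-stop query word appears anywhere in the doc."""
    doc_words = set(tokens(doc))
    return all(w in doc_words for w in tokens(query) if w not in STOP_WORDS)


def phrase_match(doc: str, query: str) -> bool:
    """Match only if all query words, stop words included, appear consecutively."""
    d, q = tokens(doc), tokens(query)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


pages = {
    "vc_directory": "Directory of venture capital firms in Illinois",
    "state_facts": "Springfield is the capital of Illinois.",
}
query = "capital of Illinois"

for name, text in pages.items():
    print(f"{name}: bag_of_words={bag_of_words_match(text, query)}, "
          f"phrase={phrase_match(text, query)}")
```

Only the phrase check separates the directory page from the page that answers the question. A literal phrase match can still be fooled by a page that happens to contain the exact words in another sense.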

Critter

12:38 am on Dec 28, 2004 (gmt 0)

10+ Year Member



Or a company named "Lotsa Capital of Illinois"

:)