Excluded words not excluded completely

apparently they do change the search result

         

seindal

1:12 pm on Dec 19, 2002 (gmt 0)

10+ Year Member



I have noticed that if one searches for a phrase that contains a common stopword, the results are not the same as for the search without the stopword.

Compare search for "arch constantine" with "arch of constantine". I don't get exactly the same result.

The SERPs are almost the same, but the search without 'of' returns more results!

Hence the message

"of" is a very common word and was not included in your search.

is not 100% correct.

Maybe the stopword is excluded from the lookup in the index, but not from some post-processing done on the raw results from the index. Only guessing, though.

René

Susanne

2:50 pm on Dec 19, 2002 (gmt 0)

10+ Year Member



You're right! I've just noticed the same here, and sometimes the difference is actually quite big.

cornwall

3:01 pm on Dec 19, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hey, you're right there.

I tried it with a search for -

hotels "city name"

and

hotels in "city name"

And the results are markedly different. I always thought that the "in" would be discounted, but it obviously is not.

Macguru

3:06 pm on Dec 19, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I suggest you try dots, hyphens or underscores to replace every character of the filtered word. It has something to do with word proximity.

<added> Oops, won't work with dots, try another filtered word like "la" or "de"</added>

[edited by: Macguru at 3:09 pm (utc) on Dec. 19, 2002]

jimbeetle

3:09 pm on Dec 19, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



And then if you put quotes around it, the first-page results change very slightly again: yours and the GeoCities site swap places.

Safe to assume that "normal" search behaviour would be to include the "of"? Probably without quotes? Either way looks like you're in pretty good shape.

Jim

seindal

3:21 pm on Dec 19, 2002 (gmt 0)

10+ Year Member



My guess is that the searches are done in several phases. First a raw result set is obtained from the index, and then the specifics of the query are used to adjust the result set according to word proximity, presence of search words in title and URI, freshness etc.

It might be that the raw result set from the index is simply ordered by PageRank, and other criteria are applied to the result in secondary phases.

Just guessing, though, like (almost) everybody else.

René.
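
To make the several-phases guess above concrete, here is a toy sketch in Python. Everything in it is an assumption made up for illustration: the stopword list, the three sample documents, and the function names are hypothetical, and nothing here is known about how the real engine works. Phase 1 drops stopwords before the index lookup; phase 2 re-orders the raw result set using the full query, stopwords included, as a crude stand-in for word proximity.

```python
# Toy two-phase search. All data and names are hypothetical.
STOPWORDS = {"of", "in", "the", "a"}

DOCS = {
    "a": "the arch of constantine in rome",
    "b": "constantine built an arch",
    "c": "arch constantine photo gallery",
}


def tokenize(text):
    return text.lower().split()


def candidates(query):
    """Phase 1: index lookup using only the non-stopword terms."""
    terms = [t for t in tokenize(query) if t not in STOPWORDS]
    return {
        doc_id
        for doc_id, text in DOCS.items()
        if all(t in tokenize(text) for t in terms)
    }


def contains_sequence(words, seq):
    """True if seq occurs in words as a contiguous run."""
    n = len(seq)
    return any(words[i:i + n] == seq for i in range(len(words) - n + 1))


def rerank(query, doc_ids):
    """Phase 2: documents containing the full query as typed
    (stopwords included) come first; ties are broken by doc id."""
    seq = tokenize(query)
    return sorted(
        doc_ids,
        key=lambda d: (not contains_sequence(tokenize(DOCS[d]), seq), d),
    )


def search(query):
    return rerank(query, candidates(query))


print(search("arch constantine"))     # ['c', 'a', 'b']
print(search("arch of constantine"))  # ['a', 'b', 'c']
```

Both queries pull the same raw result set from the toy index, yet the ordering differs because the second phase still sees the "of". This only illustrates the ordering difference; it does not explain why the result counts differ between the two searches.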

creative craig

3:28 pm on Dec 19, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have this exact problem with two of the sites that I run.

The word "it" and the term "IT" (as in information technology) cause me no end of problems when searching for the home page of either of the sites.

Craig

ann

12:24 pm on Dec 20, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Wow, there is a BIG discrepancy in the results for some of my key phrases!

Thank goodness the normal person will include the so-called stop words, simply because that is how they think of the phrase.

Ann