Stemming

Is Google implementing stemming?

   
11:53 am on Oct 14, 2003 (gmt 0)

10+ Year Member



Hi!
New here. I'm sure this topic must have been discussed here long ago, but maybe you people won't mind replying again. Is Google implementing stemming or a thesaurus for the keywords people search for? It's been a topic of debate of late at various forums, but no one seems to know for sure. What do you guys think?
12:07 pm on Oct 14, 2003 (gmt 0)

10+ Year Member



Welcome to the forum Napoleon. I must say you do know the questions to ask to wake me up in the morning better than coffee.

You may know about Google's relatively new ~ syntax, which allows you to search for synonyms. To get an idea of how much ground it covers, search for blue and then ~blue; you will definitely get different result counts.

As for stemming, I miss it less than I thought I would since full-word wildcards are available...

12:20 pm on Oct 14, 2003 (gmt 0)

10+ Year Member



search for blue and then ~blue; you will definitely get different result counts

Hm, can't see a difference.
And the result counter has been a random number generator for a few weeks now, no matter what you search for.

since full-word wildcards are available...

Could you explain this a bit further?
12:24 pm on Oct 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Welcome to WebmasterWorld, Napoleon!

I see no evidence that Google have changed their policy on stemming [google.com]:

To provide the most accurate results, Google does not use "stemming" or support "wildcard" searches. In other words, Google searches for exactly the words that you enter in the search box. Searching for "book" or "book*" will not yield "books" or "bookstore". If in doubt, try both forms: "airline" and "airlines," for instance.
12:31 pm on Oct 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Another new Google message? "By default, Google searches for variations of your search terms." [webmasterworld.com]
This is at least an indication they are working on something.
12:33 pm on Oct 14, 2003 (gmt 0)

10+ Year Member



Blue vs. ~Blue, that's funny. I definitely saw different result counts AND different order of results between the two searches. Try rose and ~rose for a dramatic count difference.

Full-word wildcards: Google doesn't support stemming, where you can stick a * at the end of a word and get variants on that word -- moon* finding moonlight, moondance, mooning, etc. But Google DOES support full-word wildcards, where you can substitute * for a word. For example, searching Google for "three * mice" finds three blind mice, three blue mice, three green mice, etc.
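
To make the distinction concrete, here is a rough Python sketch of the two behaviours. It is only an illustration of the idea, not how Google works internally; the function names and the sample text are made up for this example.

```python
import re

def prefix_match(pattern: str, words: list[str]) -> list[str]:
    """Trailing-* matching: 'moon*' matches any word that starts with 'moon'."""
    stem = pattern.rstrip("*").lower()
    return [w for w in words if w.lower().startswith(stem)]

def phrase_match(pattern: str, text: str) -> list[str]:
    """Full-word wildcard: each '*' stands in for exactly one whole word."""
    parts = [r"\w+" if tok == "*" else re.escape(tok) for tok in pattern.split()]
    regex = re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)
    return regex.findall(text)

# Made-up sample data for illustration only.
words = ["moon", "moonlight", "moondance", "mooning", "monsoon"]
text = "Three blind mice, three blue mice, and three very green mice."

print(prefix_match("moon*", words))        # the part Google does not do
print(phrase_match("three * mice", text))  # the part Google does do
```

Note that "three very green mice" is not returned by the second call, because one * covers one word only.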

Make sense?

1:07 pm on Oct 14, 2003 (gmt 0)

10+ Year Member



Try rose and ~rose for a dramatic count difference

Or try ~flowers -flowers and see which words are being highlighted
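
What that query does can be sketched roughly like this in Python. The synonym table is entirely hypothetical (nobody outside Google knows what is in theirs), and the documents are invented; the point is only that excluding the literal word leaves behind the pages that matched on a variant, so the highlighted words show you the variants.

```python
# Hypothetical synonym table, NOT Google's actual data.
SYNONYMS = {"flowers": {"flowers", "flower", "floral", "florist"}}

def tilde_minus(term: str, docs: list[str]) -> dict[str, set[str]]:
    """Emulate '~term -term': match any variant, but drop docs containing term itself."""
    variants = SYNONYMS.get(term, {term})
    hits = {}
    for doc in docs:
        doc_words = set(doc.lower().split())
        if term in doc_words:
            continue  # excluded by the -term part
        matched = doc_words & variants
        if matched:
            hits[doc] = matched  # these are the words that would be highlighted
    return hits

# Invented sample documents.
docs = [
    "fresh flowers delivered daily",
    "local florist open late",
    "floral print dresses",
    "garden tools on sale",
]
print(tilde_minus("flowers", docs))
```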

1:09 pm on Oct 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Blue vs. ~Blue, that's funny. I definitely saw different result counts AND different order of results between the two searches.

You're not going mad, I saw different results too :)

1:16 pm on Oct 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Actually, what you see as "full-word wildcards" has nothing to do with wildcards, but happens to have the same end effect. Basically google simply removes characters such as "*" and certain stop words from your query. It still recognises them for proximity ranking though. So
Three * Mice
becomes
Three [any word] Mice

which of course has the desired effect.

But it has nothing to do with any wildcard feature. In fact a search for
Three * Mice
yields the same results as a search for
Three and Mice
or
Three a Mice

Just nitpicking though, because in the end it's the effect that counts.

SN

2:16 pm on Oct 14, 2003 (gmt 0)

10+ Year Member



There are some strange things afoot at Google. If you have an AdWords account, look at the broad-matching keywords and try some searches. (my pet conspiracy theory)
1:59 pm on Oct 15, 2003 (gmt 0)

10+ Year Member



Actually, what you see as "full-word wildcards" has nothing to do with wildcards, but happens to have the same end effect. Basically google simply removes characters such as "*" and certain stop words from your query.

Hi Killroy,

Google is not just removing them. If Google was just removing them then the searches "three * mice" and "three * * mice" would get the same results, and they don't. There's some kind of placeholding going on, whether you want to call it wildcard or something else.

A search for "three * mice" (note quotes) does NOT give the same results as "three and mice", as Google doesn't recognize many (any? Maybe "the"?) stopwords in phrases.
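
Here is a small Python sketch of the argument. Both models are my own simplifications for illustration, not Google's actual code: in the "strip" model the * is thrown away, in the "placeholder" model each * holds the place of exactly one word.

```python
def strip_model(query: str) -> list[str]:
    """Model A (assumed): '*' is simply removed from the query."""
    return [tok for tok in query.lower().split() if tok != "*"]

def placeholder_model(query: str) -> list[str]:
    """Model B (assumed): '*' is kept and stands for one word in that position."""
    return query.lower().split()

def phrase_matches(pattern: list[str], phrase: str) -> bool:
    """True if phrase has the same length as pattern and agrees on every non-* slot."""
    words = phrase.lower().split()
    return len(words) == len(pattern) and all(
        p == "*" or p == w for p, w in zip(pattern, words)
    )

one_star = "three * mice"
two_stars = "three * * mice"

# If '*' were just stripped, the two queries would be identical.
print(strip_model(one_star) == strip_model(two_stars))

# With placeholders, each query matches a different phrase.
print(phrase_matches(placeholder_model(one_star), "three blind mice"))
print(phrase_matches(placeholder_model(two_stars), "three blind mice"))
print(phrase_matches(placeholder_model(two_stars), "three very blind mice"))
```

Since the two queries return different results on Google, the pure strip model can't be the whole story.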

RBuzz

2:41 pm on Oct 15, 2003 (gmt 0)

10+ Year Member



RBuzz is spot on. I don't know what use it is, but it is fun to play with!

Right, now I am going to do some work.

HP