Google and Stemming

Why does Google not allow wildcards?

Chndru

7:24 pm on Jun 17, 2003 (gmt 0)

Is there a way to enable stemming in a Google search? I have been using Google ever since '98. This seems like one of the most basic features someone would want in a Google search, especially for exotic keywords or abstract concepts.

If it isn't possible, is there any explanation as to why, other than the Google FAQ:
"To provide the most accurate results, Google does not use 'stemming' or support 'wildcard' searches."

Curiously ;)
Thanks

dmorison

8:42 pm on Jun 17, 2003 (gmt 0)

Hiya,

Basically, it's because Google works by "simply" combining pre-compiled indexes of web pages containing (or, in Google's case, "relevant to") each word in your query. It doesn't actually search as such; rather, it "looks up".

This is why it is so fast: type in "fjgbs" and you get a result back almost immediately.
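To make the "looks up" idea concrete, here is a minimal Python sketch of a pre-compiled (inverted) index. The toy documents and the names `build_index` and `lookup` are hypothetical, and this is not Google's actual implementation; it only shows why a one-word query is a dictionary lookup rather than a scan over every page, which is why even a nonsense word like "fjgbs" comes back instantly.

```python
from collections import defaultdict

# Toy corpus: document id -> text (hypothetical data for illustration only)
DOCS = {
    1: "google ranks pages by links",
    2: "stemming maps pants to pant",
    3: "google does not use stemming",
}

def build_index(docs):
    """Pre-compile an inverted index: word -> sorted list of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

INDEX = build_index(DOCS)

def lookup(word):
    """A one-word query is a single dictionary lookup, not a scan of the documents."""
    return INDEX.get(word.lower(), [])

print(lookup("google"))  # [1, 3]
print(lookup("fjgbs"))   # []
```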

Two or more words require the pre-compiled indexes to be combined. This is still reasonably fast for a small number of words.
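In the simplest case, combining the indexes for a multi-word query is an intersection of sorted lists of document ids. A sketch under that assumption (the postings data and the helper names `intersect` and `and_query` are hypothetical); starting from the shortest list keeps the intermediate results small:

```python
def intersect(a, b):
    """Merge-intersect two sorted posting lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def and_query(postings, words):
    """Combine the posting lists for every query word, shortest list first."""
    lists = sorted((postings.get(w, []) for w in words), key=len)
    if not lists:
        return []
    result = lists[0]
    for lst in lists[1:]:
        result = intersect(result, lst)
    return result

# Hypothetical pre-compiled postings (word -> sorted doc ids)
POSTINGS = {"google": [1, 3, 5, 8], "stemming": [2, 3, 8], "wildcard": [3, 7, 8]}
print(and_query(POSTINGS, ["google", "stemming"]))              # [3, 8]
print(and_query(POSTINGS, ["google", "stemming", "wildcard"]))  # [3, 8]
```

Each extra word adds one more linear merge, which is why a handful of words stays fast.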

To include wildcard matching, Google would not only have to scan the word list for wildcard matches (in itself not that difficult), but would then have to combine the pre-compiled result sets of the many thousands (even millions) of words matching the wildcard. That would take a massive amount of time and is certainly not possible within the time frame of an interactive query!
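The two steps described above can be sketched as follows, assuming shell-style patterns via Python's `fnmatch` and a toy vocabulary (`wildcard_query` and the data are hypothetical). Matching the pattern against the word list is one cheap pass, but the union then has to read one posting list per matching word. Here that is four short lists; on a web-scale index a short pattern could match a huge number of words, each with a very long list.

```python
import fnmatch
import heapq

# Hypothetical vocabulary with posting lists (word -> sorted doc ids)
POSTINGS = {
    "stem": [4, 9],
    "stemmed": [2],
    "stemming": [2, 3, 8],
    "stems": [1, 9],
    "step": [5],
}

def wildcard_query(postings, pattern):
    """Expand a wildcard against the vocabulary, then OR the matching lists.

    Step 1 (cheap): scan the word list for matches.
    Step 2 (expensive at web scale): union one posting list per matching word.
    """
    matches = [w for w in postings if fnmatch.fnmatchcase(w, pattern)]
    merged = heapq.merge(*(postings[w] for w in matches))
    result = []
    for doc_id in merged:
        if not result or result[-1] != doc_id:  # drop duplicates from the union
            result.append(doc_id)
    return matches, result

words, docs = wildcard_query(POSTINGS, "stem*")
print(len(words), docs)  # 4 [1, 2, 3, 4, 8, 9]
```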

Hope this helps!

Chndru

9:10 pm on Jun 17, 2003 (gmt 0)

Hi,
I am surprised at this explanation. MSN search does stemming (e.g. a search on "pant" also returns "pants"). Could you explain, or is this attributable to the principal differences in the way Inktomi and Google look at a web page? Does Inktomi have a pre-compiled index? Or am I talking about two entirely different ways of managing the DB?
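Stemming is a different problem from arbitrary wildcards, because it can be applied when the index is built: every word is reduced to a stem, and the query is reduced the same way, so "pant" and "pants" land on one index entry and a query still costs a single lookup. The sketch below assumes that approach, which is not necessarily what Inktomi did; its suffix list is a deliberately crude, hypothetical stand-in for a real stemmer such as Porter's algorithm.

```python
from collections import defaultdict

# Toy suffix list: an assumption for illustration, far cruder than a real
# stemmer such as Porter's algorithm.
SUFFIXES = ("ing", "ed", "es", "s")

def stem(word):
    """Strip the first matching suffix, keeping at least a 3-letter stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_stemmed_index(docs):
    """Index each document under the stem of every word it contains."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[stem(word)].add(doc_id)
    return index

def search(index, word):
    """Stem the query too, so 'pant' and 'pants' hit the same entry."""
    return sorted(index.get(stem(word), set()))

DOCS = {1: "cargo pants on sale", 2: "one pant leg", 3: "running shoes"}
INDEX = build_stemmed_index(DOCS)
print(search(INDEX, "pant"))   # [1, 2]
print(search(INDEX, "pants"))  # [1, 2]
```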

Thanks

jeremy goodrich

9:14 pm on Jun 17, 2003 (gmt 0)

You are talking about two completely separate databases and indexing methodologies.

The best way to understand Google is to use the site search [webmasterworld.com], read the forum charter [webmasterworld.com], and read the famous Google papers from Stanford about its indexing and technological roots.

Welcome to WebmasterWorld, btw. There is a LOT of information here.

Alternatively, if you are really brave, check out these Google papers by their staff [labs.google.com], which contain tons of information that will give you a more solid understanding of their inner workings, straight from the scientists themselves.

vitaplease

5:43 am on Jun 18, 2003 (gmt 0)

Welcome here, Chndru,

It's a good question. As the others said, it's more expensive, but I would not rule it out in the future.

I often go to good old AltaVista to search for incomplete part numbers
(and allowing this doesn't make it extremely slow).
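Searching for an incomplete part number is a trailing wildcard (a prefix), and that particular case is cheap if the vocabulary is kept sorted: all terms sharing a prefix sit next to each other, so one binary search finds the start of the run. A minimal sketch with Python's `bisect`; the part numbers and `prefix_matches` are hypothetical, and this says nothing about how AltaVista actually did it.

```python
import bisect

# Hypothetical sorted vocabulary of part numbers
VOCAB = sorted(["ab1200", "ab1234", "ab1299", "ab2000", "xz1234"])

def prefix_matches(vocab, prefix):
    """Return every term in the sorted vocab that starts with prefix."""
    i = bisect.bisect_left(vocab, prefix)  # first term >= prefix
    out = []
    # Terms sharing the prefix are contiguous in sorted order.
    while i < len(vocab) and vocab[i].startswith(prefix):
        out.append(vocab[i])
        i += 1
    return out

print(prefix_matches(VOCAB, "ab12"))  # ['ab1200', 'ab1234', 'ab1299']
```

A leading wildcard such as `*1234` cannot use this shortcut directly; one common workaround is a second sorted list of the reversed terms.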

It does make Google an incomplete search experience. I had the same desire/question:
[webmasterworld.com...]

golloween

7:13 am on Jun 18, 2003 (gmt 0)

Chndru, Google does in fact support wildcards to some extent. Try something like this (don't forget the quotes):

"big * thing"
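Assuming the `*` in a quoted phrase stands for exactly one whole word, which is how the example above appears to be meant (Google's real semantics may differ), a phrase like that can be answered from a positional index that stores where each word occurs in each page. A sketch with hypothetical documents and helper names:

```python
from collections import defaultdict

def build_positional_index(docs):
    """Return (word -> {doc_id: [positions]}, doc_id -> word count)."""
    index = defaultdict(lambda: defaultdict(list))
    lengths = {}
    for doc_id, text in docs.items():
        words = text.lower().split()
        lengths[doc_id] = len(words)
        for pos, word in enumerate(words):
            index[word][doc_id].append(pos)
    return index, lengths

def phrase_query(index, lengths, phrase):
    """Match a quoted phrase in which each '*' stands for exactly one word."""
    terms = phrase.lower().split()
    # Real words keep their offset in the phrase; '*' only reserves a slot.
    anchors = [(off, w) for off, w in enumerate(terms) if w != "*"]
    if not anchors:
        return []
    first_off, first_word = anchors[0]
    hits = []
    for doc_id, positions in index.get(first_word, {}).items():
        for p in positions:
            start = p - first_off
            # The whole phrase, wildcard slots included, must fit in the document.
            if start < 0 or start + len(terms) > lengths[doc_id]:
                continue
            if all(start + off in index.get(w, {}).get(doc_id, [])
                   for off, w in anchors[1:]):
                hits.append(doc_id)
                break
    return sorted(hits)

DOCS = {
    1: "the next big thing in search",
    2: "a big red thing",
    3: "big thing",
    4: "big ugly green thing",
}
INDEX, LENGTHS = build_positional_index(DOCS)
print(phrase_query(INDEX, LENGTHS, "big * thing"))  # [2]
print(phrase_query(INDEX, LENGTHS, "big thing"))    # [1, 3]
```

Here the wildcard never expands against the vocabulary: it only reserves a position between words that are looked up normally, which is far cheaper than the open-ended expansion discussed earlier in the thread.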