I was wondering if anybody here has insight into whether Google uses some sort of phonetic lookup to reduce the index size, or whether it does strict word matching only, plus a spellchecker (I'd implemented aspell for my system), for speed and efficiency?
I guess with a database as large as Google's you can rely on pure exact matching, since you'll have enough results for even the most obscure misspellings?
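To make the phonetic-lookup idea concrete, here is a minimal Python sketch using classic American Soundex. Nobody outside Google knows whether they use anything like this. Soundex is just a well-known phonetic algorithm swapped in for illustration, and `soundex` / `build_phonetic_index` are hypothetical names. Words that sound alike collapse to one four-character key, so a single index entry can cover several spellings:

```python
from collections import defaultdict

# American Soundex digit for each consonant; vowels, y, h and w get no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the four-character American Soundex key for word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = CODES.get(letters[0], "")  # first letter's digit still suppresses repeats
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code:
            if code != prev:
                key += code
            prev = code
        elif ch not in "hw":
            prev = ""  # a vowel (or y) separates two identical digits; h/w do not
        if len(key) == 4:
            break
    return key.ljust(4, "0")


def build_phonetic_index(words):
    """Map each Soundex key to the set of (lower-cased) words sharing it."""
    index = defaultdict(set)
    for w in words:
        index[soundex(w)].add(w.lower())
    return index


index = build_phonetic_index(["Google", "Gogle", "Googel", "search", "serch"])
print(sorted(index[soundex("gogle")]))  # ['gogle', 'googel', 'google']
print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

The catch is precision: Soundex also merges genuinely different words (Robert and Rupert share R163), which may be one reason exact matching plus a spellchecker is attractive when the corpus is large enough.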
Anybody got any insights?
SN
For misspellings, I imagine they have another db that gets queried for the 'did you mean...' part of the page.
That way, they get real data, real fast - as well as a 'suggested alternative' if it seems likely (based on historical data) that the searcher intended something else.
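A rough sketch of how such a 'did you mean' lookup could work, assuming a hypothetical table of historical query counts (`QUERY_LOG`, toy numbers made up for illustration, not real data): suggest the most popular logged query within a small Levenshtein edit distance, but only when it is much more popular than what was typed. `did_you_mean`, `max_distance` and `min_ratio` are hypothetical names and thresholds, not anything Google has documented.

```python
# Hypothetical historical query counts (toy numbers, not real data).
QUERY_LOG = {
    "fuzzy logic": 5000,
    "fuzy logic": 40,
    "fuzzy login": 300,
}


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def did_you_mean(query, query_log=QUERY_LOG, max_distance=2, min_ratio=10.0):
    """Return a suggested query, or None if nothing is clearly better."""
    typed_count = query_log.get(query, 0)
    best, best_count = None, 0
    for candidate, count in query_log.items():
        if candidate == query:
            continue
        if levenshtein(query, candidate) <= max_distance and count > best_count:
            best, best_count = candidate, count
    # Only suggest when the alternative is far more popular than what was typed.
    if best is not None and best_count >= min_ratio * max(typed_count, 1):
        return best
    return None


print(did_you_mean("fuzy logic"))   # fuzzy logic
print(did_you_mean("fuzzy logic"))  # None
```

The popularity ratio is what keeps a common, correctly spelled query from being "corrected" into a rarer neighbour.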
What you said about 'fuzzy' is great stuff though - amazing how much an obtuse form of math can improve even the most mundane, routine tasks?
Fuzzy thinking gets you there quicker, but as far as I know, you can successfully implement something similar using probabilistic methods. :) You can tell that was the thinking of the original Google designers.
However, the current Stanford research uses many concepts and mathematical techniques that are 'fuzzy', so perhaps they will lean more towards this over time?
The big improvement is really the speed: a fuzzy system copes better with unknowns and dynamic elements.
With the kind of traffic that Google has, this should be peanuts, and truly valid, not just experimental.
i.e. if on a search for keyword1 60% go to page 2 and another 30% go further, and only 10% stay on page 1, then the page one results are not very valid, or at the very least not very useful. Time to rethink the theming/ranking for those pages under that keyword.
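The page-1 test above is easy to express in code. This sketch assumes (hypothetically) that the engine logs, for each search on a keyword, the deepest results page the searcher visited. "Stayed on page 1" is approximated as deepest page == 1, and the 0.5 threshold is an arbitrary placeholder, not a tuned value:

```python
from collections import Counter


def page_one_share(deepest_pages):
    """Fraction of searches whose deepest visited results page was page 1."""
    counts = Counter(deepest_pages)
    total = sum(counts.values())
    return counts[1] / total if total else 0.0


def needs_rerank(deepest_pages, threshold=0.5):
    """Flag a keyword whose page-1 results satisfy too few searchers."""
    pages = list(deepest_pages)
    if not pages:
        return False  # no data, nothing to conclude
    return page_one_share(pages) < threshold


# 10% stop on page 1, 60% reach page 2, 30% go further (page 3 here).
sessions = [1] * 10 + [2] * 60 + [3] * 30
print(page_one_share(sessions))  # 0.1
print(needs_rerank(sessions))    # True
```

With the 10/60/30 split from the example, only a tenth of searchers are served by page 1, so keyword1 gets flagged for re-ranking.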
This way you should be able to create a continually improving system which is unspammable, since there are always a gazillion more real users than a single webmaster who wants to spam. Theoretically you should get a Google where the "I'm Feeling Lucky" button actually works, or, even better, the homepage has a link to the page you were looking for before you type in any search at all... or at least so goes the theory of statistical analysis ;)
SN
Using a fuzzy method, the system develops its own coefficients and doesn't need human input for those metrics, thus reaching efficiency much faster.
Continually monitoring, iterating, testing, and quantifying is so much work - even for an automated system.
If you have a more organic architecture -> the system responds faster to change.
It's why, for example, many manufacturing facilities use fuzzy controllers instead of expert systems: same results, but less processing power and computation time to reach the end goal.
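For anyone who hasn't seen one, here is a toy fuzzy controller in Python, zero-order Takagi-Sugeno style: a heater whose power is set from the temperature error. The membership functions, rule outputs and the clamp to ±10 degrees are all hypothetical choices for illustration, not taken from any real plant. Three overlapping fuzzy sets fire to partial degrees, and their fixed rule outputs are blended by a weighted average:

```python
def triangle(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Each rule: (membership function of the error, heater power in % if it fires).
RULES = [
    (lambda e: triangle(e, -20.0, -10.0, 0.0), 0.0),   # too hot   -> heater off
    (lambda e: triangle(e, -10.0, 0.0, 10.0), 50.0),   # on target -> half power
    (lambda e: triangle(e, 0.0, 10.0, 20.0), 100.0),   # too cold  -> full power
]


def fuzzy_heater_power(setpoint, measured):
    """Blend rule outputs by firing strength (weighted-average defuzzification)."""
    error = max(-10.0, min(10.0, setpoint - measured))  # positive = too cold
    weights = [membership(error) for membership, _ in RULES]
    total = sum(weights)
    if total == 0.0:
        return 50.0  # unreachable with these overlapping sets; safe fallback
    return sum(w * out for w, (_, out) in zip(weights, RULES)) / total


print(fuzzy_heater_power(20.0, 20.0))  # 50.0
print(fuzzy_heater_power(20.0, 15.0))  # 75.0
print(fuzzy_heater_power(20.0, 22.5))  # 37.5
print(fuzzy_heater_power(20.0, 40.0))  # 0.0
```

Between the set peaks the output varies smoothly with the error, and tuning comes down to moving a few breakpoints rather than rewriting rules.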