Forum Moderators: coopster


building a php search engine

how to find similar words?

         

jamie

6:09 pm on Feb 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



hi,

i am trying to build a simple business search, where the user enters the business name (or nearest guess) and is returned the link to the business in question.

i can do it with MySQL's LIKE, but that doesn't help with misspellings (which are surprisingly common). is there any 'magic' MySQL function that allows a similar-word search, like Google's "did you mean"?

or is this a real example of rocket science?

many thanks :-)

incrediBILL

6:34 pm on Feb 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I know it's not the answer you seek, but you can use most ecommerce systems to do what you want. For instance, something like osCommerce (ugh) is FREE, off the shelf, and already has a pre-built database and a search function. Many people use ecommerce products like this as a general-purpose database: just yank the ecommerce features off the page (e.g. the BUY NOW button) and it works very well. The added advantage is that ecommerce systems already have a hierarchical category system built in, so you can offer both an index and a search for no extra hassle.

How I handle the typos, which was your question: I include an extra database (call it SEARCHASSIST) that logs each failed search. The web site owner can review this list and supply an alternative response for those words. I modified the search function so that it first looks for a direct match of the keyword in SEARCHASSIST and, if it finds one, substitutes the webmaster's suggestion for whatever the visitor typed. Over a short period of time you get a decent database that corrects the most common typos for your content selection.
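A rough sketch of the SEARCHASSIST idea, in case it helps to see it written down. This is not the code from the site described above: it uses Python with an in-memory SQLite database so it runs anywhere, and the table names, column names, and sample businesses are all made up for illustration.

```python
import sqlite3

# Hypothetical schema: every table and column name here is illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE businesses (name TEXT);
    CREATE TABLE searchassist (
        typed      TEXT PRIMARY KEY,  -- what the visitor entered (lowercased)
        suggestion TEXT,              -- filled in later by the site owner
        misses     INTEGER DEFAULT 0  -- how often this term found nothing
    );
    INSERT INTO businesses VALUES ('Acme Plumbing'), ('Bright Bakery');
""")


def search(term):
    """Return matching business names, applying any owner-supplied correction first."""
    typed = term.strip().lower()
    key = typed
    row = conn.execute(
        "SELECT suggestion FROM searchassist WHERE typed = ?", (typed,)
    ).fetchone()
    if row and row[0]:
        key = row[0].lower()  # swap in the owner's suggestion for the typed term
    hits = [r[0] for r in conn.execute(
        "SELECT name FROM businesses WHERE lower(name) LIKE ?", (f"%{key}%",)
    )]
    if not hits:
        # Log the failed search so the owner can review it and add a suggestion.
        conn.execute("INSERT OR IGNORE INTO searchassist (typed) VALUES (?)", (typed,))
        conn.execute(
            "UPDATE searchassist SET misses = misses + 1 WHERE typed = ?", (typed,)
        )
    return hits
```

The first search for a misspelling such as "bakry" returns nothing and gets logged; once the owner fills in `suggestion` for that row, the same misspelling resolves to the right business.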

Now, here's the fun part, you can redirect product searches too! For instance (an ecommerce example) if your visitor is looking for 'Widget A' and you only sell 'Widget B', you can display "I'm sorry, we don't have Widget A, but here are our suggestions:" and show them what you actually sell that is comparable.

Another alternative is to build a SOUNDEX type of index, where you convert all the keywords into what they might SOUND like instead of their actual spelling. Then you simply search the SOUNDEX index instead of the normal index, which bypasses simple spelling errors, but that would get too involved to describe here. Maybe you can find something off the shelf that would do this, dunno.
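For what it's worth, a SOUNDEX-style index can be sketched in a few lines. The version below is a hand-written take on the classic four-character Soundex code in Python, with invented business names and hypothetical helper names (`build_index`, `lookup`); PHP has a built-in `soundex()` and MySQL has a `SOUNDEX()` function, so in practice you probably would not write the encoder yourself.

```python
from collections import defaultdict

# Letter groups for classic Soundex; vowels, H, W and Y carry no digit.
CODES = {}
for digit, letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word):
    """Classic four-character Soundex code; returns '' if the word has no letters."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":  # H and W do not separate letters with the same code
            prev = digit
    return (code + "000")[:4]


def build_index(names):
    """Map each word's Soundex code to the business names containing that word."""
    index = defaultdict(set)
    for name in names:
        for word in name.split():
            code = soundex(word)
            if code:
                index[code].add(name)
    return index


def lookup(index, query):
    """Return names sharing a Soundex code with any word in the query."""
    results = set()
    for word in query.split():
        results |= index.get(soundex(word), set())
    return sorted(results)


# Made-up sample data.
index = build_index(["Smith Hardware", "Robert's Bakery", "Acme Plumbing"])
print(lookup(index, "Smyth"))   # ['Smith Hardware']
print(lookup(index, "Rupert"))  # ["Robert's Bakery"]
```

Soundex only forgives misspellings that keep the first letter and roughly the same consonant sounds, so it catches "Smyth" for "Smith" but not every typo.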

Hope that gives you some ideas.

[edited by: incrediBILL at 6:48 pm (utc) on Feb. 11, 2005]

fischermx

6:39 pm on Feb 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




A common approach, though not too elegant, is to do a "like" search first and, if nothing is found, then use a "soundex" function to try a second search on the term.
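The two-pass shape might look something like the sketch below. To keep it short and self-contained, the second pass swaps the soundex step for the similarity matching in Python's standard `difflib` module, so treat it as an illustration of the fallback structure rather than of soundex itself; any phonetic key could be slotted into the second stage instead. The function names and business names are invented.

```python
import difflib

BUSINESSES = ["Acme Plumbing", "Bright Bakery", "Corner Cafe"]  # made-up data


def like_search(term, names):
    """Pass 1: case-insensitive substring match, similar in spirit to LIKE '%term%'."""
    needle = term.lower()
    return [n for n in names if needle in n.lower()]


def fuzzy_search(term, names, cutoff=0.6):
    """Pass 2: match each query word against the words used in the names."""
    word_to_names = {}
    for name in names:
        for word in name.lower().split():
            word_to_names.setdefault(word, []).append(name)
    hits = []
    for word in term.lower().split():
        close_words = difflib.get_close_matches(
            word, list(word_to_names), n=3, cutoff=cutoff
        )
        for close in close_words:
            for name in word_to_names[close]:
                if name not in hits:
                    hits.append(name)
    return hits


def search(term, names=BUSINESSES):
    """Try the cheap exact-ish pass first; fall back to the fuzzy pass only on a miss."""
    return like_search(term, names) or fuzzy_search(term, names)
```

With this data, `search("bakery")` is answered by the first pass, while a misspelling like "plumming" falls through to the second pass and still finds "Acme Plumbing".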

coopster

6:40 pm on Feb 11, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



See if this is of any help, jamie:
can a php search function find near-misses? [webmasterworld.com]

jamie

8:56 am on Feb 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



incrediBILL,

that's a great bit of thinking outside the box with the near misses - thanks for the suggestions.

coopster, there's loads of good reading there, many thanks for the link.

am off to investigate.

thanks all

dmmh

11:04 am on Feb 13, 2005 (gmt 0)

10+ Year Member



mmm, this might be useful, thanks :)