Forum Moderators: coopster

Message Too Old, No Replies

can a php search function find near-misses?

ex: can a search for ferniture find furniture?

         

louponne

2:21 pm on Dec 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've built a simple search function on a site that looks in a couple of fields of a mysql table for matches to the word(s) that a user enters into a form. At this point, I'm looking for matches, within those fields, to the string that's entered. Pretty simple stuff.

Now, Mr client is asking whether we can produce near misses. I mean if someone searches for "Mecromedia" can we give them "Macromedia" as a result. The French on-line yellow pages does exactly what I mean - if you search for "Claude Martin" in a certain town and there's no "Claude Martin" there, it will show you results with names whose spellings are close to that.

My first reaction is that this must be horribly complex to program, but I'd rather hear opinions from more experienced php coders?

caspita

2:44 pm on Dec 17, 2003 (gmt 0)

10+ Year Member



As so far I know php DOES NOT provide this kind function.

The way that it works in most of the software I know is that you need to create like a "near missed sinonymus dictionary" meaning for each valid word like "Macromedia" you need to feed the near misses "Micromedia, Mecromedia, Macromidea, and so and so, including the word in self as a near missed also, for this example Macromedia" then your first step is look into the table using the 'near misses' column to find out the 'real word' and then you can begin your real search ... it is a kind of easy way to implement a searcher ... of course you don't know at the very begining what the user is going to type, then you need to capture all of the searches that the users type and store those words in the same table (near missed field) and have a periodical revision of the table looking for empty 'real words' that you need to feed.

ie:

your original table has:

near missed real word
=========== ===========
Micromedia Macormedia
Macromidea Macromedia
Macromedia Macromedia (<== rememmber alway to insert th real word also)

the user types 'Makromedia' which you don have, then you insert

Makromedia <blank field>

this last insert you need to update it later (the table will grow very fast so you may need to avoid some low priority words according to your bussines, and also avoid thigs like 'of', 'the', 'a' .. etc)

and every search you go first for the 'near missed' field to find out the 'real word'

You have to assume that the user alway type the correct word so if you don find it in the table you use the user input to looks for the search.

Hope this help

bcolflesh

2:50 pm on Dec 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



levenshtein()
us4.php.net/manual/en/function.levenshtein.php

metaphone()
us4.php.net/manual/en/function.metaphone.php

similar_text()
us4.php.net/manual/en/function.similar-text.php

can be used to create the kind of code you are after.

bcolflesh

2:57 pm on Dec 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here a page from a larger tutorial that may help you:

codewalkers.com/tutorials/46/26.html

coopster

3:04 pm on Dec 17, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



There is also a SOUNDEX [mysql.com] function in MySQL that you may consider -- I'm not sure if it fits here or not, but you could test it. The SOUNDS LIKE [mysql.com] function is listed right there as well, but that is only available in MySQL Version 4.1 or later.

See PHP's soundex [php.net] function too.

jetboy_70

3:12 pm on Dec 17, 2003 (gmt 0)

10+ Year Member



Even MySQL's LIKE, REGEXP and AGAINST might be of some use. Creative use of these would be a very easy way to create pseudo-loose matching.

louponne

3:55 pm on Dec 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



wow, thanks everyone!

I'll check out all those suggestions and see if I understand. :)

RichD

4:17 pm on Dec 17, 2003 (gmt 0)

10+ Year Member



This is a PHP/MySQL query I used to do a similar thing. The client gave the code to someone else, so I may as well share it here.

"select *,IF(field_name='".$search."',1,0)+IF(field_name like '".$search."%',1,0)+IF(field_name like '%".$search."',1,0)+IF(field_name like '%".$search."%',1,0) as score from table_name where field_name like '%".$search."%' or substring(soundex(field_name),1,4)=substring(soundex('".$search."'),1,4) order by score DESC,field_name"

Putting this into some form of english: -
Select all records from table_name where field_name contains, or sounds like, $search.
If field_name = $search add 1 to 'score'
If field_name ends in $search add 1 to 'score'
If field_name begins with $search add 1 to 'score'
If field_name contains $search add 1 to 'score'
Sort the results in score order

If you searched for 'widget' and the table contained 'widget', 'blue widget', 'widget balls', 'green widget balls', 'widjet', 'turkey'

Then the results would be:-
widget - score=4
blue widget - score=2
widget balls - score=2
green widget balls - score=1
widjet - score=0

I found this worked out quite well. Hope it helps.

louponne

6:22 pm on Dec 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks, RichD, but actually in my situation, if the user searches for "widjet", I want to return results with "widget" in them!

louponne

7:43 pm on Dec 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



whew, well after poring over all of that, I still have a question. If I do one of these on a table with thousands of records, will it get totally bogged down on the web?