I have been thinking about stripping out certain words so that only the subject-type words are left, but I haven't been able to get close.
Any help would be greatly appreciated, thanks!
Essentially, you want an edit distance: a count of the number of edits needed to change each database paragraph into the target paragraph. The paragraphs with the lowest counts are the "most similar".
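To make the edit-count idea concrete, here is a minimal sketch in Python (not PHP, and not the code I use myself). It assumes the "edits" are whole-word insertions, deletions and substitutions, i.e. a Levenshtein distance over word lists; the names edit_distance and rank_by_edit_distance are made up for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    # prev[j] holds the distance between the first i-1 items of a and the first j of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning i items into an empty sequence takes i deletions
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute x with y (or keep)
        prev = curr
    return prev[-1]


def rank_by_edit_distance(target, paragraphs):
    """Return (distance, paragraph) pairs, lowest distance (most similar) first."""
    # Assumption: words are split on whitespace and compared case-insensitively.
    target_words = target.lower().split()
    scored = [(edit_distance(p.lower().split(), target_words), p)
              for p in paragraphs]
    return sorted(scored, key=lambda pair: pair[0])
```

Comparing word by word keeps the cost down on long paragraphs, but it still grows with the product of the two lengths, so it may be worth pre-filtering candidates before ranking a large table.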
You may need to calculate using several different metrics and combine the scores in a weighted way that meets your application's quirks.
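As a rough illustration of combining metrics, the Python sketch below takes a weighted average of two similarity scores. The two metrics (word-level and character-level ratios from the standard difflib module) and the weights are only placeholders, and combined_score is a hypothetical name; you would swap in whatever metrics and weights suit your data.

```python
from difflib import SequenceMatcher


def word_sequence_ratio(a, b):
    """Similarity in [0, 1] based on matching runs of whole words."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split(),
                           autojunk=False).ratio()


def char_sequence_ratio(a, b):
    """Similarity in [0, 1] based on matching runs of characters."""
    return SequenceMatcher(None, a.lower(), b.lower(), autojunk=False).ratio()


def combined_score(a, b, weighted_metrics):
    """Weighted average of several metrics, each returning a value in [0, 1].

    weighted_metrics is a list of (weight, metric_function) pairs.
    """
    total_weight = sum(weight for weight, _ in weighted_metrics)
    weighted_sum = sum(weight * metric(a, b) for weight, metric in weighted_metrics)
    return weighted_sum / total_weight


# Example weighting (an assumption, tune for your application):
# word order counts twice as much as character overlap.
EXAMPLE_METRICS = [(2, word_sequence_ratio), (1, char_sequence_ratio)]
```

Dividing by the total weight keeps the combined score in the same 0 to 1 range as the individual metrics, so thresholds stay meaningful when you adjust the weights.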
It's what I do, and it works very well.
Look up the Jaccard index on Wikipedia to get yourself started.
I am not aware of any canned code in PHP. To find the code I use, search Google for
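For reference, the Jaccard index of two sets is the size of their intersection divided by the size of their union. A minimal Python sketch over word sets follows; the stop-word list is a tiny made-up example (it also shows the "strip out certain words" idea from the original question), and the tokenizing is a plain whitespace split that does not strip punctuation.

```python
# Illustrative stop-word list only; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "of", "on", "in", "and", "is", "to"}


def word_set(text, stop_words=STOP_WORDS):
    """Lower-case the text, split on whitespace, and drop stop words."""
    return {word for word in text.lower().split() if word not in stop_words}


def jaccard_index(text_a, text_b):
    """|A intersect B| / |A union B| over the two texts' word sets (0.0 to 1.0)."""
    a, b = word_set(text_a), word_set(text_b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)
```

For example, "the cat sat on the mat" and "a cat sat on a rug" reduce to {cat, sat, mat} and {cat, sat, rug}, which share 2 of 4 distinct words, giving 0.5. Because it works on sets, the score ignores word order and repetition.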
rebol simetrics
You may be able to recast the algorithms into PHP given the above reference implementation.
[sourceforge.net...]
[dev.mysql.com...]
MySQL full-text search will match even if some words are missing or rearranged. The result set can be sorted by relevance. It works much faster than LIKE, but it uses more disk space than regular indexes.