Forum Moderators: open
I have been asked to finish a half started website, and stumbled up the fact none of the data in his database is escaped.
As I prefer to work with all the data in a safer form, I'd like to replace all instances in the db with the same "escaped" data.
eg if 150 records have apostrophes in the data in various fields across the db, how can I easily and quickly escape it all?
Many thanks.
The main reason I want to do this is because he has a search facility on the website, and there are many mispelt keywords that people search for. The best example I can think of is that people search for the terms "djs", "deejays" and "dj's" and I am wanting to change the search facility to return all instances.
Am I best removing all apostrophes from both values in the db, as well as stripping them from keywords from the search before running the query? My only drawback with this is that the information displayed to the user will not be grammatically correct?
Or is there a way I can write a piece of code that will search for replacement keywords and substitute them before running the query?
Any other suggestions?
Check to see if your DB has a soundex library or something similar. If so there should be a function called Levenstein.
What is does is it returns a count of differing characters.... it calls it the Levenstein distance.
Example.. a city named "Fort MacMurray"
Is someone types it in as "Ft MacMurray" that would give me a Levenstein distance of 2. Because there is a 2 character difference.
If they did "Fort McMurray" it would be a Levenstein distance of 1.
You can ask your db to return things where the distance is less then 3 or 4 or however many you want.
So a query would look like this....
$user_input = 'ft macmurrary'
id ¦ fieldname
1 ¦ Fort MacMurray
2 ¦ SomeOther City
3 ¦ aThird City
Select
*
from
table
where
Levenstein(lower($user_string), lower(table.fieldname)) < 4
This would find the record with an id of 1. You can further this by doing replaces before passing it to the SQL
[edited by: Demaestro at 4:34 pm (utc) on Oct. 29, 2007]