Forum Moderators: coopster
I'm having some difficult with two pieces of code I'm writing and was hoping someone out there could help me out. Any comments or direction would be highly appreciated.
These snippets apply to a database of properties. and my first dilemna is with preg_match_all. I've always had trouble understanding it:
if (isset($_REQUEST["find"])) {
preg_match_all("/[A-Z0-9\']{1,16}/i", $_REQUEST["find"], $words);
$find = implode(" ", $words[0]); //recreate var for FULLTEXT MySQL search later
}
Worked great for a search like breakers west but if you type in breaker's west I get 3 words in my array - breaker, 's, and west. I can't seem to come up with the best way to create an array stripped of all the non alphanumeric chars.
The second one is a little tougher. I'm stuck using MySQL 3.23 and I am doing a FULLTEXT query against name and description fields. I wasn't completely happy with the relevancy of SQL results so I decided to augment the MySQL results with my own relevancy. Problem is that small common words like "a" and "the" are toasting my relevancy results. I was wondering if the best approach would be to test the array against a stopword array loaded with common English words or if you should just limit relevancy check to like words with 4 chars or greater? Or my code might just be on the wrong track completely. Snippet is as follows:
$sql = "SELECT id, name, description, MATCH(name, description) AGAINST ('".addslashes($find)."')
AS score FROM `property`
WHERE MATCH(name, description) AGAINST ('".addslashes($find)."')
AND active='1' ORDER BY score DESC"; //relevancy search against property database
$NumberOfMatches = mysql_num_rows(mysql_query("$sql", $linkID));
if ($NumberOfMatches > 0) //Check for >0 matches else print none found
{
$sqlresult = mysql_query("$sql", $linkID);
for ($x = 0; $x < $NumberOfMatches; $x++) //not using a while loop so I can have a $x for array
{
list ($id, $name, $description) = mysql_fetch_row($sqlresult); //I could have retrieved number of ways but stripslashing the results anyway
$results[$x][id] = $id;
$results[$x][name] = stripslashes($name);
$results[$x][description] = stripslashes($description);
$results[$x][relevancy] = 0;
foreach ($words[0] as $check) {
$results[$x][name] = str_replace(strtoupper($check), "<b>".strtoupper($check)."</b>", $results[$x][name]); //Capitalize and make bold the word match in array
$offset = 0; $position = TRUE;
while ($position!== FALSE) //Step through name string looking for match to $words[0]
{
$position = strpos(strtolower($results[$x][name]), strtolower($check), $offset);
if ($position === FALSE) {break;}
$results[$x][relevancy] += 1.4;
$offset = $position + 1;
}
$results[$x][description] = str_replace(strtoupper($check), "<b>".strtoupper($check)."</b>", $results[$x][description]);
$offset = 0; $position = TRUE;
while ($position!== FALSE) //Step through description string looking for match to $words[0]
{
$position = strpos(strtolower($results[$x][description]), strtolower($check), $offset);
if ($position === FALSE) {break;}
$results[$x][relevancy] += .3;
$offset = $position + 1;
}
}
}
usort($results, compare_relevancy);
}
else {echo "<p>Sorry, I didn't find any properties matching: <b>$find</b></p>";}
I do wonder if I should even be loading an array this way from MySQL to accomplish this relevancy task. I could foresee memory issues happening if the database got large on down the road. Any help, comments, direction or criticisms on either of my dilemnas would be HIGHLY appreciated,
Thanks,
Burner
if (isset($_REQUEST["find"])) {
print $_REQUEST["find"]; exit;
preg_match_all("/[A-Z0-9\']{1,16}/i", $_REQUEST["find"], $words);
$find = implode(" ", $words[0]); //recreate var for FULLTEXT MySQL search later
}It probably looks something like this: breaker\'s west
... and your pattern isn't going to match it so it breaks that into it's own word in your array.
It seems you are doing a lot of extra work here that MySQL should do for you by default. I understand you said that your results weren't quite what you desired so you are trying alternatives, but everything you are doing here seemingly is the same as what MySQL's FULL TEXT [dev.mysql.com] search was made for.
I really don't have any solid answer for your second question but was more curious as to what you didn't like about the result set that came back and if there was something more that you could do there first?
The first issue probably has more to do with the data coming from a POST form variable. It's likely that magic_quotes are on and you have a slash (\) in front of that single quotation mark there. Dump that variable to your browser to see.if (isset($_REQUEST["find"])) {
print $_REQUEST["find"]; exit;
preg_match_all("/[A-Z0-9\']{1,16}/i", $_REQUEST["find"], $words);
$find = implode(" ", $words[0]); //recreate var for FULLTEXT MySQL search later
}It probably looks something like this:
breaker\'s west
... and your pattern isn't going to match it so it breaks that into it's own word in your array.
As far as why I'm going through the extra work. It is a FULLTEXT search, but I'm stuck on MySQL 3.23 in this project so I couldn't use "in boolean". When my results were coming back to me, often what I was looking wouldn't be ranked well enough for me. For example, I would look for a specific property name called maybe something like Hull House. It would be listed behind several other properties on Hull Street in the town of Hull. (Fictitious Names). I decided the way to fix it was to augment my result set by putting the user entered query string into an array and weighing how many times the keywords appeared in the result set from my FULLTEXT query and assigning my own values higher/lower depending on which field the keywords appeared in.
Actually, about 24 hours after I posted this I fixed my issue by a combination of a stopword array and test for words under 4 chars and I'm very happy know with the relevancy of my search. Apparently I'm still a noob though 'cause I got hung up on the slash thing and kept blaming preg_match_all because I barely understand the pattern matching.
Thank you so much for your help,
Burner