Forum Moderators: open

Message Too Old, No Replies

ordering results in mysql. by what?

to make my search results more palatable

         

httpwebwitch

3:28 pm on Jan 16, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi everyone

background:
I've got a mysql table with columns "firstname" and "lastname".

I have a function whereby users can search for a name. For example, I'll search for the name "Ian"

Right now, I'm using a LIKE clause:

SELECT * FROM people WHERE firstname LIKE '%ian%' OR lastname LIKE '%ian%'

And I like to do a little paging, so the query for page one is:

SELECT * FROM people WHERE firstname LIKE '%ian%' OR lastname LIKE '%ian%' LIMIT 0,10

I get back exactly what I expect. Results are ordered by their sequence in the database, which is to say, no particular order at all.

(results fictionalized)

1) Petra Kocianova
2) Sebastian Truman
3) Ian McGarry
4) Adrianne DeWitt
5) Nicos Iannopolis
6) Harry Chiang

But golly, I was searching for "Ian", not "Petra". Sure I'd like Petra to appear in the results, but not at the top! Intuitively, I think the results should be ordered:

1) Ian McGarry
2) Nicos Iannopolis
3) Calvin Chiang
4) Adrianne Chapman
5) Petra Kocianova
6) Sebastian Truman

See, I put Ian at the top. Nicos is #2 because his last name starts with "Ian". The rest are ordered by the position of the needle string in the haystack string.

Since I'm using a LIMIT clause for pagination, I'd prefer to handle the ordering in mysql, i.e. I'd rather not pull the entire result set into a PHP array and sort it there.

What should I do?

httpwebwitch

3:44 pm on Jan 16, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



hey

I think I've got it

SELECT 
firstname,
lastname,
instr(firstname,'ian') as fpos,
instr(lastname,'ian') as lpos,
(select (fpos + lpos) as score) as score
FROM people
WHERE firstname LIKE '%ian%'
OR lastname LIKE '%ian%'
order by score
LIMIT 0,10

coopster

4:00 pm on Jan 16, 2009 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



That looks like a good approach. I was going to recommend something like POSITION.

Demaestro

5:19 pm on Jan 16, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Nice... I was just talking about this with someone else but we were using levenstein distance.

Instead of:

WHERE firstname LIKE '%searchterm%'
OR lastname LIKE '%searchterm%

WHERE levenstein(lower(first_name), 'searchterm') <= 2
Or levenstein(lower(lasst_name), 'searchterm') <= 2

However we too were pulling it into a script to sort it logically. We also determine what integer to check for based on len(searchterm) if it is two letter search term then checking for a levenstein distance of 2 is silly.

Thanks I am off to play with your sorting solution.

*edit (less then, greater then,)

[edited by: Demaestro at 5:20 pm (utc) on Jan. 16, 2009]

rocknbil

7:40 pm on Jan 16, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I would put this in the hands of whoever's searching.

// "starts with" means no wild card before term. 1 = % before, 2 = % after, 3 both, nothing = exact
$searchform .= '<p>
<input type="radio" name="wild" value="3"';
if ($qs['wild'] == 3) { $searchform .= ' checked'; }
$searchform .= '> Anywhere <input type="radio" name="wild" id="wild_2" value="2"';
if ($qs['wild'] == 2) { $searchform .= ' checked'; }
$searchform .= '> Starts with <input type="radio" name="wild" id="wild_1" value="1"';
if ($qs['wild'] == '1') { $searchform .= ' checked'; }
$searchform .= '> Ends with <input type="radio" name="wild" id="wild_none" value=""'
if ((! isset($qs['wild'])) or ($qs['wild'] == '')) { $searchform .= ' checked'; };
$searchform .= '> Exact</p>';

$searchform .= '<p>Order by:
<input type="radio" name="order_by" id="order_by_last" value="1"';
if ((! isset($qs['order_by'])) or ($qs['order_by'] == 1)) { $searchform .= ' checked'; }
$searchform .= '> Last Name <input type="radio" name="order_by" id="order_by_first" value="2"';
if ($qs['order_by'] == 2) { $searchform .= ' checked'; }
$searchform .= '> First</p>';

then,


$pre=$post=$comparator='';
if (isset($qs['wild']) and ($qs['wild'] != '')) {
$comparator = ' like ';
if ($qs['wild'] == 1) { $post = '%'; }
if ($qs['wild'== 2) { $pre = '%'; }
if ($qs['wild'] == 3) { $pre = '%'; $post = '%'; }
}
else { $comparator = '='; }
$where = $comparator . "'" . $pre . $search_term . $post . "'";

$order_by = (isset($qs['order_by']))?$qs['order_by']:'lastname';

$select = "select * from people where firstname $where or lastname $where order by $order_by limit $start_record, $limit";

You could apply the same concepts giving last name a preference over first, etc . . .

httpwebwitch

1:23 pm on Jan 17, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



thanks rocknbil, that's a real nugget.

I won't use it on this project though. I don't want the user to choose "begins with" or "ends with" for their query, though I know some directory-type sites that offer such fine-grained options. I'd rather if the algo just shows whatever *I* think is the most intuitive match.

The query so far is *pretty good* but it's not quite *there* yet. Here's an example - search for "Pat"

results:
1) Patty McManus
2) Patrick King
3) Pat Lebarbar
4) Pat Smith
5) Ho Pat
6) Kim Patel
7) Samuel Patterson

In my mind, "Pat" is a closer match to "pat" than "Patty". So while match position (as above) does produce better results, a Levenshtein distance would be more satisfying. See below, that "Ho Pat" with the exact last name match is ranked lower than "Pat Smith" with an exact firstname match

1) Pat Lebarbar
2) Pat Smith
3) Ho Pat
4) Patty McManus
5) Patrick King
6) Kim Patel
7) Samuel Patterson

Is there a method for getting a ranking similar to this with MySQL, without using Levenshtein?

There are apparently ways to compile custom C functions into MySQL with CREATE FUNCTION, but that's way beyond my comfort zone

enigma1

3:20 pm on Jan 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think you would always run into cases like this, well you could expand the query adding another set of scores for firstname/lastname for 'pat ' or use the mysql "left" operator.

[dev.mysql.com...]

You could setup a different way for the score giving perhaps more weight to the trailing space and order by score desc so the score left generates comes up with a higher value for the firstname and then for the lastname. You could adjust the score weight on the firstname in other words to be higher than lastname.