homepage Welcome to WebmasterWorld Guest from 54.205.189.156
register, free tools, login, search, pro membership, help, library, announcements, recent posts, open posts,
Become a Pro Member

Visit PubCon.com
Home / Forums Index / Code, Content, and Presentation / Databases
Forum Library, Charter, Moderators: physics

Databases Forum

    
MySQL generate Google-search type of query in PHP
kajje




msg:4225780
 5:10 am on Nov 3, 2010 (gmt 0)

I have a table fully loaded with information.
I want to perform a (simplified) Google-search formatted query to get my results.


Search queries examples:
Q1) one two three
Q2) "one two" three
Q3) one two -three


Q1) should give a simple query
SELECT * FROM `myTable` WHERE `myField` LIKE '%one%' AND `myField` LIKE '%two%' AND `myField` LIKE '%three%';

Q2) should give
SELECT * FROM `myTable` WHERE `myField` LIKE '%one two%' AND `myField` LIKE '%three%'

Q3) should give
SELECT * FROM `myTable` WHERE `myField` LIKE '%one two%' AND `myField` NOT LIKE '%three%'


Now I feel mostly strong enough to build this query in php though it would not be completely fool-proof.

Any suggestions on how to make the php script simpler? Regular expressions (and if so, can you give me some examples)? How to get every string between 2 double-quotes in an array?

Any other suggestions on how to make more nice Google-query functionality?

Thanks in advance

 

enigma1




msg:4225851
 9:41 am on Nov 3, 2010 (gmt 0)

Here is an example with a regular expression to extract portions of the string containing quotes.
$search = '"one two" three';
$pattern = '/(?>")[^"]*+"/';
preg_match_all($pattern, $search, $match_array);

You could also use
explode('"',$search);
and that will return an array splitting the string among the double quotes and then use a counter to join the segments that appear within the string.

In general you should first filter the input submitted by the user with a pattern that contains characters you expect. That's before breaking down string into segments.

Having all the segments in an array you could feed them into your queries but the search is not too efficient using simply the %segment% patterns.

You see if you search for "one" it will also match words like everyone or someone etc. You need to setup the queries to match expected characters. So if it's to expect the word one to take place somewhere in the mid of a string and you want to search for the exact work you could add a prefix like:
WHERE `myField` LIKE '% one%'
That will try to match something that starts with "one" and is a bit more efficient. Of course it will not work if the sentence begins with one or if there is a different separator like two-one or (one) etc. So the original filtering method is important not only to extract the segments but to replace unwanted characters with separators before feeding it into the queries.

Finally the search process should be repeated if there is a spelling mistake by reducing the number of characters in the segments before trying again up to a number of times. Right to Left character reduction could be applied till you get a match. Limits of characters in each word can improve the search. For instance if someone searches for
one or a three
you could eliminate any word that is less than 3 characters.

Because the queries at the end can be long enough with the various and operators for each segment you could get the db results and then filter them for the excluded words. In other words, the exclusion operator could be processed in php. Of course depends how many results you will present if it's to use pagination for example from the returned rows, you need the NOT inside the query.

rocknbil




msg:4226001
 3:08 pm on Nov 3, 2010 (gmt 0)

This one is probably "close enough"

Q1) one two three
Q1) should give a simple query
SELECT * FROM `myTable` WHERE `myField` LIKE '%one%' AND `myField` LIKE '%two%' AND `myField` LIKE '%three%';


But it doesn't account for whole words, it will, for example, match on "anyone" and "someone". More below. These are mostly incorrect:

Q2) "one two" three
Q2) should give
SELECT * FROM `myTable` WHERE `myField` LIKE '%one two%' AND `myField` LIKE '%three%'


Though the liklihood is slim, it would match on "someone two". The % is a wildcard character, when first it means "anything before" and last, "anything after." What you'd need here is some form of mySQL regexp [dev.mysql.com]. Something like

select * from `myTable` where `myField` regexp '[[:<:]]one two[[:>:]]' and `myField` like '%three%'

or better,

select * from `myTable` where `myField` regexp '[[:<:]]one two[[:>:]]' and `myField` regexp '[[:<:]]three[[:>:]]'

[[:<:]] and [[:>:]] being word boundaries. You can test this on the command line:

select 'zero one two three four' regexp '[[:<:]]one two[[:>:]]';
--> 1

select 'zero one OOPS two three four' regexp '[[:<:]]one two[[:>:]]';
-- 0

Same with this one.

Q3) one two -three
Q3) should give
SELECT * FROM `myTable` WHERE `myField` LIKE '%one two%' AND `myField` NOT LIKE '%three%'


select * from `myTable` where `myField` regexp '[[:<:]]one two[[:>:]]' and `myField` not regexp '[[:<:]]three[[:>:]]'

There's no sense in extracting too many results then slogging through them in your programming, this makes for excessive use of resources and slow queries. You can do the regexps directly in the selects, see link above for more options. Though they are often more difficult to compose, regexps are more efficient than 'like' anyway.

Demaestro




msg:4226027
 3:59 pm on Nov 3, 2010 (gmt 0)

Once you get your query working you will have to work on a sorting algo.

The beauty of Google was always the "relavance" factor. It isn't too difficult to script a light weight one yourself.

Basically you want to "bubble" up the most relevant results to the top.

You can do that by

Q1) one two three:

anything that starts with "one two three" moves all the way up

anything that starts with "one two" moves up

anything that starts with "one" moves up

anything that contains one & two & three moves up

anything that contains one & two moves up

and so on......

topr8




msg:4226092
 6:07 pm on Nov 3, 2010 (gmt 0)

IMO you need to rethink the query completely.

you need to create a fulltext index on 'myfield'

the syntax for searching is then slightly different
see the documentation:
[dev.mysql.com...]

enigma1




msg:4226498
 10:12 am on Nov 4, 2010 (gmt 0)

regexp in general is slower than like. Even if it looks you setup multiple like clauses vs a single regexp.

regexp is good if you need detailed control of a pattern, but for a general word/phrase search, like is better.

Global Options:
 top home search open messages active posts  
 

Home / Forums Index / Code, Content, and Presentation / Databases
rss feed

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
Home ¦ Free Tools ¦ Terms of Service ¦ Privacy Policy ¦ Report Problem ¦ About ¦ Library ¦ Newsletter
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved