| 9:41 am on Nov 3, 2010 (gmt 0)|
Here is an example with a regular expression to extract portions of the string containing quotes.
$search = '"one two" three';
$pattern = '/(?>")[^"]*+"/';
preg_match_all($pattern, $search, $match_array);
You could also use
and that will return an array splitting the string among the double quotes and then use a counter to join the segments that appear within the string.
In general you should first filter the input submitted by the user with a pattern that contains characters you expect. That's before breaking down string into segments.
Having all the segments in an array you could feed them into your queries but the search is not too efficient using simply the %segment% patterns.
You see if you search for "one" it will also match words like everyone or someone etc. You need to setup the queries to match expected characters. So if it's to expect the word one to take place somewhere in the mid of a string and you want to search for the exact work you could add a prefix like:
WHERE `myField` LIKE '% one%'
That will try to match something that starts with "one" and is a bit more efficient. Of course it will not work if the sentence begins with one or if there is a different separator like two-one or (one) etc. So the original filtering method is important not only to extract the segments but to replace unwanted characters with separators before feeding it into the queries.
Finally the search process should be repeated if there is a spelling mistake by reducing the number of characters in the segments before trying again up to a number of times. Right to Left character reduction could be applied till you get a match. Limits of characters in each word can improve the search. For instance if someone searches for
one or a three
you could eliminate any word that is less than 3 characters.
Because the queries at the end can be long enough with the various and operators for each segment you could get the db results and then filter them for the excluded words. In other words, the exclusion operator could be processed in php. Of course depends how many results you will present if it's to use pagination for example from the returned rows, you need the NOT inside the query.
| 3:08 pm on Nov 3, 2010 (gmt 0)|
This one is probably "close enough"
|Q1) one two three |
Q1) should give a simple query
SELECT * FROM `myTable` WHERE `myField` LIKE '%one%' AND `myField` LIKE '%two%' AND `myField` LIKE '%three%';
But it doesn't account for whole words, it will, for example, match on "anyone" and "someone". More below. These are mostly incorrect:
|Q2) "one two" three |
Q2) should give
SELECT * FROM `myTable` WHERE `myField` LIKE '%one two%' AND `myField` LIKE '%three%'
Though the liklihood is slim, it would match on "someone two". The % is a wildcard character, when first it means "anything before" and last, "anything after." What you'd need here is some form of mySQL regexp [dev.mysql.com]. Something like
select * from `myTable` where `myField` regexp '[[:<:]]one two[[:>:]]' and `myField` like '%three%'
select * from `myTable` where `myField` regexp '[[:<:]]one two[[:>:]]' and `myField` regexp '[[:<:]]three[[:>:]]'
[[:<:]] and [[:>:]] being word boundaries. You can test this on the command line:
select 'zero one two three four' regexp '[[:<:]]one two[[:>:]]';
select 'zero one OOPS two three four' regexp '[[:<:]]one two[[:>:]]';
Same with this one.
|Q3) one two -three |
Q3) should give
SELECT * FROM `myTable` WHERE `myField` LIKE '%one two%' AND `myField` NOT LIKE '%three%'
select * from `myTable` where `myField` regexp '[[:<:]]one two[[:>:]]' and `myField` not regexp '[[:<:]]three[[:>:]]'
There's no sense in extracting too many results then slogging through them in your programming, this makes for excessive use of resources and slow queries. You can do the regexps directly in the selects, see link above for more options. Though they are often more difficult to compose, regexps are more efficient than 'like' anyway.
| 3:59 pm on Nov 3, 2010 (gmt 0)|
Once you get your query working you will have to work on a sorting algo.
The beauty of Google was always the "relavance" factor. It isn't too difficult to script a light weight one yourself.
Basically you want to "bubble" up the most relevant results to the top.
You can do that by
Q1) one two three:
anything that starts with "one two three" moves all the way up
anything that starts with "one two" moves up
anything that starts with "one" moves up
anything that contains one & two & three moves up
anything that contains one & two moves up
and so on......
| 6:07 pm on Nov 3, 2010 (gmt 0)|
IMO you need to rethink the query completely.
you need to create a fulltext index on 'myfield'
the syntax for searching is then slightly different
see the documentation:
| 10:12 am on Nov 4, 2010 (gmt 0)|
regexp in general is slower than like. Even if it looks you setup multiple like clauses vs a single regexp.
regexp is good if you need detailed control of a pattern, but for a general word/phrase search, like is better.