enigma1

msg:4225851 | 9:41 am on Nov 3, 2010 (gmt 0) |
Here is an example with a regular expression to extract portions of the string containing quotes. $search = '"one two" three'; $pattern = '/(?>")[^"]*+"/'; preg_match_all($pattern, $search, $match_array); You could also use explode('"',$search); and that will return an array splitting the string among the double quotes and then use a counter to join the segments that appear within the string. In general you should first filter the input submitted by the user with a pattern that contains characters you expect. That's before breaking down string into segments. Having all the segments in an array you could feed them into your queries but the search is not too efficient using simply the %segment% patterns. You see if you search for "one" it will also match words like everyone or someone etc. You need to setup the queries to match expected characters. So if it's to expect the word one to take place somewhere in the mid of a string and you want to search for the exact work you could add a prefix like: WHERE `myField` LIKE '% one%' That will try to match something that starts with "one" and is a bit more efficient. Of course it will not work if the sentence begins with one or if there is a different separator like two-one or (one) etc. So the original filtering method is important not only to extract the segments but to replace unwanted characters with separators before feeding it into the queries. Finally the search process should be repeated if there is a spelling mistake by reducing the number of characters in the segments before trying again up to a number of times. Right to Left character reduction could be applied till you get a match. Limits of characters in each word can improve the search. For instance if someone searches for one or a three you could eliminate any word that is less than 3 characters. Because the queries at the end can be long enough with the various and operators for each segment you could get the db results and then filter them for the excluded words. In other words, the exclusion operator could be processed in php. Of course depends how many results you will present if it's to use pagination for example from the returned rows, you need the NOT inside the query.
|
rocknbil

msg:4226001 | 3:08 pm on Nov 3, 2010 (gmt 0) |
This one is probably "close enough" Q1) one two three Q1) should give a simple query SELECT * FROM `myTable` WHERE `myField` LIKE '%one%' AND `myField` LIKE '%two%' AND `myField` LIKE '%three%'; |
| But it doesn't account for whole words, it will, for example, match on "anyone" and "someone". More below. These are mostly incorrect: Q2) "one two" three Q2) should give SELECT * FROM `myTable` WHERE `myField` LIKE '%one two%' AND `myField` LIKE '%three%' |
| Though the liklihood is slim, it would match on "someone two". The % is a wildcard character, when first it means "anything before" and last, "anything after." What you'd need here is some form of mySQL regexp [dev.mysql.com]. Something like select * from `myTable` where `myField` regexp '[[:<:]]one two[[:>:]]' and `myField` like '%three%' or better, select * from `myTable` where `myField` regexp '[[:<:]]one two[[:>:]]' and `myField` regexp '[[:<:]]three[[:>:]]' [[:<:]] and [[:>:]] being word boundaries. You can test this on the command line: select 'zero one two three four' regexp '[[:<:]]one two[[:>:]]'; --> 1 select 'zero one OOPS two three four' regexp '[[:<:]]one two[[:>:]]'; -- 0 Same with this one. Q3) one two -three Q3) should give SELECT * FROM `myTable` WHERE `myField` LIKE '%one two%' AND `myField` NOT LIKE '%three%' |
| select * from `myTable` where `myField` regexp '[[:<:]]one two[[:>:]]' and `myField` not regexp '[[:<:]]three[[:>:]]' There's no sense in extracting too many results then slogging through them in your programming, this makes for excessive use of resources and slow queries. You can do the regexps directly in the selects, see link above for more options. Though they are often more difficult to compose, regexps are more efficient than 'like' anyway.
|
Demaestro

msg:4226027 | 3:59 pm on Nov 3, 2010 (gmt 0) |
Once you get your query working you will have to work on a sorting algo. The beauty of Google was always the "relavance" factor. It isn't too difficult to script a light weight one yourself. Basically you want to "bubble" up the most relevant results to the top. You can do that by Q1) one two three: anything that starts with "one two three" moves all the way up anything that starts with "one two" moves up anything that starts with "one" moves up anything that contains one & two & three moves up anything that contains one & two moves up and so on......
|
topr8

msg:4226092 | 6:07 pm on Nov 3, 2010 (gmt 0) |
IMO you need to rethink the query completely. you need to create a fulltext index on 'myfield' the syntax for searching is then slightly different see the documentation: [dev.mysql.com...]
|
enigma1

msg:4226498 | 10:12 am on Nov 4, 2010 (gmt 0) |
regexp in general is slower than like. Even if it looks you setup multiple like clauses vs a single regexp. regexp is good if you need detailed control of a pattern, but for a general word/phrase search, like is better.
|
|