
Search snippet generator

         

Adrian2k4

2:30 am on Oct 4, 2004 (gmt 0)

10+ Year Member



I'm coding a script that searches a MySQL DB.
On the results page I would like to include page snippets with the search query highlighted.
I intend to make the snippets similar to Google's.
Any ideas or code on how to write this "snippet generator"?
Thanks for your help & best regards
Adrian

mincklerstraat

9:46 am on Oct 4, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



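If you have all your search terms in an array $searchterms:
foreach ($searchterms as $v) {
    // wrap every occurrence of the term in a highlighting span
    // (case-sensitive; a later term such as "span" would also match inside the tags added here)
    $html = str_replace($v, '<span class="searchterm">'.$v.'</span>', $html);
}
in your CSS:
.searchterm {
    background: #f88;
}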

Later, if things get hairy and you have some of your search terms inside of tags, you may want to use a fancier replacement with preg_replace. However, it's usually better to start out simple and learn as you go.

Code not tested; try before you buy.
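The preg_replace route mentioned above could look roughly like this. It is a minimal sketch with a hypothetical highlightTerms() helper, assuming $text is already plain text (e.g. the output of strip_tags()) that is safe to print. Matching all terms in one case-insensitive pass keeps the page's own capitalisation and never re-scans the span tags it has just inserted.

```php
<?php
// Hypothetical helper, not from the thread: highlights all search terms in
// one case-insensitive pass. Assumes $text is plain text (e.g. the output of
// strip_tags()) that is already safe to print.
function highlightTerms(string $text, array $terms): string
{
    // Ignore empty terms; an empty alternative would match everywhere.
    $terms = array_values(array_filter($terms, fn($t) => $t !== ''));
    if ($terms === []) {
        return $text;
    }
    // Longest terms first, so "search engine" is preferred over "search".
    usort($terms, fn($a, $b) => strlen($b) <=> strlen($a));
    // preg_quote() escapes regex metacharacters such as . + ? in the terms.
    $quoted = array_map(fn($t) => preg_quote($t, '/'), $terms);
    $pattern = '/' . implode('|', $quoted) . '/i';
    // $0 is the whole match, so the original capitalisation is kept; since it
    // is a single pass, the inserted tags are never searched again.
    return preg_replace($pattern, '<span class="searchterm">$0</span>', $text);
}

echo highlightTerms('PHP and MySQL: php snippets', ['php', 'mysql']), "\n";
```

The usage line wraps "PHP", "MySQL" and "php" in spans and leaves the rest of the text alone.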

Adrian2k4

10:31 am on Oct 4, 2004 (gmt 0)

10+ Year Member



Right, your code does the highlighting. The problem with search terms inside tags can be fixed with strip_tags(). But I can't show the whole page at full length as a snippet on the results page....
How do I shorten the snippet while keeping as many highlighted search terms as possible visible?
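One simple, non-maximising way to get the page down to a few lines is to strip the tags and cut a fixed-length window around the first hit. A minimal sketch, assuming a hypothetical simpleSnippet() helper and a length measured in bytes (multibyte UTF-8 text would need the mb_* functions instead). It does not try to show as many terms as possible, only the first one.

```php
<?php
// Hypothetical helper: a snippet of about $length bytes centred on the first
// occurrence of any search term, trimmed to word boundaries.
function simpleSnippet(string $html, array $terms, int $length = 200): string
{
    // Plain text only: drop tags and collapse runs of whitespace.
    $text = trim(preg_replace('/\s+/', ' ', strip_tags($html)));
    $total = strlen($text);
    if ($total <= $length) {
        return $text;
    }

    // Earliest position of any term (case-insensitive); 0 if none is found.
    $first = $total;
    foreach ($terms as $term) {
        if ($term === '') {
            continue;
        }
        $pos = stripos($text, $term);
        if ($pos !== false && $pos < $first) {
            $first = $pos;
        }
    }
    if ($first === $total) {
        $first = 0;
    }

    // Centre the window on that position, clamped to the text.
    $start = max(0, min($first - intdiv($length, 2), $total - $length));
    $end = $start + $length;

    // Move the edges inward to spaces so no word is cut in half.
    if ($start > 0 && $text[$start - 1] !== ' ') {
        $space = strpos($text, ' ', $start);
        if ($space !== false && $space < $first) {
            $start = $space + 1;
        }
    }
    if ($end < $total && $text[$end] !== ' ') {
        $space = strrpos(substr($text, 0, $end), ' ');
        if ($space !== false && $space > $start) {
            $end = $space;
        }
    }

    $snippet = substr($text, $start, $end - $start);
    return ($start > 0 ? '... ' : '') . $snippet . ($end < $total ? ' ...' : '');
}

$page = '<p>' . str_repeat('lorem ', 50) . 'needle' . str_repeat(' ipsum', 50) . '</p>';
echo simpleSnippet($page, ['needle'], 60), "\n";
```

The result can then be passed through the highlighting code above.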

mincklerstraat

3:55 pm on Oct 4, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you want to show the snippet with the maximum number of terms, and you're talking about multiple search terms, this gets really hairy: it would most probably involve the regex functions, and it's likely to be a processor hog unless you cache the results or the code is truly elegant. I'd stick to simple, easily understandable stuff before blossoming into coding sprezzatura. That said, you can always hope that some PHP virtuosos on this board are more helpful than I am and actually oblige you with a juicy, pre-coded nugget that you can cut and paste. If so, consider yourself wildly fortunate and smother them with thanks.

Adrian2k4

11:41 pm on Oct 4, 2004 (gmt 0)

10+ Year Member



Yes, I am aware that this script will be longer than 10 lines.....
I was just asking whether anybody had any ideas on how I could code it, since this is quite complicated.
From the mathematical approach, what I need to do is:
- save all the positions of all search terms within the page in an array.
- if the snippet length is x, I need to find out which substring of length x of the page contains the most saved positions.

This is possible with something like brute force, but that wouldn't be elegant and would require a faster server....
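The "which window of length x holds the most positions" step does not need brute force. A sketch using a sliding window (two pointers) over the sorted positions, with a hypothetical bestWindow() helper: any best window can be shifted right until it starts at a hit without losing hits, so only windows starting at a hit need checking, and each pointer only moves forward. After sorting this is linear in the number of hits. It assumes positions are term start offsets; a term's tail may run past the window's end.

```php
<?php
// Hypothetical helper: sliding-window (two-pointer) search over hit positions.
// Returns [startOffset, hitCount] for the window of $x chars containing the
// most hit start positions.
function bestWindow(array $positions, int $x): array
{
    sort($positions);
    $x = max(1, $x);
    $n = count($positions);
    $bestStart = $n > 0 ? $positions[0] : 0;
    $bestCount = 0;
    $left = 0;
    for ($right = 0; $right < $n; $right++) {
        // Advance the left pointer until all hits fit in [left, left + x).
        while ($positions[$right] - $positions[$left] >= $x) {
            $left++;
        }
        $count = $right - $left + 1;
        if ($count > $bestCount) {
            $bestCount = $count;
            $bestStart = $positions[$left];
        }
    }
    return [$bestStart, $bestCount];
}

[$start, $count] = bestWindow([5, 40, 700, 720, 730, 2000], 100);
echo "$start $count\n"; // 700 3
```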

OR

Has anybody got another approach to generating snippets? (This is not highlighting but generating approx. 3 to 4 lines of preview of the whole page with the search terms highlighted. I know how to do the highlighting, but I don't know how to get the text from the whole page down to a couple of lines.)

mincklerstraat

10:41 am on Oct 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



- save all the positions of all search terms within the page in an array.

This would definitely be the way to go, although creating the array is less than straightforward. You could use preg_match_all and capture the entire beginning of the string before each search term (the length of that prefix is the term's position); passing the PREG_OFFSET_CAPTURE flag gives you those offsets directly.
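A sketch of building that array with preg_match_all() and PREG_OFFSET_CAPTURE, using a hypothetical termPositions() helper. With that flag each match comes back as [matchedText, offset], where the offset is in bytes, not characters. All terms are matched in one case-insensitive alternation, so overlapping hits are not reported twice; longer terms are tried first.

```php
<?php
// Hypothetical helper: byte offset and length of every search-term hit.
function termPositions(string $text, array $terms): array
{
    $terms = array_values(array_filter($terms, fn($t) => $t !== ''));
    if ($terms === []) {
        return [];
    }
    // Longest first, so overlapping terms prefer the longer match.
    usort($terms, fn($a, $b) => strlen($b) <=> strlen($a));
    $quoted = array_map(fn($t) => preg_quote($t, '/'), $terms);
    $pattern = '/' . implode('|', $quoted) . '/i';

    // With PREG_OFFSET_CAPTURE each match is [matchedText, byteOffset].
    preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);

    $hits = [];
    foreach ($matches[0] as [$matched, $offset]) {
        $hits[] = ['pos' => $offset, 'len' => strlen($matched)];
    }
    return $hits;
}

print_r(termPositions('PHP loves php and MySQL', ['php', 'mysql']));
```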
- once you have this array, loop through it to build a second array corresponding to this one, containing, for each position, the number of other positions that are no more than x away on either side (x = the snippet length minus (the search term length + 2), the 2 being a space on each side of the search term).
- sort that array by this value, to help you pick the search position you'll start from.
- now that you know which search position and search term provide your starting point, you need to determine the snippet: how many chars to the left and how many chars to the right you want.
- take that array of search positions and, for each one that lies within x characters before your starting position, determine how many other search terms you'd include if you started the snippet there.
- take the one that produces the highest number, and if the number of characters between it and the last search term it includes is less than your x value (i.e. there is room to spare), include a few words or so to the left of this search string in your snippet for a bit more context.
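The steps above can be sketched roughly as follows. This is a hypothetical pickSnippetStart() taking a list of ['pos' => offset, 'len' => termLength] hits; taking the maximum count stands in for "sort and pick the top", and the final step (padding with context words) is left out. $radiusFactor covers the 3x/4 or x/2 tweak from the notes that follow. The nested loops are quadratic in the number of hits, which is usually small per page.

```php
<?php
// Hypothetical sketch; names are illustrative only.
// $hits: list of ['pos' => byteOffset, 'len' => termLength]; $x: snippet length.
// $radiusFactor scales the neighbour radius (1.0 = as described, 0.75 or 0.5
// = the suggested tweaks). Returns the chosen start offset, or null.
function pickSnippetStart(array $hits, int $x, float $radiusFactor = 1.0): ?int
{
    if ($hits === []) {
        return null;
    }
    usort($hits, fn($a, $b) => $a['pos'] <=> $b['pos']);

    // Count each hit's neighbours within +/- radius and take the hit with
    // the most neighbours (the first one on ties) as the anchor.
    $anchor = $hits[0]['pos'];
    $mostNeighbours = -1;
    foreach ($hits as $i => $h) {
        $radius = ($x - ($h['len'] + 2)) * $radiusFactor; // 2 = a space each side
        $count = 0;
        foreach ($hits as $j => $other) {
            if ($i !== $j && abs($other['pos'] - $h['pos']) <= $radius) {
                $count++;
            }
        }
        if ($count > $mostNeighbours) {
            $mostNeighbours = $count;
            $anchor = $h['pos'];
        }
    }

    // Try each hit at, or less than x chars before, the anchor as the start;
    // keep the start whose window fully contains the most hits.
    $bestStart = $anchor;
    $bestCount = -1;
    foreach ($hits as $start) {
        if ($start['pos'] > $anchor || $anchor - $start['pos'] >= $x) {
            continue;
        }
        $count = 0;
        foreach ($hits as $other) {
            if ($other['pos'] >= $start['pos']
                && $other['pos'] + $other['len'] <= $start['pos'] + $x) {
                $count++;
            }
        }
        if ($count > $bestCount) {
            $bestCount = $count;
            $bestStart = $start['pos'];
        }
    }
    return $bestStart;
}

$hits = [['pos' => 0, 'len' => 1], ['pos' => 50, 'len' => 1],
         ['pos' => 90, 'len' => 1], ['pos' => 140, 'len' => 1]];
echo pickSnippetStart($hits, 100), "\n"; // 0
```

Here the anchor is the hit at 50, but starting at 0 covers just as many hits and comes first, so 0 is returned.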

Notes on this algo: It's not entirely accurate; it won't always maximize. In the step where you find how many other search terms are within x of each search term, this will of course include search terms up to x before and up to x after your search term, when in reality, if your search term were precisely in the middle of the snippet, it would be more like x/2. However, it won't be a bad guesstimate. You could refine this step, but it'd make your code more complicated and slower. Or you could just tweak the x value at this point and use 3x/4 or x/2.