Forum Moderators: coopster

Using STRTOK with multiple character tokens

Possible?

         

trillianjedi

5:01 pm on Jun 17, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Basically I'm looking to grab all <h2> tagged sub-titles in a string into an array. This is for an auto-index generator.

For example if I have this in a string:-


<h2>Shining your Widget</h2>

Text about shining widgets.

<h2>Don't forget to buy polish!</h2>

Go to the shop, buy some carnauba wax.

<h2>Now sell it on eBay</h2>

Add postage and packing and off you go.

I'd like returned an array containing three elements:-

array[0] = "Shining your Widget"
array[1] = "Don't forget to buy polish!"
array[2] = "Now sell it on eBay"

I keep coming back to STRTOK, but as far as I can tell it will only accept single-character delimiters, so I can't do something like:-

$delim = "<h2> </h2>";
$inbetween = strtok($theString, $delim);

Can I make STRTOK work the way that I need, or if not is there a simple way to grab all text between <h2> tags in an array?

Thanks!
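For anyone wondering why the attempt above can't work: as far as I know, strtok() treats every character in its second argument as a separate delimiter rather than matching the string as a whole. Here's a minimal sketch of that behaviour. The strtokAll() helper is hypothetical and only for illustration, and I've dropped the space from the delimiter string so the words aren't split as well:

```php
<?php
// Hypothetical helper: collect every token strtok() yields for $s.
function strtokAll($s, $delims)
{
    $tokens = array();
    $tok = strtok($s, $delims);      // first call takes the string
    while ($tok !== false) {
        $tokens[] = $tok;
        $tok = strtok($delims);      // later calls take only the delimiters
    }
    return $tokens;
}

// The delimiter characters are <, h, 2, > and /, each acting on its own,
// so the "h" inside "Shining" splits the title.
print_r(strtokAll("<h2>Shining your Widget</h2>", "<h2></h2>"));
// Array ( [0] => S [1] => ining your Widget )
```

So any "h" or "2" in the heading text gets eaten, which is why a substring or regex approach is needed for multi-character tags.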

trillianjedi

5:18 pm on Jun 17, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Scrub that - found this fabulous function by "user at no mail dot com" in the comments section of STRPOS in the PHP manual:-

[uk.php.net...]

Scroll down to comments - basically a parser that grabs all values in between two tags into an array.

Simply super :)

trillianjedi

5:39 pm on Jun 17, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Actually, not quite super - I had to tweak it to do what I wanted. I'm posting here in case anyone else stumbles on this thread. Here's a finished function for grabbing the text between two tags. If the second tag is not handed to the function, it assumes both tags are the same:-


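function getStrsBetween($s,$s1,$s2=false,$offset=0) {
    // If no closing tag is given, use the opening tag for both ends
    if( $s2 === false ) { $s2 = $s1; }
    $result = array();
    $L1 = strlen($s1);
    $L2 = strlen($s2);

    $i = 0;

    // Empty tags can't be searched for
    if( $L1==0 || $L2==0 ) {
        return false;
    }

    do {
        // Find the next opening tag, starting from the current offset
        $pos1 = strpos($s,$s1,$offset);

        if( $pos1!== false ) {
            // Skip past the opening tag itself
            $pos1 += $L1;

            // Find the matching closing tag
            $pos2 = strpos($s,$s2,$pos1);

            if( $pos2!== false ) {
                $key_len = $pos2 - $pos1;

                $this_key = substr($s,$pos1,$key_len);

                $result[$i] = $this_key;
                $i++;

                // Continue searching after the closing tag
                $offset = $pos2 + $L2;
            } else {
                // No closing tag left, so stop
                $pos1 = false;
            }
        }
    } while($pos1!== false );

    return $result;
}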

Usage, using my first post as an example - grab all strings between all <h2> tags and dump to an array:-

$myNewArray = getStrsBetween($myString, "<h2>", "</h2>");

eelixduppy

1:12 am on Jun 18, 2007 (gmt 0)



Another way to do it would be to use regular expressions

$pattern = "#(?<=<h2>).+(?=</h2>)#i";
$matches = array();
preg_match_all($pattern,$string,$matches);
echo '<pre>';
print_r($matches);
echo '</pre>';

I messed around with some nifty assertions [uk2.php.net] here ;)
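To show what the assertions buy you, here's a small sketch comparing the pattern above with the same pattern minus the lookbehind and lookahead. The sample string is cut down from the first post and the variable names are just for illustration. With the assertions, the tags are checked but not consumed, so the full match in index 0 is already the bare title:

```php
<?php
$string = "<h2>Shining your Widget</h2>\n"
        . "Text about shining widgets.\n"
        . "<h2>Don't forget to buy polish!</h2>\n";

// Lookbehind/lookahead: the tags must be present but are not part of the match.
preg_match_all('#(?<=<h2>).+(?=</h2>)#i', $string, $withAssertions);

// No assertions: the tags are part of the match.
preg_match_all('#<h2>.+</h2>#i', $string, $withoutAssertions);

print_r($withAssertions[0]);
// Array ( [0] => Shining your Widget [1] => Don't forget to buy polish! )
print_r($withoutAssertions[0]);
// Array ( [0] => <h2>Shining your Widget</h2> [1] => <h2>Don't forget to buy polish!</h2> )
```

Note this relies on each heading sitting on its own line, since "." does not match a newline by default.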

coopster

12:29 pm on Jun 18, 2007 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



You shouldn't even need the assertions here, eelix, but yes, I would probably approach this using a regular expression too. I would also use a regex within the function to "build" the closing tag if one was not specified (add the slash after the opening element identifier, the left angle bracket that is) ...
function getStrsBetween($s, $s1, $s2 = false)
{
    if( $s2 === false ) {
        $s2 = preg_replace('/</', '</', $s1);
    }
    $s1 = preg_quote($s1, '/');
    $s2 = preg_quote($s2, '/');
    preg_match_all("/$s1(.+)$s2/i", $s, $matches);
    return $matches[1];
}
print '<pre>';
print_r(array_map('htmlentities', getStrsBetween($string, '<h2>')));
print '</pre>';
exit;

I threw in a couple preg_quotes there too so the pattern doesn't get out of control on us.
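One thing worth keeping in mind with a (.+) pattern like this: the quantifier is greedy, so it works when each heading is on its own line but can swallow everything between the first opening tag and the last closing tag if two headings share a line. A possible variation, sketched here under that assumption, uses a lazy (.+?) plus the "s" modifier so headings that wrap across lines are also caught. The getStrsBetweenLazy() name is hypothetical:

```php
<?php
// Hypothetical variant: lazy quantifier, and "s" so "." also matches newlines.
function getStrsBetweenLazy($s, $s1, $s2)
{
    $s1 = preg_quote($s1, '/');
    $s2 = preg_quote($s2, '/');
    preg_match_all("/$s1(.+?)$s2/is", $s, $matches);
    return $matches[1];
}

$oneLine = '<h2>One</h2><h2>Two</h2>';

// Greedy: a single match running from the first <h2> to the last </h2>.
preg_match_all('/<h2>(.+)<\/h2>/i', $oneLine, $greedy);
print_r($greedy[1]);
// Array ( [0] => One</h2><h2>Two )

// Lazy: stops at the first closing tag each time.
print_r(getStrsBetweenLazy($oneLine, '<h2>', '</h2>'));
// Array ( [0] => One [1] => Two )
```

Whether this matters depends on how the source HTML is laid out, so treat it as an option rather than a required change.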

trillianjedi

12:40 pm on Jun 18, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks guys. I should have guessed you two would come up with a more elegant way of doing it ;)

One question for you Coop:-

return $matches[1];

I don't understand that part - why the [1]? Is $matches not your usual type of PHP array?

coopster

2:31 pm on Jun 18, 2007 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



preg_match_all [php.net]("/$s1(.+)$s2/i", $s, $matches);

The $matches array stores the text matched by the full pattern and by each subpattern, in array indexes starting at zero. So ...

Array ( 
[0] Contains an array of every match of the entire pattern in the regex
[1] Contains an array of the matches for the 1st set of parenthesized subpatterns
[2] Contains an array of the matches for the 2nd set of parenthesized subpatterns
...
)

In this case, you'll note that I added a subpattern-capturing set of parentheses to capture the data between the opening and closing element identifiers.

If you want to see what I mean, dump the array out before returning $matches[1] in the function as follows:

function getStrsBetween($s, $s1, $s2 = false)
{
    if( $s2 === false ) {
        $s2 = preg_replace('/</', '</', $s1);
    }
    $s1 = preg_quote($s1, '/');
    $s2 = preg_quote($s2, '/');
    preg_match_all("/$s1(.+)$s2/i", $s, $matches);
    print '<pre>';
    print_r($matches);
    print '</pre>';
    exit;
    return $matches[1];
}

Note, you'll likely have to "View Source" in the browser to see the information as your browser is going to render the HTML in your string when it gets dumped out this way.
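As a rough illustration of the layout described above, here's a small self-contained sketch with a two-heading sample string (the string and variable names are made up for the example). It also shows the PREG_SET_ORDER flag, which groups the results per match instead of per subpattern, and escapes the dump with htmlspecialchars() so the tags show up in the browser without needing "View Source":

```php
<?php
$s = "<h2>A</h2>\n<h2>B</h2>";

// Default ordering (PREG_PATTERN_ORDER): one sub-array per (sub)pattern.
preg_match_all('#<h2>(.+)</h2>#i', $s, $matches);
// $matches[0] is array('<h2>A</h2>', '<h2>B</h2>')  (whole matches)
// $matches[1] is array('A', 'B')                    (first parenthesized group)

// PREG_SET_ORDER: one sub-array per match instead.
preg_match_all('#<h2>(.+)</h2>#i', $s, $sets, PREG_SET_ORDER);
// $sets[0] is array('<h2>A</h2>', 'A')
// $sets[1] is array('<h2>B</h2>', 'B')

// Escape the dump so the browser shows the tags rather than rendering them.
print '<pre>' . htmlspecialchars(print_r($matches, true)) . '</pre>';
```

That is why the function returns $matches[1]: index 1 holds just the text captured between the tags.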

eelixduppy

3:15 pm on Jun 18, 2007 (gmt 0)



>> You shouldn't even need the assertions here

Nope; I thought I tried what you have and didn't get the expected results. I was confused at those results, but I kept going until I had something that worked :)

I have to say, though, that I like your pattern better ;)