

Snagging and parsing HTML with regexp in PHP


broniusm

5:00 pm on Nov 4, 2003 (gmt 0)

10+ Year Member



I am working on a script that grabs some remote HTML and parses the TDs of a table into array elements, so that I can then use the data on my own. I'm having trouble with both the regular expression itself and PHP's regexp syntax, and I can't tell which one is wrong, or when and how:

// pick out individual events & store them in an array
$matches = preg_match_all("<td[^>]*>(.*)</td>", $alltext, $arrdata);

// raw output the TDs
print "<textarea>";
for ($i=0; $i<count($arrdata[0]); $i++) {
print "\nMatch $i:\n".$arrdata[0][$i];
}
print "</textarea>";

At this stage, I just want to grab all TDs and display them one by one. Eventually, I will parse the semi-structured data within the TD to make better sense of it.

Can someone please help?
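For the later step of making sense of the semi-structured text inside each TD, one possible approach (an assumption, not something proposed in this thread) is to split the captured cell content on its <br> tags, strip the remaining markup with strip_tags(), collapse whitespace, and drop empty lines. The helper name cellLines() is hypothetical, and the sample cell reuses the placeholder layout quoted later in the thread.

```php
<?php
// Hypothetical helper (not from the thread): turn the inner HTML of one TD
// into a list of plain-text lines, treating <br> tags as line separators.
function cellLines(string $cellHtml): array
{
    // Split on <br>, <br/> or <br /> in any letter case.
    $parts = preg_split('/<br\s*\/?>/i', $cellHtml);
    $lines = [];
    foreach ($parts as $part) {
        // Remove remaining tags (<b>, <a>, ...) and collapse runs of whitespace.
        $text = trim(preg_replace('/\s+/', ' ', strip_tags($part)));
        if ($text !== '') {
            $lines[] = $text;
        }
    }
    return $lines;
}

// Sample cell content modelled on the HTML quoted later in this thread.
$cell = "\n\n<b><a href=\"[..url..]\">[..event title..]</a></b>\n\n<br>\n"
    . "Monday, 11/3/2003 at 4:00pm<br>\nMeetings & Conventions<br>\n[Event Category]\n";

print_r(cellLines($cell));
```

Each remaining line (title, date/time, and so on) can then be picked apart further, for example with a date-specific pattern on the second line.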

coopster

5:26 pm on Nov 4, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



The expression must be enclosed in delimiters, for example a forward slash (/). Because / is then the delimiter, the slash inside </td> has to be escaped as <\/td>. For example:

$matches = preg_match_all("/<td.*>(.*)<\/td>/Ui", $alltext, $arrdata);
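A minimal sketch of the delimited pattern in action, run against a made-up one-line table (the $alltext value is hypothetical; in the real script it would be the fetched remote HTML). The U modifier makes quantifiers lazy by default, so each .* stops at the first possible </td> instead of running to the last one on the line, and i makes the match case-insensitive so <TD> is caught too. Note that $arrdata[0] holds the whole <td>...</td> matches, while $arrdata[1] holds just the captured cell contents.

```php
<?php
// Hypothetical sample input standing in for the fetched remote HTML.
$alltext = '<table><tr><td class=a>One</td><TD>Two</TD><td>Three</td></tr></table>';

// "/" is the delimiter, so the "/" in </td> is escaped.
// U = ungreedy (lazy) quantifiers, i = case-insensitive.
$count = preg_match_all("/<td.*>(.*)<\/td>/Ui", $alltext, $arrdata);

// $arrdata[0] = full matches (tags included), $arrdata[1] = captured contents.
foreach ($arrdata[1] as $i => $cell) {
    print "Match $i: $cell\n";
}
```

Without U, the greedy .* would run to the last </td> on the line and return a single match spanning all three cells.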

broniusm

7:33 pm on Nov 4, 2003 (gmt 0)

10+ Year Member



Thanks, coopster, right on!
Now I seem to have a problem with either whitespace or line breaks. The raw HTML looks like this, so you can see it's a bit spacey:

<tr>
<td colspan=2 class=searchres>


<b><a href="[..url..]">[..event title..]</a></b>

<br>
Monday, 11/3/2003 at 4:00pm<br>
Meetings & Conventions<br>
[Event Category]
</td>
</tr>

I think I need to strip all of that out for the regexp to match what I'm asking for.

coopster

10:09 pm on Nov 4, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I'm not sure I understand your request, but if it isn't working because of newlines, add the s (PCRE_DOTALL) modifier. When this modifier is set, the dot metacharacter in the pattern matches every character, including newlines; without it, newlines are excluded:

$matches = preg_match_all("/<td.*>(.*)<\/td>/Uis", $alltext, $arrdata);
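To see what the s modifier changes, here is a small sketch that runs the pattern with and without it over a cell shaped like the one quoted above (the [..url..] and [..event title..] placeholders are kept as in the original post). Without s, the dot cannot cross a newline, so a cell whose content spans several lines never reaches </td> and is not matched at all; with s, the whole cell is captured, and trim() then removes the surrounding blank lines, so there is no need to strip whitespace from the HTML beforehand.

```php
<?php
// Sample row modelled on the HTML quoted in this thread.
$alltext = "<tr>\n<td colspan=2 class=searchres>\n\n\n"
    . "<b><a href=\"[..url..]\">[..event title..]</a></b>\n\n<br>\n"
    . "Monday, 11/3/2003 at 4:00pm<br>\nMeetings & Conventions<br>\n"
    . "[Event Category]\n</td>\n</tr>";

// Without s: "." stops at newlines, so the multi-line cell is not matched.
$without = preg_match_all("/<td.*>(.*)<\/td>/Ui", $alltext, $noS);

// With s: "." also matches newlines, so the whole cell is captured.
$with = preg_match_all("/<td.*>(.*)<\/td>/Uis", $alltext, $arrdata);

print "without s: $without match(es), with s: $with match(es)\n";
print trim($arrdata[1][0]) . "\n";
```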

broniusm

4:44 am on Nov 5, 2003 (gmt 0)

10+ Year Member



coopster- ya did it again! :)

Thanks very much for your expertise. It's a bit frustrating to know you're on the right track but not be 100% sure about the syntax; you helped me through the muck.