I have a web application that needs to scrape external data to help with the analysis.
The external data is formatted like this:
```html
<tr class="1">
<td>data1</td>
<td>data2</td>
<td>data3</td>
</TR>
<tr class="0">
<td>data1</td>
<td>data2</td>
<td>data3</td>
</TR>
<tr class="1">
<td>data1</td>
<td>data2</td>
<td>data3</td>
</TR>
```
I would like to scrape all of the available data, but when I use preg_match, the script only scrapes the first or the last line instead of all the lines I need.
Do you know how I can scrape all of them? Thanks a lot!
P.S. The external data source is based on another, bigger database, and its authors allow anyone to reuse their work.
> the script only scrapes the first or the last line, instead of all lines that I need.
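Without seeing your code, we can only guess at the possible causes:
- You are reading the source line by line, so only the first (or, more likely, the last) line ends up being stored.
- You are storing those lines in a scalar instead of an array, so each new line overwrites the previous one.
- Your regular expression does not support global or multiline matching. PHP has no `g` modifier: `preg_match()` stops after the first match, so use `preg_match_all()` to get every match, and add the `s` modifier if `.` must also match newlines.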
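If you fetch the whole page into one string (for example with `file_get_contents()`), you can skip the line-by-line loop entirely: run `preg_match_all()` over the full content and collect the results in an array. This is a minimal sketch, assuming the markup looks like the sample in the question; the helper name `extract_rows()` is hypothetical. Note the `i` modifier: the sample opens rows with `<tr` but closes them with `</TR>`, so a case-sensitive pattern would never find a complete row.

```php
<?php
// Minimal sketch: extract every <tr> row and its <td> cells with preg_match_all().
// extract_rows() is a hypothetical helper; it assumes markup shaped like the sample.
function extract_rows(string $html): array
{
    $rows = [];
    // "i": case-insensitive, so </TR> matches </tr>; "s": "." also matches newlines.
    // The lazy ".*?" stops each match at the nearest closing </tr>.
    preg_match_all('/<tr\b[^>]*>(.*?)<\/tr>/is', $html, $rowMatches);
    foreach ($rowMatches[1] as $rowHtml) {
        preg_match_all('/<td\b[^>]*>(.*?)<\/td>/is', $rowHtml, $cellMatches);
        $rows[] = $cellMatches[1]; // one array of cell texts per row, not a scalar
    }
    return $rows;
}

$html = <<<HTML
<tr class="1">
<td>data1</td>
<td>data2</td>
<td>data3</td>
</TR>
<tr class="0">
<td>data1</td>
<td>data2</td>
<td>data3</td>
</TR>
<tr class="1">
<td>data1</td>
<td>data2</td>
<td>data3</td>
</TR>
HTML;

print_r(extract_rows($html));
```

Because every row lands in its own array element, nothing is overwritten, which covers both the "scalar instead of array" and the "no global matching" causes at once.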
You should do something **like** this:
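```php
<?php
// Fetch the entire page as one string (the URL is a placeholder for your data source).
$content = file_get_contents('http://www.example.com/data.html');
$lines = explode("\n", $content); // explode() already returns an array; list() is not needed
$out = '';
foreach ($lines as $line) {
    // Do your preg_match, replace, whatever. PHP has no "g" modifier; "i" = case-insensitive.
    if (preg_match('/<td>/i', $line)) { // placeholder pattern: lines that hold a cell
        $out .= preg_replace('/<\/?[^>]+>/', '', $line) . "\n"; // strip all HTML tags
    }
}
echo $out;
```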
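Regular expressions tend to break on real-world HTML (attributes in a different order, nested tags, missing closing tags). As an alternative the reply above does not mention, PHP's built-in DOM extension can parse the markup and return every row through XPath. This is a minimal sketch, assuming the DOM extension is enabled (it is in most default builds) and that the rows sit inside a `<table>` on the real page; `table_rows()` is a hypothetical helper name.

```php
<?php
// Minimal sketch: parse the HTML with PHP's DOM extension instead of regular expressions.
// table_rows() is a hypothetical helper; it returns the class and cell texts of each <tr>.
function table_rows(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    // The HTML parser treats tag names case-insensitively, so </TR> needs no special handling.
    foreach ($xpath->query('//tr') as $tr) {
        $cells = [];
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        $rows[] = ['class' => $tr->getAttribute('class'), 'cells' => $cells];
    }
    return $rows;
}

$html = '<table><tr class="1"><td>data1</td><td>data2</td><td>data3</td></TR>'
      . '<tr class="0"><td>data1</td><td>data2</td><td>data3</td></TR></table>';

print_r(table_rows($html));
```

The trade-off is a little more setup code in exchange for not having to maintain patterns that match every variation of the source markup.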