
scrape data from external source


dulldull

10:28 am on Aug 29, 2009 (gmt 0)

10+ Year Member



Hi,

I have a web application that needs to scrape external data to help with the analysis.

The format of the external data is like this:

<tr class="1">
<td>data1</td>
<td>data2</td>
<td>data3</td>
</TR>

<tr class="0">
<td>data1</td>
<td>data2</td>
<td>data3</td>
</TR>

<tr class="1">
<td>data1</td>
<td>data2</td>
<td>data3</td>
</TR>

I would like to scrape all the available data. But when I use preg_match, the script only scrapes the first or the last line instead of all the lines I need.

Do you know how I can scrape all of them? Thanks a lot!

P.S. The external data source is based on another, bigger database, and its authors authorize anyone to manipulate their work.

rocknbil

5:39 pm on Aug 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



the script only scrapes the first or the last line instead of all the lines I need.

Without seeing code, we can only guess at the possible causes:

- You are reading the source line by line and assigning each match to the same variable, so only the first or, more likely, the last line ends up stored.

- You are storing those lines in a scalar instead of an array.

- Your regexp is not set up for global or multiline matching: preg_match stops at the first match (PHP has no /g modifier), and without the s modifier "." does not match newlines.
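The third point is easy to check. Below is a minimal sketch, assuming the rows from the question (the values data4 to data6 are hypothetical, chosen only so every cell is distinct). It runs the same pattern through preg_match and preg_match_all.

```php
<?php
// Hypothetical sample input modeled on the rows in the question.
$html = <<<HTML
<tr class="1">
<td>data1</td>
<td>data2</td>
<td>data3</td>
</TR>
<tr class="0">
<td>data4</td>
<td>data5</td>
<td>data6</td>
</TR>
HTML;

// preg_match stops after the first match, so only the first cell is captured.
preg_match('/<td>(.*?)<\/td>/i', $html, $first);
echo $first[1], "\n";

// preg_match_all keeps scanning and returns every match;
// $all[1] holds the text captured by the first group for each match.
$count = preg_match_all('/<td>(.*?)<\/td>/i', $html, $all);
echo $count, "\n";
echo implode(',', $all[1]), "\n";
```

Appending a g to the pattern, as you would in JavaScript or Perl, does not help in PHP: it is rejected as an unknown modifier and the match fails.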

You should do something **like** this:

$content = file_get_contents($url); // get content of entire page ($url is a placeholder for the page you scrape)
$lines = explode("\n", $content);
$out = '';
foreach ($lines as $line) {
    // do your preg match, replace, whatever (no /g modifier in PHP)
    if (preg_match('/pattern/i', $line)) {
        $out .= preg_replace('/<\/?[^>]+>/', '', $line); // strip all html tags
    }
}
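If you want to keep each matching line separately rather than concatenating them into one string, collect them in an array. This is a minimal sketch, assuming each cell sits on its own line as in the question. The inline $content string is hypothetical test data standing in for the fetched page.

```php
<?php
// Hypothetical page content; in practice this would be the fetched page.
$content = "<tr class=\"1\">\n<td>data1</td>\n<td>data2</td>\n</TR>\n"
         . "<tr class=\"0\">\n<td>data3</td>\n<td>data4</td>\n</TR>";

$cells = []; // an array, so each matching line is kept instead of overwritten
foreach (explode("\n", $content) as $line) {
    // Only lines that contain a table cell are of interest here.
    if (preg_match('/<td>/i', $line)) {
        // Strip the tags and append the remaining text as a new element.
        $cells[] = trim(preg_replace('/<\/?[^>]+>/', '', $line));
    }
}

print_r($cells);
```

The line-by-line approach breaks if a cell spans several lines, which is one reason matching over the whole page at once is usually simpler.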

dulldull

3:25 am on Aug 30, 2009 (gmt 0)

10+ Year Member



Thanks for your help. It's very useful!

sastro

3:30 am on Aug 31, 2009 (gmt 0)

10+ Year Member



For looping over the data, use preg_match_all();
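Applied to the table in the question, preg_match_all can run in two passes. The first pass finds each row, and the second pass finds that row's cells. This is a sketch only, assuming the markup looks exactly like the sample: lowercase `<tr class="...">` with a numeric class, plain `<td>` tags, and rows closed by `</TR>`.

```php
<?php
// The sample rows from the question.
$html = <<<HTML
<tr class="1">
<td>data1</td>
<td>data2</td>
<td>data3</td>
</TR>

<tr class="0">
<td>data1</td>
<td>data2</td>
<td>data3</td>
</TR>

<tr class="1">
<td>data1</td>
<td>data2</td>
<td>data3</td>
</TR>
HTML;

$table = [];
// Pass 1: capture each row's class and inner HTML.
// "s" lets "." match newlines; "i" lets "<\/tr>" match the uppercase "</TR>".
// PREG_SET_ORDER groups the results per match: [full match, class, inner HTML].
preg_match_all('/<tr class="(\d+)">(.*?)<\/tr>/is', $html, $rows, PREG_SET_ORDER);
foreach ($rows as $row) {
    // Pass 2: collect every cell inside this row.
    preg_match_all('/<td>(.*?)<\/td>/is', $row[2], $cells);
    $table[] = ['class' => $row[1], 'cells' => $cells[1]];
}

print_r($table);
```

Regular expressions are fragile against markup that varies (extra attributes, nested tags). For less predictable pages, PHP's DOMDocument and DOMXPath classes are a sturdier alternative.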