Forum Moderators: coopster


need help with php to scrape a page

legitimate reasons

         

The_Hat

8:19 pm on Apr 7, 2008 (gmt 0)

10+ Year Member



We are getting set up with a fully hosted CMS solution soon, but we will still have pages that reside outside of the CMS framework. So we need some way to scrape a page wrapper and duplicate it to a secondary location. What I have so far is:

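<?php
// open the remote page; this needs allow_url_fopen enabled in php.ini
if ($fp = fopen('http://www.example.com/wrapper.html', 'r')) {
    $content = '';
    // keep reading until there's nothing left; compare against false so a
    // chunk that is exactly "0" (e.g. a last line with no newline) is kept
    while (($line = fgets($fp, 1024)) !== false) {
        $content .= $line;
    }
    fclose($fp);

    // write the copy to the secondary location
    $myFile = "testFile.html";
    $fh = fopen($myFile, 'w') or die("can't open file");
    fwrite($fh, $content);
    fclose($fh);

    // do something with the content here
    // ...
} else {
    // an error occurred when trying to open the specified url
}
?>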

This works pretty well, except I need it to stop at a word in the middle of the page and read no further. The word also appears higher up in the code, but the instance I need to stop at is in all caps; the other instances are not. Also, how could I strip out other pieces of code, such as <head></head> and the like? Can anybody help?

PHP_Chimp

9:29 am on Apr 8, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Have a look at regular expressions (preg_match [uk2.php.net] may well help). You can build a pattern that strips out everything from <head> to </head>, and another that finds your word in capitals so you can drop everything after it.
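As a rough sketch of that idea, something like the following might do. Note the assumptions: the stop word FOOTER and the function name trimWrapper are made up for illustration, since the thread never says what the real word is. It also uses preg_replace for the <head> block and a plain case-sensitive strpos for the stop word rather than preg_match, because a literal all-caps word does not really need a pattern.

```php
<?php
// Hypothetical marker: the thread never names the actual all-caps word.
const STOP_WORD = 'FOOTER';

/**
 * Return $html with the <head> block removed and everything from the
 * first all-caps occurrence of $stopWord onwards cut off.
 */
function trimWrapper(string $html, string $stopWord): string
{
    // Remove <head ...>...</head>. The "s" flag lets "." span newlines,
    // "i" ignores tag case, and the lazy ".*?" stops at the first </head>.
    $stripped = preg_replace('#<head\b[^>]*>.*?</head>#si', '', $html);
    if ($stripped !== null) { // preg_replace returns null on a regex error
        $html = $stripped;
    }

    // strpos() is case-sensitive, so lowercase occurrences higher up
    // in the page are skipped.
    $pos = strpos($html, $stopWord);

    // No marker found: keep the whole page.
    return $pos === false ? $html : substr($html, 0, $pos);
}
```

You would call it as trimWrapper($content, STOP_WORD) on the string you have already read, then write the result to your file. Regular expressions are fine for a single well-formed <head> block like this, but they get fragile on messy markup.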

You could also use file_get_contents [uk2.php.net] to read your URL into a string. If the fopen wrappers are enabled (the allow_url_fopen setting), they work for both fopen and file_get_contents.
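A minimal sketch of the file_get_contents route, assuming allow_url_fopen is on for remote URLs. The function name copyPage is made up for illustration; the URL and file name in the usage comment are the ones from the original post.

```php
<?php
/**
 * Copy the page at $source (a URL or a local path) to $dest.
 * Returns true on success, false if the read or the write failed.
 */
function copyPage(string $source, string $dest): bool
{
    // Remote URLs need allow_url_fopen enabled in php.ini.
    // The @ suppresses the warning; failure is reported via the return value.
    $content = @file_get_contents($source);
    if ($content === false) {
        return false;
    }

    // file_put_contents() opens, writes and closes the file in one call.
    return file_put_contents($dest, $content) !== false;
}

// Usage with the URL and file name from the original post:
// copyPage('http://www.example.com/wrapper.html', 'testFile.html');
```

This replaces the whole fopen/fgets loop, at the cost of holding the entire page in memory, which should be fine for a single wrapper page.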