Help With Screen Scraping

         

McBlack

8:26 pm on Feb 4, 2008 (gmt 0)

10+ Year Member



I have managed to make a screen scraper that fetches everything on the page. Now I want to know how I can choose where the output starts and where it ends.

Here's my code so far:

<?php
$context=array('http' => array ('header'=> 'Range: bytes=1024-', ),);
$xcontext = stream_context_create($context);
$test=file_get_contents("http://www.example.com",FALSE,$xcontext);
print <<<EOF
$test
EOF;
?>

eelixduppy

9:54 pm on Feb 4, 2008 (gmt 0)



You can use regular expressions [us.php.net] or strpos [php.net] to find different parts of the retrieved content and echo only certain parts. It really depends on what you are trying to do.
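For the strpos route, here is a minimal sketch of one way to pick a start and end point. It assumes the part you want sits between two known marker strings. The helper name extract_between, the sample markup, and the markers are made up for illustration; swap in whatever actually surrounds your content.

```php
<?php
// Hypothetical helper: returns the text between $start and $end,
// or null if either marker cannot be found.
function extract_between($html, $start, $end) {
    $from = strpos($html, $start);
    if ($from === false) {
        return null;
    }
    // Move past the start marker so it is not included in the result.
    $from += strlen($start);
    // Look for the end marker only after the start marker.
    $to = strpos($html, $end, $from);
    if ($to === false) {
        return null;
    }
    return substr($html, $from, $to - $from);
}

// Sample markup standing in for the page returned by file_get_contents().
$test = '<html><body><div id="news">Latest headline</div><p>Footer</p></body></html>';
$result = extract_between($test, '<div id="news">', '</div>');
echo $result;
```

Note the strict === false checks: strpos returns 0 when the marker is at the very start of the string, and a loose comparison would treat that as "not found".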

GamingLoft

9:57 pm on Feb 4, 2008 (gmt 0)

10+ Year Member



[webmasterworld.com...]

That's a thread that I created. I think I did exactly the same thing as you, so just look at my source code and see if you can make sense of it.

Hope it helps, good luck!

whoisgregg

9:58 pm on Feb 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This road typically leads into the dark and haunted regular expression [php.net] forest.

To get you started:

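<?php
// Ask the server to skip the first 1024 bytes of the response (it may ignore this header).
$context = array('http' => array('header' => 'Range: bytes=1024-'));
$xcontext = stream_context_create($context);
$test = file_get_contents("http://www.example.com", FALSE, $xcontext);
// Capture the href and the link text of simple <a href="...">text</a> anchors.
$pattern = '/<a href="([^"]+)">([^<]+)<\/a>/i';
// preg_match_all() returns the number of matches, which is echoed here.
echo preg_match_all($pattern, $test, $matches);
print_r($matches);
?>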

Although, in my research I came across some interesting stuff about the PHP DOM functions [php.net] and how those can be used for getting data out of an HTML string.
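A rough sketch of what the DOM approach could look like, assuming the DOM extension is enabled (it usually is by default). The sample markup and the $links array are made up for illustration; in practice $test would be the string returned by file_get_contents.

```php
<?php
// Sample markup standing in for the scraped page.
$test = '<p><a href="http://www.example.com/one">First</a> and <a href="/two">Second</a></p>';

$doc = new DOMDocument();
// Real-world HTML is often malformed; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($test);
libxml_clear_errors();

// Gather the href attribute and the text of every <a> element.
$links = array();
foreach ($doc->getElementsByTagName('a') as $a) {
    $links[] = array(
        'href' => $a->getAttribute('href'),
        'text' => $a->textContent,
    );
}
print_r($links);
```

Unlike the regular expression above, this does not depend on the exact spacing or attribute order inside each tag, so it tends to hold up better when the markup changes slightly.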