<?php
// Don't use this illegally. Requires PHP 4.3 or later.
//scraper.php
$url = 'http://www.webmasterworld.com/forum88';
$findstuff = array();
$findstuff['start'][] = 'Moderated by\: \<a href\="/vewprofile\.cgi\?action=view&member\=';
$findstuff['end'][] = '"';
$findstuff['start'][] = '<img src="http\://showcase\.netins\.net/web/phdss/WebmasterWorldgfx/thread\.png" alt="thread icon" align="left"><font size="2" face="verdana" color="\#000000"><b><a href="/forum88/[0-9]*\.htm" target="_top">';
$findstuff['end'][] = '</a>';
$scrapegreedy = 0;
$user_agent = 'browser';
$preg = 1;
$debug = 0;
$foundbits = go_scrape($url, $findstuff, $scrapegreedy, $user_agent, $preg, $debug);
echo '<pre>';
print_r($foundbits);
echo '</pre>';
//
//
//
// Fetches $url and returns the text found between each start/end pair in $findstuff.
function go_scrape($url, $findstuff, $scrapegreedy, $user_agent='', $preg=0, $debug=0){
    // Pick the user agent that file_get_contents() will send
    if(!empty($user_agent)){
        // 'browser' passes along the user agent of whoever is requesting this script
        if($user_agent == 'browser') ini_set('user_agent', $_SERVER['HTTP_USER_AGENT']);
        else ini_set('user_agent', $user_agent);
    } else ini_set('user_agent', 'scraper.php - www.webmasterworld.com/forum88/6614.htm');
    $contents = file_get_contents($url);
    if(!$contents) return 'no contents';
    $foundbits = array();
    foreach($findstuff['start'] as $k => $v){
        if(empty($preg)){
            // Plain strings: escape regex characters, including the # delimiter
            $v = preg_quote($v, '#');
            $findstuff['end'][$k] = preg_quote($findstuff['end'][$k], '#');
        }
        // Build #start(.*?)end# (non-greedy) or #start(.*)end# (greedy)
        $pregstring = '#'.$v.'(.*';
        if(empty($scrapegreedy)) $pregstring .= '?)';
        else $pregstring .= ')';
        $pregstring .= $findstuff['end'][$k].'#';
        if($debug) echo htmlspecialchars($pregstring).'<br />';
        // Only the first match for each pair is kept
        $check = preg_match($pregstring, $contents, $matches);
        if($check) $foundbits[] = $matches[1];
        else $foundbits[] = '* none found *';
    }
    // In debug mode the fetched page is returned alongside the found bits
    if($debug) return array($foundbits, $contents);
    else return $foundbits;
}
Put your URL in $url. For each bit you want to get, put the HTML that comes just before it in $findstuff['start'] and the HTML that comes just after it in $findstuff['end'], one pair per bit, like above. Setting a user agent is optional; setting it to 'browser' sends the user agent of the browser that is requesting the script. Set $preg to 1 if you want your strings used as preg_match patterns (properly escaped, with your special preg stuff inside, but no delimiters). If they're just 'normal' strings, set it to 0 or leave it empty.
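To see what $scrapegreedy and $preg change without fetching anything, here is a rough sketch. The build_pattern() helper and the sample markup are made up for illustration; the helper just mirrors how go_scrape() puts its pattern together from plain strings.

```php
<?php
// Hypothetical helper (not part of scraper.php): builds the same kind of
// pattern go_scrape() builds from one start/end pair.
function build_pattern($start, $end, $greedy = 0, $preg = 0) {
    if (empty($preg)) {
        // Plain strings: escape regex characters and the '#' delimiter.
        $start = preg_quote($start, '#');
        $end   = preg_quote($end, '#');
    }
    // '(.*?)' stops at the first end string, '(.*)' runs to the last one.
    return '#' . $start . '(.*' . (empty($greedy) ? '?' : '') . ')' . $end . '#';
}

// Made-up markup standing in for a fetched page.
$html = '<b><a href="/a.htm">First thread</a></b> <b><a href="/b.htm">Second thread</a></b>';

preg_match(build_pattern('.htm">', '</a>', 0), $html, $lazy);
preg_match(build_pattern('.htm">', '</a>', 1), $html, $greedy);

echo $lazy[1], "\n";   // First thread
echo $greedy[1], "\n"; // First thread</a></b> <b><a href="/b.htm">Second thread
```

With $scrapegreedy left at 0 the capture stops at the first end string, which is usually what you want when pulling a single link title.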
If you set $debug to 1, it'll output each regular expression it uses so you can check them, and it returns an array holding both the found bits and the contents of the page fetched, instead of just the found bits.
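Because the return value changes shape with $debug (and is the string 'no contents' when the fetch fails), the calling code has to cope with three cases. A minimal sketch, assuming a hypothetical unpack_result() helper and simulated return values rather than a live fetch:

```php
<?php
// Hypothetical helper (not part of scraper.php): turns whatever go_scrape()
// returned into one predictable array.
function unpack_result($result, $debug) {
    if ($result === 'no contents') {
        // The fetch failed or the page was empty.
        return array('found' => array(), 'contents' => '');
    }
    if ($debug) {
        // Debug mode returns array($foundbits, $contents).
        list($found, $contents) = $result;
        return array('found' => $found, 'contents' => $contents);
    }
    // Normal mode returns just the found bits.
    return array('found' => $result, 'contents' => '');
}

// Simulated return values, made up for illustration.
$normal   = unpack_result(array('some_moderator', 'Some thread'), 0);
$debugged = unpack_result(array(array('some_moderator'), '<html>...</html>'), 1);
$failed   = unpack_result('no contents', 0);

print_r($normal['found']);
echo $debugged['contents'], "\n";
echo count($failed['found']), "\n";
```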
The example above outputs the first moderator name and the first thread title found in this PHP forum.
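Only the first match comes back because go_scrape() uses preg_match, which stops after one hit. If you wanted every thread title instead, one possible variation (not something the script above does) is preg_match_all. The markup below is made up for illustration:

```php
<?php
// Made-up sample markup with two thread links.
$html = '<a href="/forum88/1.htm" target="_top">Thread one</a>'
      . '<a href="/forum88/2.htm" target="_top">Thread two</a>';

$pattern = '#<a href="/forum88/[0-9]*\.htm" target="_top">(.*?)</a>#';

// preg_match stops at the first match, which is what go_scrape() reports.
preg_match($pattern, $html, $first);

// preg_match_all collects every match; $all[1] holds the captured titles.
preg_match_all($pattern, $html, $all);

echo $first[1], "\n";              // Thread one
echo implode(', ', $all[1]), "\n"; // Thread one, Thread two
```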