Forum Moderators: coopster
1. Retrieves a list of URLs from a DB
2. Uses cURL to fetch XML from each of the URLs
3. Stores the XML locally
The script works fine and does what it is supposed to, but it does it rather slowly. We have been able to speed it up by running multiple instances of the script, each instance taking different records from the DB. This is rather messy and becomes a nightmare to schedule with cron.
Theoretically, is it possible to speed this up by calling just one script?
You may want to try adding an ampersand to the end of your command:
$command = "curl $someURL -o $someFile &";
In theory, this should run cURL in the background and let your script continue instead of waiting for the output of the curl command. If this is part of a loop, your script will run multiple instances of curl in the background.
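One caveat with this approach: shell_exec() waits for the command's output streams to close, so the trailing & alone may not be enough to return control to your script. Redirecting stdout and stderr fixes that. A minimal sketch, with placeholder URL and filename:

```php
<?php
// Placeholder values -- substitute the URL and output file built from your DB rows.
$someURL  = "http://localhost/xml.php?PID1=1";
$someFile = "widget1.xml";

// Redirecting output lets the shell return immediately;
// without the redirection, shell_exec() can block until curl finishes.
$command = "curl ".escapeshellarg($someURL)
         ." -o ".escapeshellarg($someFile)
         ." > /dev/null 2>&1 &";
shell_exec($command);
echo "curl launched in the background\n";
?>
```

escapeshellarg() also guards against shell metacharacters sneaking into the command via the URL or filename.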
Here's the code. I don't really know much about PHP.
<?php
ob_start();
//Database settings
require_once("../includes/db_config.php");
require_once("../cache.php");
$db = mysql_connect(DB_HOST, DB_USER, DB_PASS) or die(mysql_error());
mysql_select_db(DB_NAME) or die(mysql_error());
$query = "SELECT * FROM cms_locations_widgets";
$result = mysql_query($query);
$url = "http://someurl/xml.php?PID1=[PID1]&PID2=[PID2]&PID3=[PID3]";
$cache = new URL_Cache();
$logfile = SERVER_PATH."xml/log.txt";
if(file_exists($logfile)) unlink($logfile);
$log = fopen($logfile, "a");
$filepath = $_SERVER["DOCUMENT_ROOT"]."/xml/";
$count = 0;
if($result)
{
while($row = mysql_fetch_array($result))
{
// Build the request URL by substituting the widget IDs
$tmp = str_replace("[PID1]", $row["WIDGET1"], $url);
$tmp = str_replace("[PID2]", $row["WIDGET2"], $tmp);
$tmp = str_replace("[PID3]", $row["WIDGET3"], $tmp);
$xml = $cache->CurlConnect($tmp);
// Build the local filename from the widget IDs
$file = $filepath.$row["WIDGET1"].$row["WIDGET2"].$row["WIDGET3"].".xml";
$file = str_replace(" ", "_", $file);
$file = strtolower($file);
$error = "";
if(stripos($xml, "<Exception>") === false && stripos($xml, "Error") === false)
{
if($fp = fopen($file, "w+"))
{
fwrite($fp, $xml);
fclose($fp);
chmod($file, 0777);
$count++;
}
} else {
$error = "---------------------------------------------------------------------------------------\n\n";
$error .= $row["WIDGET1"].", ".$row["WIDGET2"].", ".$row["WIDGET3"]."\n";
$error .= "$tmp\n\n";
$error .= "Error: \n $xml\n\n";
$error .= "---------------------------------------------------------------------------------------\n\n";
}
fwrite($log, $error);
echo "<br>".$error."<br>";
}
}
echo "Retrieved - ".$count;
fclose($log);
ob_flush();
?>
function CurlConnect( $requestURL )
{
// URL-encode the xml= query value if present, otherwise just escape spaces
if(stripos($requestURL, "xml=") !== false)
{
$url = explode("xml=", $requestURL);
$url[1] = urlencode($url[1]);
$url = implode("xml=", $url);
} else {
$url = str_replace(" ", "%20", $requestURL);
}
// create curl handle (this will be used for all curl functions)
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); //return results inline
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); //follow redirects
$response = curl_exec($ch);
if(curl_errno($ch))
{
$response = "<Exception>".curl_errno($ch)." - ".curl_error($ch)."</Exception>";
}
curl_close($ch); //close the handle whether or not the request failed
return $response;
}
When your script downloads the XML, it does nothing until the download is complete; only then does it begin the next download. If running multiple instances of the script speeds up the process, then you should look into either splitting the work across multiple scripts or restructuring your script so that multiple files are downloaded at the same time.
[php.net...]
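The pointer above is presumably to PHP's curl_multi functions, which download several URLs in parallel from a single script. A minimal sketch of that approach, with placeholder URLs and filenames (the real ones would come from the DB rows as in the script above):

```php
<?php
// Sketch: parallel downloads with curl_multi.
// $jobs maps local filenames to URLs -- placeholder data here.
$jobs = array(
    "a.xml" => "http://localhost:1/xml.php?PID1=1",
    "b.xml" => "http://localhost:1/xml.php?PID1=2",
);

$mh = curl_multi_init();
$handles = array();
foreach($jobs as $file => $url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_multi_add_handle($mh, $ch);
    $handles[$file] = $ch;
}

// Drive all transfers at once
$running = 0;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // wait for activity instead of busy-looping
} while($running > 0);

// Collect results and clean up
foreach($handles as $file => $ch)
{
    if(!curl_errno($ch))
        file_put_contents($file, curl_multi_getcontent($ch));
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
?>
```

Unlike the shell-out approach, this keeps everything inside one PHP process, so you can check curl_errno() per handle and log failures the same way the original script does.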
Another thought is controlling the whole process with PHP but using shell commands for the grunt work:
$result = mysql_query("SELECT * FROM cms_locations_widgets");
//change to the download directory
chdir("/Download/directory/path");
while($row = mysql_fetch_row($result))
{
//create a URL and filename based on $row data, for example:
$myURL = "http://someurl/xml.php?PID1=".$row[1]."&PID2=".$row[2]."&PID3=".$row[3];
$myFile = strtolower($row[1].$row[2].$row[3]).".xml";
//redirect output so shell_exec() returns immediately instead of waiting for curl
$command = "curl ".escapeshellarg($myURL)." -o ".escapeshellarg($myFile)." > /dev/null 2>&1 &";
shell_exec($command);
}