Forum Moderators: coopster


PHP Script

Need to speed it up.

gosman

7:10 pm on Mar 23, 2006 (gmt 0)

10+ Year Member



I have a PHP script that does the following:

1. Retrieves a list of URLs from a DB
2. Uses cURL to fetch XML from each of the URLs
3. Stores the XML locally

The script works fine and does what it is supposed to, but it does it rather slowly. We have been able to speed it up by running multiple instances of the script, each instance taking different records from the DB. This is rather messy and becomes a nightmare to schedule with cron.

Theoretically, is it possible to speed this up by calling just one script?

jezra

7:28 pm on Mar 23, 2006 (gmt 0)

10+ Year Member



How are you running curl? If you are doing something like:
$command = "curl -o $someFile $someURL";
shell_exec($command);

you may want to try adding an ampersand to the end of your command (and redirecting the output, so shell_exec doesn't wait on it):
$command = "curl -o $someFile $someURL > /dev/null 2>&1 &";

In theory, this should run curl in the background and let your script continue instead of having your script wait for the curl command to finish. If this is part of a loop, your script will run multiple instances of curl in the background.

gosman

8:01 pm on Mar 23, 2006 (gmt 0)

10+ Year Member



Hi jezra.

Here's the code. I don't really know much about PHP.

<?php
ob_start();

// Database settings
require_once("../includes/db_config.php");
require_once("../cache.php");

$db = mysql_connect(DB_HOST, DB_USER, DB_PASS) or die(mysql_error());
mysql_select_db(DB_NAME) or die(mysql_error());

$query = "SELECT * FROM cms_locations_widgets";
$result = mysql_query($query);

$url = "http://someurl/xml.php?PID1=[PID1]&PID2=[PID2]&PID3=[PID3]";

$cache = new URL_Cache();
$logfile = SERVER_PATH."xml/log.txt";

if(file_exists($logfile)) unlink($logfile);

$log = fopen($logfile, "a");

$filepath = $_SERVER["DOCUMENT_ROOT"]."/xml/";
$count = 0;

if($result)
{
    while($row = mysql_fetch_array($result))
    {
        // substitute this row's widget IDs into the URL template
        $tmp = ereg_replace("\[PID1\]", $row["WIDGET1"], $url);
        $tmp = ereg_replace("\[PID2\]", $row["WIDGET2"], $tmp);
        $tmp = ereg_replace("\[PID3\]", $row["WIDGET3"], $tmp);

        $xml = $cache->CurlConnect($tmp);

        $file = $filepath.$row["WIDGET1"].$row["WIDGET2"].$row["WIDGET3"].".xml";
        $file = ereg_replace(" ", "_", $file);
        $file = strtolower($file);
        $error = "";
        if(!eregi("<Exception>", $xml) and !eregi("Error", $xml))
        {
            if($fp = fopen($file, "w+"))
            {
                fwrite($fp, $xml);
                fclose($fp);
                chmod($file, 0777);
                $count++;
            }
        } else {
            $error = "---------------------------------------------------------------------------------------\n\n";
            $error .= $row["WIDGET1"].", ".$row["WIDGET2"].$row["WIDGET3"]."\n";
            $error .= "$tmp\n\n";
            $error .= "Error: \n $xml\n\n";
            $error .= "---------------------------------------------------------------------------------------\n\n";
        }

        fwrite($log, $error);
        echo "<br>".$error."<br>";
    }
}

echo "Retrieved - ".$count;
fclose($log);

ob_flush();
?>

gosman

8:15 pm on Mar 23, 2006 (gmt 0)

10+ Year Member



I think it also uses this function from the included cache.php:

function CurlConnect( $requestURL )
{
    // URL-encode the xml parameter if present, otherwise just escape spaces
    if(eregi("xml=", $requestURL))
    {
        $url = split("xml=", $requestURL);
        $url[1] = urlencode($url[1]);
        $url = implode("xml=", $url);
    } else {
        $url = ereg_replace(" ", "%20", $requestURL);
    }

    // create the curl handle for this request
    $ch = curl_init($url);

    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow redirects

    $response = curl_exec($ch);

    if(curl_errno($ch))
    {
        $response = "<Exception>".curl_errno($ch)." - ".curl_error($ch)."</Exception>";
    }

    curl_close($ch); // close the handle whether or not there was an error

    return $response;
}

jezra

10:39 pm on Mar 23, 2006 (gmt 0)

10+ Year Member



As far as I can tell, your script does the following:
1. gets a list of URLs
2. loops through the list
2.1 Curls the next item in the list
2.2 writes the returned text to a file

When your script downloads the XML, it does nothing until the download is complete. Only once the download is finished does the script begin the next one. If running multiple instances of the script speeds up the process, then you should look into either making multiple scripts or reconfiguring your script so that multiple files are downloaded at the same time.
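One way to download multiple files at the same time from a single script is PHP's curl_multi functions. A minimal sketch, not the thread's code: the `$urls` array here stands in for the list you would build from the database rows, and the file naming is up to you:

```php
<?php
// fetch several URLs concurrently with curl_multi
$urls = array(/* URLs built from the DB rows */); // assumed input

$mh = curl_multi_init();
$handles = array();

// add one easy handle per URL to the multi handle
foreach ($urls as $i => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // capture the body
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow redirects
    curl_multi_add_handle($mh, $ch);
    $handles[$i] = $ch;
}

// drive all transfers until every one has finished
$running = null;
do {
    curl_multi_exec($mh, $running);
} while ($running > 0);

// collect the results and clean up
foreach ($handles as $i => $ch) {
    $xml = curl_multi_getcontent($ch); // response body for this handle
    // write $xml to its file here, as in the original loop
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
?>
```

All the transfers run in parallel inside one PHP process, so a single cron entry is enough.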

gosman

10:47 pm on Mar 23, 2006 (gmt 0)

10+ Year Member



Thanks Jezra.

Running multiple instances of the script does speed things up, but it is a nightmare to manage as a cron job.

What I need to know is: is it definitely possible to configure the script in such a way that multiple files are downloaded at the same time?

jatar_k

11:50 pm on Mar 23, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Don't know if this is actually the answer, but it's worth a look:

[php.net...]

Another thought is controlling the whole process with PHP but using shell scripts for the grunt work.

jezra

12:44 am on Mar 24, 2006 (gmt 0)

10+ Year Member



this "should" download more than one thing at a time:

$result = mysql_query("SELECT * FROM cms_locations_widgets");
// change to the download directory
chdir("/Download/directory/path");
while($row = mysql_fetch_array($result))
{
    // create a URL and filename based on the $row data
    $myURL = "http://someurl/xml.php?PID1=".$row["WIDGET1"]."&PID2=".$row["WIDGET2"]."&PID3=".$row["WIDGET3"];
    $myFile = strtolower($row["WIDGET1"].$row["WIDGET2"].$row["WIDGET3"]).".xml";
    // redirect curl's output and background it with & so the loop doesn't wait
    $command = "curl -s ".escapeshellarg($myURL)." -o ".escapeshellarg($myFile)." > /dev/null 2>&1 &";
    shell_exec($command);
}
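One caveat with backgrounding a curl per row: a large table forks one process per record, all at once. A minimal way to throttle, assuming an arbitrary batch size of 10 and a 2-second pause (both made-up numbers, tune for your server):

```php
<?php
// throttled version of the loop: pause after every batch of background curls
$batchSize = 10; // assumed value; tune for your server
$i = 0;
while($row = mysql_fetch_array($result))
{
    // build and run $command exactly as in the loop above
    shell_exec($command);
    // pause after each batch so the running downloads get a head start
    if(++$i % $batchSize == 0) sleep(2);
}
?>
```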

gosman

12:58 am on Mar 24, 2006 (gmt 0)

10+ Year Member



Thanks Jezra for your help.