Forum Moderators: coopster
I am attempting to write a script to parse an XML file and update a large number of rows in a MySQL database. I have written something that works, but it is slow. The run time might be acceptable, but it seems too high to me.
$sSQL = "SELECT pID FROM products";
$query = mysql_query($sSQL);
while ($rs = mysql_fetch_row($query))
    $pProd[$rs[0]] = $rs[0];

$sSQL = "SELECT optName2 FROM options";
$query = mysql_query($sSQL);
while ($rs = mysql_fetch_row($query))
    $pOpt[$rs[0]] = $rs[0];
// Loop through the XML file and build the INSERT string
$sSQL = "INSERT INTO tbl_duplicate (ID, Stock, Type) VALUES ";
$bAddComma = false;
$lProd = array();
foreach ($xml->Available as $avail) {
    $bFound = false;
    if (in_array($avail->Part, $pProd) === true) {
        $bFound = true;
        $part = $avail->Part;
        $iType = 0;
    }
    if ($bFound === false) {
        if (in_array($avail->Part, $pOpt) === true) {
            $bFound = true;
            $part = $avail->Part;
            $iType = 1;
        }
        if ($bFound === false) {
            if (in_array($avail->Part, $pProd) === true) {
                $bFound = true;
                $part = BasePart($avail->Part);
                $iType = 0;
            }
        }
    }
    if ($bFound === true) {
        // Verify that the ID isn't already in the list
        if (in_array($part, $lProd) === false) {
            if ($bAddComma) $sSQL .= ",";
            $sSQL .= "('" . $part . "'," . $avail->Qty . "," . $iType . ")";
            $bAddComma = true;
            $lProd[] = $part;
        }
    }
}
// Execute SQL String
mysql_query($sSQL) or die(mysql_error());
// Now update the products and options table using this data
$sSQL = "UPDATE products, tbl_duplicate SET products.pInStock=tbl_duplicate.Stock WHERE products.pID=tbl_duplicate.ID AND tbl_duplicate.Type=0";
mysql_query($sSQL);
$sSQL = "UPDATE options, tbl_duplicate SET options.optStock=tbl_duplicate.Stock WHERE options.optName2=tbl_duplicate.ID AND tbl_duplicate.Type=1";
mysql_query($sSQL);
Now, the slowdown does not appear to come from the SQL queries themselves. The big bottleneck seems to be determining whether each SKU from the XML file is in the products table, the options table, or neither.
The XML file typically has around 18,000 SKUs, of which around 9,000 are in the DB in either the products or options table. The script averages around 8 minutes, which seems way too long for what it's doing: it spends around 15 seconds downloading the XML file, 30 seconds performing the queries, and a little over 7 minutes parsing.
Thanks for any thoughts.
Stewart
I'm guessing the calls to in_array on such large arrays are pretty slow: in_array scans the whole array on every call, while isset is a constant-time key lookup. Since you key the arrays by their values, you could replace the calls to in_array with isset, e.g.:
if(isset($pProd[$avail->Part])) { ...
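To see why this matters, here is a rough sketch of the difference in Python (the sizes mirror the post: ~18,000 incoming parts checked against ~9,000 known SKUs, and a set plays the role of the value-keyed PHP array; the SKU data is made up for the benchmark):

```python
import random
import string
import time

def rand_sku(rng):
    """Generate a random 8-character SKU-like string."""
    return ''.join(rng.choices(string.ascii_uppercase + string.digits, k=8))

rng = random.Random(42)
known = [rand_sku(rng) for _ in range(9000)]                    # parts already in the DB
incoming = [rand_sku(rng) for _ in range(18000)] + known[:100]  # parts from the XML feed

# Linear scan, roughly what in_array() does: O(n) per lookup.
t0 = time.perf_counter()
hits_scan = sum(1 for part in incoming if part in known)
t_scan = time.perf_counter() - t0

# Hash lookup, roughly what isset() on a keyed array does: O(1) per lookup.
known_set = set(known)
t0 = time.perf_counter()
hits_hash = sum(1 for part in incoming if part in known_set)
t_hash = time.perf_counter() - t0

# Both approaches find exactly the same matches; only the time differs.
```

On a typical machine the linear-scan version takes seconds while the set version takes milliseconds, which is the kind of ratio you'd expect from switching the PHP loop from in_array to isset.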
Suggestion 2:
Is tbl_duplicate empty at the start of the script? You may be better off inserting all the SKUs first with Type set to NULL, then updating Type based on what matches in the products and options tables, e.g.:
INSERT INTO tbl_duplicate (ID, Stock) VALUES (..., ...);
UPDATE tbl_duplicate SET Type = 0
WHERE ID in (select pID from products);
UPDATE tbl_duplicate SET Type = 1
WHERE ID in (select optName2 from options);
DELETE FROM tbl_duplicate WHERE Type is NULL;
This way you don't have to fetch all the existing parts and compare against them.
This part might be a bit tricky (does it ever get executed?):
if ($bFound === false) {
    if (in_array($avail->Part, $pProd) === true) {
        $bFound = true;
        $part = BasePart($avail->Part);
        $iType = 0;
    }
}
You may need to add an extra field in tbl_duplicate and add an extra UPDATE to match BasePart against pID in the parts table.
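A sketch of what that could look like (the BaseID column name is just a placeholder, and BasePart() would have to be applied in PHP when building the INSERT, since MySQL has no such function):

ALTER TABLE tbl_duplicate ADD COLUMN BaseID VARCHAR(50);
-- after inserting rows with BaseID computed by BasePart() in PHP:
UPDATE tbl_duplicate SET Type = 0
WHERE Type IS NULL
  AND BaseID IN (SELECT pID FROM products);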
Hope this makes sense. :)
Thanks for the reply. I can see your points on some of it, and I figured in_array was the big cause of the slowdown. Unfortunately, when I tried isset it threw errors, although I'm not quite sure why; I expected it to work. Yes, the third check does get hit on occasion: some of the product SKUs in the database have been changed to just the base name because all but one option was discontinued.
I will give your suggestions a try and see where I end up. Thanks :)
Stewart