Iterating through a CSV to find lines that differ from another CSV


tec4 - 11:34 am on Oct 1, 2012 (gmt 0)


I'm currently iterating through the lines of one CSV file and checking whether each line exists in a second CSV file. If it does not, I store the line in an array so I can prune the "mis-matched" items out of my system.

My Problem: This works fine with smaller CSVs, but now that I am working with two CSVs of over 85,000 lines, this single script pushes my CPU usage to 70-85% and takes a tremendous amount of time to finish. I am wondering if there is a more efficient way to do what I am trying to do.

My Code:

//Two CSV files
$csv = "data.csv";
$csv_local = "local_data.csv";

//Parsing CSV data into arrays
$feed_info = parseData($csv);
$local_info = parseData($csv_local);

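//Parsing Function: reads a file into an array of trimmed lines

function parseData($csv_file){
    $array = array();

    $file_pointer = fopen($csv_file, "r");
    if($file_pointer === false){
        return $array; //File could not be opened
    }

    //Compare against false so a line such as "0" does not end the loop early
    while(($line = fgets($file_pointer)) !== false){
        $array[] = trim($line);
    }
    fclose($file_pointer);

    return $array;
}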



//Store mis-matched lines in array
//Initialised outside the if so the variable always exists afterwards
$mis_match_array = array();

if(count($feed_info) > 1 && count($local_info) > 1){
    foreach($local_info as $info){
        if(!in_array($info, $feed_info)){
            $mis_match_array[] = $info;
        }
    }
}


I can't think of a better, less resource-intensive way of going about this. Any thoughts?

Thanks!
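
One possible direction, offered as a sketch and not a definitive fix: in_array() scans the feed array from the start for every local line, so with two files of 85,000+ lines the loop can perform up to several billion comparisons. Flipping the feed lines into array keys lets isset() do a hash lookup instead. The helper name findMismatches() and the sample data below are hypothetical, not from the original post. Note also that this compares lines as exact strings, whereas in_array() without its third argument uses loose comparison, so results could differ for numeric-looking lines.

```php
<?php
// Hypothetical helper (not from the original post): returns the lines of
// $local that do not appear in $feed, using a hash lookup instead of in_array().
function findMismatches(array $local, array $feed)
{
    // array_flip() turns each line into an array key, so the isset() check
    // below is a hash lookup and not a linear scan of the whole feed.
    $feed_lookup = array_flip($feed);

    $mis_match_array = array();
    foreach ($local as $info) {
        if (!isset($feed_lookup[$info])) {
            $mis_match_array[] = $info;
        }
    }
    return $mis_match_array;
}

// Small in-memory demo with made-up lines; real input would come from parseData().
$feed_info  = array("id,name", "1,apple", "2,pear");
$local_info = array("id,name", "1,apple", "3,plum");

$mis_match_array = findMismatches($local_info, $feed_info);
print_r($mis_match_array);
```

PHP's built-in array_diff($local_info, $feed_info) computes the same kind of difference (it compares values as strings and keeps the original keys), so it may be worth timing against the loop above on the real files.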


Thread source: http://www.webmasterworld.com/php/4502437.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com