Home / Forums Index / Code, Content, and Presentation / PHP Server Side Scripting
Forum Library, Charter, Moderators: coopster & jatar k

PHP Server Side Scripting Forum

    
iterating through csv to find lines different from another csv
tec4
msg:4502439
 11:34 am on Oct 1, 2012 (gmt 0)

I'm currently iterating through the lines of one CSV file, checking whether each line exists in a second CSV file. If it doesn't, I store the line in an array so I can prune the mismatched items out of my system.

My Problem: This works fine with smaller CSVs, but now that I am working with two CSVs of over 85,000 lines each, this single script pushes CPU usage to 70-85% and takes a tremendous amount of time to finish. I am wondering if there is a better way of going about it to make what I am trying to do more efficient.

My Code:

//Two CSV files
$csv = "data.csv";
$csv_local = "local_data.csv";

//Parsing CSV data into arrays
$feed_info = parseData($csv);
$local_info = parseData($csv_local);

//Parsing function: read a file into an array, one trimmed line per entry
function parseData($csv_file){
    $file_pointer = fopen($csv_file, "r");

    $array = array();
    while(($line = fgets($file_pointer)) !== false) {
        $array[] = trim($line);
    }
    fclose($file_pointer);
    return $array;
}



//Store mis-matched lines in an array
if(count($feed_info) > 1 && count($local_info) > 1){

    $mis_match_array = array();

    foreach($local_info as $info){
        if(!in_array($info, $feed_info)){
            $mis_match_array[] = $info;
        }
    }
}


Can't think of a better/less resource-intensive way of going about this - any thoughts?

Thanks!

 

coopster
msg:4507995
 7:19 pm on Oct 14, 2012 (gmt 0)

Maybe array_diff and/or array_intersect?
[php.net...]
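A minimal sketch of how array_diff() applies here, using small in-memory arrays in place of the parsed CSV lines (the sample data is mine, not from the thread):

```php
<?php
// array_diff() returns the entries of the first array that are
// not present in the second - exactly the "mis-match" set.
$local_info = ["a,1", "b,2", "c,3"];   // stand-in for local_data.csv lines
$feed_info  = ["a,1", "c,3"];          // stand-in for data.csv lines

// Lines in local_data.csv with no match in data.csv.
// array_diff() preserves the original keys, so reindex with array_values().
$mis_match_array = array_values(array_diff($local_info, $feed_info));
// $mis_match_array is ["b,2"]
```

Because array_diff() is implemented in C, it should be considerably faster than calling in_array() once per line in a PHP loop, though both arrays still have to be held in memory.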

swa66
msg:4508009
 9:24 pm on Oct 14, 2012 (gmt 0)

I guess memory usage is going to cause your system to thrash.

You could try loading just one file and parsing the other line by line, without reading it all into memory (parse it yourself); that should cut your memory usage roughly in half.

penders
msg:4508486
 4:36 pm on Oct 16, 2012 (gmt 0)

I would certainly look at the array functions, as coopster suggests.

Another alternative is to store just one file "data.csv" as the keys of an array (not the values), then step through "local_data.csv" line by line (don't read into memory in its entirety) and check for its presence in the array using isset() - this is much more efficient than using in_array().
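A sketch of that keys-plus-isset() approach (the findMisMatchesFast() name is my own choice, not from the thread):

```php
<?php
// Store data.csv lines as array KEYS, then stream local_data.csv and
// test membership with isset() - a hash lookup instead of the linear
// scan that in_array() performs over 85,000 entries per line.
function findMisMatchesFast($csv, $csv_local) {
    // Each trimmed line of data.csv becomes a key; the value is unused
    $feed_keys = [];
    $fp = fopen($csv, "r");
    while (($line = fgets($fp)) !== false) {
        $feed_keys[trim($line)] = true;
    }
    fclose($fp);

    // Stream local_data.csv line by line, never loading it entirely
    $mis_match_array = [];
    $fp = fopen($csv_local, "r");
    while (($line = fgets($fp)) !== false) {
        $line = trim($line);
        if (!isset($feed_keys[$line])) {
            $mis_match_array[] = $line;
        }
    }
    fclose($fp);
    return $mis_match_array;
}
```

This turns the overall comparison from roughly O(n x m) into O(n + m). One caveat: PHP casts purely numeric string keys to integers, which is harmless here as long as both files are keyed the same way.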

tec4
msg:4509908
 4:58 pm on Oct 19, 2012 (gmt 0)

Okay, awesome. Thank you all for your ideas and input. I'm going to give it a go later today and see how it improves.
