I'm currently iterating through the lines of one CSV file and checking whether each line exists in a second CSV file. If it does not, I store the line in an array so I can prune the mismatched items out of my system.
My problem: this works fine with smaller CSVs, but now that I am working with two CSVs of over 85,000 lines, this single script is using 70-85% of my CPU and taking a tremendous amount of time to finish. Is there a more efficient way to do what I am trying to do?
I would certainly look at the array functions, as coopster suggests.
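As a rough illustration of the array-function route, here is a minimal sketch using array_diff(). That particular function is my assumption about what was meant, and the function name linesMissingFrom() is made up for this example. Note that it reads both files fully into memory.

```php
<?php
// Sketch only: linesMissingFrom() is a hypothetical helper name.
// Returns the lines of $localPath that do not occur in $referencePath.
function linesMissingFrom(string $referencePath, string $localPath): array
{
    $flags = FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES;

    // file() loads each file into an array, one element per line.
    $reference = file($referencePath, $flags);
    $local     = file($localPath, $flags);
    if ($reference === false || $local === false) {
        throw new RuntimeException('Could not read one of the CSV files');
    }

    // array_diff() keeps the values of $local that are absent from $reference.
    // array_values() re-indexes the result from 0.
    return array_values(array_diff($local, $reference));
}
```

Called as linesMissingFrom('data.csv', 'local_data.csv'), this would return the lines to prune, assuming "local_data.csv" is the file being checked against "data.csv".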
Another alternative is to load just one file, "data.csv", into an array as the keys (not the values), then step through "local_data.csv" line by line (don't read it into memory in its entirety) and check for each line's presence in the array using isset(). This is much more efficient than using in_array(), because isset() on a key is a hash lookup, whereas in_array() scans the array's values one at a time.
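The keys-plus-isset() idea could look something like the sketch below. The function name findUnmatchedLines() is hypothetical, and I'm assuming whole lines are compared as plain strings after stripping the line ending, so adjust it if you only need to match on one column.

```php
<?php
// Sketch only: findUnmatchedLines() is a hypothetical helper name.
// Returns the lines of $localPath that do not occur in $referencePath.
function findUnmatchedLines(string $referencePath, string $localPath): array
{
    // Build the lookup table: each reference line becomes an array KEY.
    $lookup = [];
    $ref = fopen($referencePath, 'r');
    if ($ref === false) {
        throw new RuntimeException("Cannot open $referencePath");
    }
    while (($line = fgets($ref)) !== false) {
        $lookup[rtrim($line, "\r\n")] = true;
    }
    fclose($ref);

    // Stream the second file one line at a time instead of loading it whole.
    $unmatched = [];
    $local = fopen($localPath, 'r');
    if ($local === false) {
        throw new RuntimeException("Cannot open $localPath");
    }
    while (($line = fgets($local)) !== false) {
        $line = rtrim($line, "\r\n");
        // isset() on a key is a hash lookup, not a scan of the array.
        if ($line !== '' && !isset($lookup[$line])) {
            $unmatched[] = $line;
        }
    }
    fclose($local);

    return $unmatched;
}
```

Usage would be along the lines of $toPrune = findUnmatchedLines('data.csv', 'local_data.csv'); only the lookup array and the unmatched lines are held in memory at once.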