I am splitting up one huge text file into several smaller files to prepare them for database processing. So far I have tested it with files up to 100 MB and everything works fine.
Now here's my question: to optimize my script, I would like to incrementally remove all lines from the original file which have already been copied to a smaller file.
This would speed up reading the original file in each loop.
Unfortunately I couldn't find a way to remove/delete lines from an open file. Of course I could read the whole file, delete the lines, and then write it back, but for files bigger than 1 GB that doesn't seem like a good idea....
Can anybody help? Alternatively, I would also be thankful for suggestions on how to efficiently process huge text files.
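For reference, this is roughly the pattern my loop follows, boiled down (not the real script; the file names and the chunk size are placeholders):

#!/usr/bin/perl
use strict;
use warnings;

my $source         = 'huge.txt';   # placeholder name
my $lines_per_file = 100_000;      # placeholder chunk size
my ($chunk, $done) = (0, 0);       # $done = lines copied in earlier passes

while (1) {
    open my $in, '<', $source or die "Cannot read $source: $!";

    # re-skip everything handled before -- this is the part I want to avoid
    for (1 .. $done) {
        defined( my $skip = <$in> ) or last;
    }
    if (eof $in) { close $in; last; }

    open my $out, '>', sprintf('part%04d.txt', ++$chunk)
        or die "Cannot write chunk $chunk: $!";
    my $copied = 0;
    while (my $line = <$in>) {
        print $out $line;
        last if ++$copied >= $lines_per_file;
    }
    close $out;
    close $in;
    $done += $copied;
}

Because every pass re-opens the original and skips over the already-copied lines, each later chunk takes longer than the one before, which is why I wanted to shrink the file as I go.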
Greets,
lars_stecken
You'll find moderate discussion [webmasterworld.com] on the forum about this topic, but not much detail. You can flock the file, process it, and write it back out entirely, or move your data into a database management system. Beyond that, I look forward to hearing other alternatives.
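If it helps, here's an untested sketch of the streaming angle: take a lock, read the source once from top to bottom, and rotate output files every N lines, so there's never any need to remove lines from the original (the file names and chunk size below are placeholders):

#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);

my $source         = 'huge.txt';   # placeholder
my $lines_per_file = 100_000;      # placeholder

open my $in, '<', $source or die "Cannot open $source: $!";
flock $in, LOCK_SH or die "Cannot lock $source: $!";   # guard against writers

my ($chunk, $count, $out) = (0, 0);
while (my $line = <$in>) {
    if ($count == 0) {                     # start a new chunk file
        open $out, '>', sprintf('part%04d.txt', ++$chunk)
            or die "Cannot write chunk $chunk: $!";
    }
    print $out $line;
    if (++$count >= $lines_per_file) {     # chunk is full, rotate
        close $out;
        $count = 0;
    }
}
close $out if $count;                      # flush the last, partial chunk
close $in;

A single sequential pass keeps memory flat no matter how large the source file is, and since every line is read exactly once, the re-reading penalty disappears entirely.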
Since I cannot change the format of the file the data is delivered in, I have no choice but to cope with these huge files :-(
I'll read up on the thread you pointed me to anyway.
Thanks again,
lars_stecken