Forum Moderators: coopster & phranque


Truncate from beginning of file

Can this be done?

         

runner

10:11 pm on Jan 3, 2005 (gmt 0)

10+ Year Member



I have a 2 Gb log file (text only) and I'd like to truncate X number of lines from the beginning of the file. This would roughly be 1% of the file. I want to do this on a daily basis. The only problem is that every truncate command I know of truncates from the end of the file.

adni18

10:44 pm on Jan 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You could get the length of 1% of the file by multiplying its length by 0.01, then use substr() for the actual cutting.
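A minimal sketch of the substr() idea, assuming the whole log has already been read into one string (for a 2 GB file that means about 2 GB of RAM). The sub name drop_first_percent and the rule of moving the cut forward to the next newline are illustrative assumptions, not from the post; the rounding just keeps half a line from being left at the top.

```perl
use strict;
use warnings;

# Hypothetical helper: drop roughly the first $fraction of $text
# (0.01 for 1%), moving the cut forward to the end of the current line.
sub drop_first_percent {
    my ($text, $fraction) = @_;
    my $cut = int(length($text) * $fraction);
    return $text if $cut == 0;                  # too small to cut anything
    my $nl = index($text, "\n", $cut - 1);      # first newline at or after the cut
    return '' if $nl < 0;                       # no newline left: nothing survives
    return substr($text, $nl + 1);              # keep everything after that newline
}
```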

runner

11:10 pm on Jan 3, 2005 (gmt 0)

10+ Year Member



I'm not sure that would do me any good. I want to throw away the first X number of lines and keep the other 1.99 Gb.

runner

11:35 pm on Jan 3, 2005 (gmt 0)

10+ Year Member



OK... I see what you're saying. I wonder how perl will handle a substring that is 1.99 Gb in size?

I think I just need to figure out how to move the "beginning of file" file descriptor X number of bytes. That way I'm not reading a giant file into memory.
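For the byte count mentioned here, a hedged sketch: the offset where line X+1 starts can be found without slurping the file, by reading X lines and asking tell() where the read position is. The name offset_after_lines is hypothetical. It only locates the cut point; it does not move the start of the file on disk.

```perl
use strict;
use warnings;

# Hypothetical helper: byte offset just past the first $n lines of an
# open filehandle (or the end of the data if there are fewer lines).
sub offset_after_lines {
    my ($fh, $n) = @_;
    seek($fh, 0, 0) or die "seek failed: $!";   # start from the top
    for (1 .. $n) {
        last unless defined(readline($fh));     # stop early at end of file
    }
    return tell($fh);                           # current read position in bytes
}
```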

adni18

11:39 pm on Jan 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm pretty sure perl would not rewrite the entire file, minus the 1st few lines, but delete the 1st few lines.

lexipixel

11:45 pm on Jan 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Most languages / file systems don't let you truncate the beginning of a file, so all you have to do is:

Read all lines in, reverse, truncate, reverse again (to set back in proper order), write out file.

#
open( my $log, '<', '20050103.log' ) or die "Cannot open log: $!";
my @lines = reverse <$log>; # reverse read (LIFO)
close( $log );
#

Now you have the lines in reverse order.

It's then a simple matter to chop off the "end" (which is really the beginning), reverse the lines again, and write them back to disk; that truncates the beginning of the file by "X" lines.

Maybe someone else knows a way that's less convoluted...
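A minimal sketch of the reverse / chop / reverse approach as one complete sub, assuming the file fits in memory (a 2 GB log likely will not). The name trim_head_in_memory is hypothetical. Since the list is reversed twice, splice(@lines, 0, $n) on the unreversed list would give the same result; the reversal is kept only to mirror the steps described above.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the first $n lines of $file by reading it
# all into memory, reversing, chopping the end, and reversing back.
sub trim_head_in_memory {
    my ($file, $n) = @_;
    open(my $in, '<', $file) or die "Cannot read $file: $!";
    my @lines = reverse <$in>;          # last line first (LIFO)
    close($in);

    my $keep = @lines - $n;             # how many lines survive
    $keep = 0 if $keep < 0;
    splice(@lines, $keep);              # chop the "end" = original first $n lines
    @lines = reverse @lines;            # back to the original order

    open(my $out, '>', $file) or die "Cannot write $file: $!";
    print {$out} @lines;
    close($out) or die "Cannot close $file: $!";
}
```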

runner

12:20 am on Jan 4, 2005 (gmt 0)

10+ Year Member



I'd be curious to know what it would do to the system to read in an array that was 2 Gb! I'm going to try it tonight when volume is low.

Hanu

12:43 am on Jan 4, 2005 (gmt 0)

10+ Year Member



Don't do it in memory. That's insane. Copy to a temp file, then rename:

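use strict;
use warnings;

my $logFile        = '20050103.log'; # hypothetical example values
my $xNumberOfLines = 1000;

open( my $in,  '<', $logFile )       or die "Cannot read $logFile: $!";
open( my $out, '>', "$logFile.tmp" ) or die "Cannot write $logFile.tmp: $!";
binmode( $in );
binmode( $out );

# Skip the first $xNumberOfLines lines...
for ( 1 .. $xNumberOfLines ) {
  last unless defined( readline( $in ) );
}
# ...then copy the rest block-wise instead of line by line.
my $buf;
while ( read( $in, $buf, 65536 ) ) {
  print {$out} $buf;
}
close( $in );
close( $out ) or die "Cannot close $logFile.tmp: $!";
# On Unix, rename() replaces the old file in one step; no unlink needed.
rename( "$logFile.tmp", $logFile ) or die "rename failed: $!";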

Note to mods: Why oh why can't we indent our code? Drives me nuts.

coopster

12:54 am on Jan 4, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I use the [pre] tags and spaces to indent when displaying code. Use two spaces for every one you want indented though.

sun818

1:08 am on Jan 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There's probably a unix command for it already. I'm sure you're not the first one who wants to erase x number of lines from the beginning of a file. I'd do a search on log file trimming or something like that since a command or script would be most often used for such a purpose.

runner

1:44 am on Jan 4, 2005 (gmt 0)

10+ Year Member



I've been google searching for several hours. I can't believe nobody has done this before or there isn't a shell command to do it. I even opened a ticket with Sun Microsystems and they said I can't do it. I'll bet I can though. Well... if worst comes to worst, I can process each line of the file and copy it over to another file excluding the lines I want to delete (like Hanu pointed out). Not sure how long it will take my system to process 2 Gb of data though. I just hate to do that since moving a stinking file descriptor is all I need to do. I think I'll google for a few more hours though.

Hanu

11:37 am on Jan 4, 2005 (gmt 0)

10+ Year Member



>I've been google searching for several hours.

Me too. Not hours, but a long time.

>There's probably a unix command for it already.

No, there isn't. Unix filesystem semantics simply don't support this use case: the filesystem can only append data blocks to, and remove them from, the end of a file. It would be great for a lot of reasons if there were a system call for this.

Romeo

12:45 pm on Jan 4, 2005 (gmt 0)

10+ Year Member



AFAIK, Hanu's approach in msg #8 is the only way to go:
read the file line by line, writing each line to a 2nd file while just leaving out those lines at the beginning you want to get truncated.

Similar approaches, which of course also read the file line by line to write a new one:

If you know the number of lines to truncate: delete first 87 lines and print last section of file from line number 88:
sed '1,87d' infile > newfile
or
awk 'NR > 87' infile > newfile

If you can locate the truncate point by a regexp (a date, for example), print all lines from the first occurrence of the regexp onward:
sed -n '/regexp/,$p' infile > newfile
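A Perl sketch of the regexp variant, behaving like the sed command above: everything from the first matching line (inclusive) onward is copied. The name copy_from_first_match is hypothetical; it takes already-open filehandles so the same code works on files or in-memory strings.

```perl
use strict;
use warnings;

# Hypothetical helper: copy lines from $in to $out, starting at the first
# line that matches the compiled regexp $re.
sub copy_from_first_match {
    my ($in, $out, $re) = @_;
    my $found = 0;
    while (my $line = <$in>) {
        $found ||= ($line =~ $re);      # latch on once the first match is seen
        print {$out} $line if $found;
    }
}
```

Usage: copy_from_first_match($in, $out, qr/^2005-01-03/) would keep every line from the first entry stamped 2005-01-03, assuming the log starts each line with such a date.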

Regards,
R.

runner

3:18 pm on Jan 4, 2005 (gmt 0)

10+ Year Member



Thank you all for your input. I tried Hanu's and Romeo's approaches and they both work. Unfortunately, % iowait goes through the roof during the copy. If this file system were on a separate controller I think I could live with it, but I can't affect the other users that much. I'll have to bag it for now.

sun818

5:21 pm on Jan 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



How about this? Have not tested the I/O impact though:

Trim 1.0 - Removes a number of lines from the beginning of a file
[pc-tools.net...]

Hanu

5:26 pm on Jan 4, 2005 (gmt 0)

10+ Year Member



Same thing. It uses a temp file like Romeo's and my solution. It's probably even slower because it copies line-wise instead of block-wise.

runner

6:16 pm on Jan 4, 2005 (gmt 0)

10+ Year Member



The other bad thing is that when I "copy" the 2 Gb log file over to the temp log file during processing, I'll have to have enough space to hold 4 Gb. That's no problem for now but I don't know about the future.
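One technique not proposed in the thread avoids needing space for a second copy: shift the kept data down to offset 0 inside the same file, then cut off the leftover tail with Perl's truncate(), which works from the end, the one direction the filesystem supports. This is a sketch under loud assumptions: nothing else writes to the file during the shift, and trim_head_in_place and the 64 KB block size are hypothetical choices. It still reads and rewrites the whole kept portion, so the iowait problem would likely be about the same, and a crash mid-run leaves the file half-shifted.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the first $n lines of $file in place by
# copying the rest down to offset 0, then truncating the duplicate tail.
sub trim_head_in_place {
    my ($file, $n, $blocksize) = @_;
    $blocksize ||= 1 << 16;                     # 64 KB per read/write
    open(my $fh, '+<', $file) or die "Cannot open $file: $!";
    binmode($fh);

    for (1 .. $n) { last unless defined(readline($fh)) }
    my $rpos = tell($fh);                       # first byte to keep
    my $wpos = 0;                               # where that byte should end up

    my $buf;
    while (1) {
        seek($fh, $rpos, 0) or die "seek: $!";  # reads always stay ahead of writes
        my $got = read($fh, $buf, $blocksize);
        die "read: $!" unless defined $got;
        last if $got == 0;                      # reached the old end of file
        seek($fh, $wpos, 0) or die "seek: $!";
        print {$fh} $buf or die "write: $!";
        $rpos += $got;
        $wpos += $got;
    }
    truncate($fh, $wpos) or die "truncate: $!"; # drop the now-duplicated tail
    close($fh) or die "close: $!";
}
```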

Hanu

6:31 pm on Jan 4, 2005 (gmt 0)

10+ Year Member



If you have control over the log file, you might want to split it once in a while. The process that writes the log needs to close the log file, rename it and open a new one. The cleanup process simply needs to delete the oldest log file. But again, this only works if you control the log writer.

Also logrotate might be worth a look.
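A hedged sketch of the split-and-delete scheme, assuming you control the writer and name rotated files with a numeric date stamp such as 20050103 so they sort oldest first. rotate_log and prune_old_logs are hypothetical names; logrotate covers the same ground as a ready-made tool.

```perl
use strict;
use warnings;
use File::Basename qw(basename dirname);

# Writer side: close the current log, move it aside under a stamped
# name, and return a handle to a fresh file at the original path.
sub rotate_log {
    my ($fh, $path, $stamp) = @_;
    close($fh) or die "close $path: $!";
    rename($path, "$path.$stamp") or die "rename $path: $!";
    open(my $new, '>>', $path) or die "reopen $path: $!";
    return $new;
}

# Cleanup side: keep the newest $keep rotated files, delete older ones.
sub prune_old_logs {
    my ($path, $keep) = @_;
    my ($dir, $base) = (dirname($path), basename($path));
    opendir(my $dh, $dir) or die "opendir $dir: $!";
    my @old = grep { /^\Q$base\E\.\d+$/ } readdir($dh);
    closedir($dh);
    @old = sort @old;                       # numeric stamps sort oldest first
    while (@old > $keep) {
        my $victim = shift @old;
        unlink("$dir/$victim") or die "unlink $victim: $!";
    }
}
```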

VectorJ

4:24 am on Jan 6, 2005 (gmt 0)

10+ Year Member



I don't have my Perl book with me so I can't give you a complete answer, but look up the 'seek' and 'read' functions in the Perl book. You can set the position within the file using 'seek' and then use 'read' to bring in the file starting at the byte you want.
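A sketch of that combination, with copy_from_offset as a hypothetical name: seek() moves the read position to the chosen byte and read() pulls data from there in blocks. The byte offset could come from counting lines first. It still writes a new file, since moving the read position does not change where the file starts on disk.

```perl
use strict;
use warnings;

# Hypothetical helper: copy everything from byte $offset of $src onward
# into $dest, 64 KB at a time.
sub copy_from_offset {
    my ($src, $dest, $offset) = @_;
    open(my $in,  '<', $src)  or die "Cannot read $src: $!";
    open(my $out, '>', $dest) or die "Cannot write $dest: $!";
    binmode($in);
    binmode($out);
    seek($in, $offset, 0) or die "seek: $!";   # whence 0 = from start of file
    my $buf;
    while (read($in, $buf, 65536)) {
        print {$out} $buf or die "write: $!";
    }
    close($in);
    close($out) or die "Cannot close $dest: $!";
}
```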