
Foo Forum

    
opening a really, really, really huge text file. how?
httpwebwitch
 12:34 am on Mar 24, 2010 (gmt 0)

I have this file... it's called "errors.log". It's my PHP error log file.

I haven't looked at it in a while... oops...
now it's 194,136,958 bytes

and I only need to look at the last few lines of it.
Any text editor I try to open it with crashes before I get a peek inside.

How can I look at a really huge txt file?

any tools that can split it into smaller, bite-sized portions?

 

eelixduppy
 12:45 am on Mar 24, 2010 (gmt 0)

In Linux?

tail is probably the best for this, and I actually use it all the time to monitor the error logs.


tail -f error.log


The -f is to follow the file: anything added to it while you're viewing it shows up right away. Really useful.

grandpa
 2:37 am on Mar 24, 2010 (gmt 0)

If you haven't tried Textpad yet, see if it works.

tangor
 2:51 am on Mar 24, 2010 (gmt 0)

That's a large text file. Notepad will open it if you have enough memory on your machine... but it will take a gosh-awful long time to get 'er done. I managed to open a 150 MB logfile. Would suggest a log rotation/save at 50 MB if you are managing your own log files. All my sites are set to rotate once a week (smaller sites) to once a day (more active sites). None of mine are in the hourly rotate category (NASA size, etc.). All that so it would not take forever and a day to open a log file.

The other option, which does not take that much time, is to dump it into Access or MySQL and then navigate to the bottom of the list. Works a charm.

smallcompany
 2:52 am on Mar 24, 2010 (gmt 0)

Notepad++ is free - not sure about the max file size it supports.

UltraEdit works with huge files; I see it says 4 GB. I remember my coworkers using it to manage exports from big databases. Not free, but the free trial may work for you.

claus
 3:43 am on Mar 24, 2010 (gmt 0)

On Windows: Super Note Tab / Note Tab Pro / Ultra Edit

graeme_p
 6:51 am on Mar 24, 2010 (gmt 0)

Linux (or Unix with GNU utilities installed) has a "split" command that can put a specified number of bytes or lines (which is what you want here) into each output file.

You can prevent the problem from occurring by piping output to a utility that creates a new file for each day. I can't remember the details, but there was an example on the lighttpd website of splitting access logs that way.

mack
 6:58 am on Mar 24, 2010 (gmt 0)

What about reading it server-side? PHP's fgets() might be up to the job.
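
Something along these lines might do it - just a rough sketch, untested; the errors.log path and the 50-line count are made-up examples:

<?php
// rough sketch: walk the log with fgets(), keeping only the last $keep lines
$keep  = 50;                       // how many lines to show (pick your own)
$lines = array();
$fh = fopen('errors.log', 'r');    // path is an assumption
while (($line = fgets($fh)) !== false) {
    $lines[] = $line;
    if (count($lines) > $keep) {
        array_shift($lines);       // drop the oldest line
    }
}
fclose($fh);
echo implode('', $lines);
?>

It still reads the whole file once, so a 194 MB log will take a moment, but it never holds more than $keep lines in memory the way an editor would try to hold the whole thing.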

Mack.

MatthewHSE
 2:03 pm on Mar 24, 2010 (gmt 0)

I was also going to suggest using PHP. However, if you really need it on your PC, EditPadPro ought to do it.

mcavic
 3:37 pm on Mar 24, 2010 (gmt 0)

PHP would be fun. You could seek to the end of the file and read backwards until you find the number of line feeds you want, then read it forward.

Or perhaps more efficiently, you could estimate the number of bytes you want to read from the end of the file, and seek directly to that point.
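
The second approach might look roughly like this - a sketch only, not tested; the 16 KB guess and the file name are assumptions:

<?php
// rough sketch: guess how many bytes the lines we want take up,
// seek straight to that point, and read forward from there
$file  = 'errors.log';             // path is an assumption
$guess = 16384;                    // assume the tail we want fits in the last 16 KB
$fh = fopen($file, 'r');
fseek($fh, max(0, filesize($file) - $guess));
fgets($fh);                        // throw away the (probably partial) first line
while (($line = fgets($fh)) !== false) {
    echo $line;
}
fclose($fh);
?>

If the guess turns out too small, just bump it up and run it again - either way you never read the rest of the 194 MB.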

bakedjake
 3:43 pm on Mar 24, 2010 (gmt 0)

install mysql on your machine, import it into a SQL database

or use excel with a delimiter import

or tail -f :)

grandpa
 8:30 pm on Mar 24, 2010 (gmt 0)

or use excel with a delimiter import

Won't excel stop at 65535 rows?

I agree with the previous PHP suggestions as the most likely to produce rapid results.

mcavic
 8:53 pm on Mar 24, 2010 (gmt 0)

Actually, tail is indeed the fastest, and you can get a Windows version, though I haven't tried it in Windows.

[edited by: lawman at 11:34 pm (utc) on Mar 24, 2010]
[edit reason] No Search Terms Please [/edit]

bakedjake
 12:43 am on Mar 25, 2010 (gmt 0)

Won't excel stop at 65535 rows?


2k7 (Excel 2007) is fine with large files. i think it supports something like 1M rows now.

tail is indeed the fastest


yup, useful if you only want a very quick look.

the syntax is:

tail -[num] file

where [num] is the number of lines you'd like to look back starting from the end of the file, and file is the file name

if you'd like to split the files, you can also use tail (amongst other utilities):

tail -1000 bigfile > newfile

takes the last 1000 lines from bigfile and puts them into newfile.

also useful:

tail -f bigfile

which will show the last 10 lines of bigfile and keep following it, so you can see new messages as they come in. useful for debugging webservers, firewalls, etc.

Marcia
 7:43 am on Mar 25, 2010 (gmt 0)

There's a specific free "splitter" utility (it's an .exe) you can download for splitting huge files in Windows. It's a no-brainer, I've used it for breaking up very big datafeeds.

httpwebwitch
 1:00 pm on Mar 25, 2010 (gmt 0)

thanks everyone! I tried a file-splitter shareware app, and it worked. I ended up with over 3,000 little numbered files with 1000 lines in each. Quite digestible :)

I may try UNIX "tail" or SQL next time. But more significantly, I found the PHP bug that was producing the errors, so my error.log file isn't growing any more.

Incidentally, it was an undefined index warning, such as $array['key'] where 'key' didn't exist in $array; when echoed, the result is an empty string which is what I intended, but it also threw an error into the log at the same time. Just one of those buggers, combined with PHP error logging, massive loads of traffic, and me not paying attention... lesson learned!
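
For anyone chasing the same kind of thing, the pattern looked roughly like this (array name and key are made up for illustration):

<?php
$data = array('foo' => 'bar');

echo $data['key'];                              // prints an empty string, but also writes an
                                                // "Undefined index: key" notice to the error log

echo isset($data['key']) ? $data['key'] : '';   // quieter: check the key first
?>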

wheel
 1:11 pm on Mar 25, 2010 (gmt 0)

Definitely tail.

You can also do
tail -1000 error.log
which gives you the last 1000 lines of the file. If you want to pipe that to another file you can do this:
tail -1000 error.log > otherfilename.txt

then you can work on otherfilename.txt.

Replace 'tail' with 'head' and you get the same thing but on the top of the file, so
head -1000 error.log
gives you the first 1000 lines instead of the last.

In any event, there's no need for PHP scripting when you're doing text file processing on a Linux machine. Command-line stuff will do everything you need and more, if you know the commands.

httpwebwitch
 3:18 pm on Mar 25, 2010 (gmt 0)

... yet more evidence that I should be switching to Linux someday
