Try this: create a temporary table in a database, and when you run your program, have it open the file, read in the lines, and dump each line as a record in your temporary table. Then use SQL queries to do your searches. When done, drop the table. This will probably speed up your process significantly.
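A minimal sketch of that idea, written in Python with the standard-library sqlite3 module rather than PHP. The table name, the one-column layout, and the sample lines are all hypothetical, and an in-memory database stands in for whatever database server you actually have:

```python
import sqlite3

def load_lines(conn, lines):
    """Store each input line as one record in a temporary table."""
    conn.execute("CREATE TEMP TABLE log_lines (line TEXT)")
    conn.executemany(
        "INSERT INTO log_lines (line) VALUES (?)",
        ((line.rstrip("\n"),) for line in lines),
    )

def search(conn, needle):
    """Return every stored line that contains the given substring."""
    rows = conn.execute(
        "SELECT line FROM log_lines WHERE instr(line, ?) > 0", (needle,)
    )
    return [row[0] for row in rows]

# Hypothetical sample data; a real run would pass an open file object instead.
sample = ["GET /index.html 200\n", "GET /missing 404\n", "POST /form 200\n"]

conn = sqlite3.connect(":memory:")
load_lines(conn, sample)
matches = search(conn, "404")
conn.execute("DROP TABLE log_lines")  # drop the table when done
conn.close()
```

Whether this beats scanning the file directly depends on how many searches you run per load; a single search over a file you read only once may not gain anything.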
On the other hand, it is easier to make corrections and changes to interpreted-language programs, because the program need not be re-compiled after each round of changes.
Jim
First of all, processing any large file is typically going to be a time-consuming operation. You might first want to consider whether doing this online is even appropriate. This type of processing is more typically done in a "batch" process or at least off-line. If you do decide that it still makes sense to do it online, you may need to change PHP timeout values to give the process enough time to complete. (PHP limits the maximum time a script can take.)
I believe, as well, it is possible to use PHP outside of the context of a webserver, though I'm not familiar with the details.
Changing languages is unlikely to speed up the processing significantly (or even noticeably). If the speed can be improved, it's much more likely that it will be accomplished by selection of appropriate algorithms and data structures.
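To illustrate the point about algorithms and data structures, here is a rough sketch in Python. The log layout (client address as the first field) and the function names are assumptions made up for the example. Both functions produce the same counts, but the first rescans the whole input once per key while the second makes a single pass with a hash table:

```python
from collections import Counter

def count_by_rescanning(lines, keys):
    """Scan every line once per key: work grows as len(keys) * len(lines)."""
    return {
        key: sum(1 for line in lines if line.split()[:1] == [key])
        for key in keys
    }

def count_in_one_pass(lines):
    """Scan the input once, tallying the first field in a hash table."""
    return Counter(line.split()[0] for line in lines if line.strip())

# Hypothetical sample records: client address, method, path.
sample = ["10.0.0.1 GET /a", "10.0.0.2 GET /b", "10.0.0.1 GET /c"]

fast = count_in_one_pass(sample)
slow = count_by_rescanning(sample, fast.keys())
```

On a large file with many distinct keys, that kind of change usually matters far more than which language runs the loop.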
If you do decide to do this off-line, then PHP may not be the best choice, save for one important benefit: you already know PHP.
If you do switch languages, I'd be less concerned with how "fast" the language is, and more with how close a match the language and available libraries are for the task. Perl is awfully nifty for any kind of reporting or analysis. And although slower than molasses, Ruby is also a great choice for this sort of application. (I'm a recent convert to prototyping in Ruby, then rewriting selectively in C++ if necessary.)
I have increased the PHP timeout values, but the execution is still too slow.
I will also look into REBOL and Ruby. Looks like another learning process.
Thanks for all the advice.
Depending on whether or not you are in complete control of your environment, you have these basic choices:
Perl [perl.org]
Python [python.org]
Ruby [ruby-lang.org]
Java [java.com] (JSP)
Microsoft platforms have a number of choices in addition to these, but I am not so familiar with them.
These are all system scripting languages, not Web server languages like PHP. You communicate with the Web server through a CGI glue layer.
I wouldn't recommend trying other languages, as it can be very hard to get support for them, or to find a hosting environment that supports them.
I've done just about every kind of programming that you can imagine, including punching raw hex machine code into an EEPROM with an ICE, and designing fairly complex C++ systems.
I would suggest looking for libraries and tools in your hosting environment that could help you out; not just the language you are using. For example, UNIX/Linux environments have a couple of tools called SED and AWK [oreilly.com]. These can often be invoked through system interface calls from other languages.
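As a sketch of what calling such a tool from another language can look like, the Python snippet below pipes text through awk with the standard-library subprocess module. It assumes an `awk` binary is on the PATH, and the sample input and the function name are hypothetical:

```python
import subprocess

def first_fields(text):
    """Pipe text through awk and return the first field of each line."""
    result = subprocess.run(
        ["awk", "{ print $1 }"],  # awk program: print the first field
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if awk exits with an error
    )
    return result.stdout.splitlines()

# Hypothetical sample input; a real call would pass the log contents.
sample = "10.0.0.1 GET /a\n10.0.0.2 GET /b\n"
fields = first_fields(sample)
```

Passing the command as a list avoids going through a shell, so nothing in the log data can be interpreted as shell syntax.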
I would also suggest Perl and Java as a couple of your best bets. Java gets "precompiled," so it is similar to many compiled languages, but the learning curve is just as steep as with C. Perl is a complete bear to program, but is easily the most powerful text processing language out there. I hate it, and look for other avenues. Python is usually precompiled as well, but is not always as well-supported as other languages. It is also a bit strange if you are coming from a more traditional language.
Analyze what type of information you need from the logfiles. There is probably some repetition in the tasks, and most of the time is probably consumed by a small part of the algorithm. Now create a task that does as much pre-processing on the files as possible: for example, filtering out records you don't need, sorting the data, or dumping the data to an SQL database that has proper indexes on the information you normally need.
If you don't need hard real-time data, you could run this pre-processing job from a crontab and perform only the last step in your online PHP session.
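The split between the batch job and the online step might look roughly like this. It is a sketch in Python with sqlite3, not PHP, and the three-field record layout, the "errors only" filter, and the table and index names are all assumptions for illustration. A real job would open a database file instead of `:memory:` so that the online script can read what the cron job wrote:

```python
import sqlite3

def preprocess(conn, lines):
    """Batch step: keep only error records and index them by status code."""
    conn.execute("CREATE TABLE IF NOT EXISTS hits (path TEXT, status INTEGER)")
    conn.execute("CREATE INDEX IF NOT EXISTS hits_status ON hits (status)")
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or not parts[2].isdigit():
            continue  # skip malformed records
        status = int(parts[2])
        if status >= 400:  # filter out records the report never needs
            conn.execute("INSERT INTO hits VALUES (?, ?)", (parts[1], status))
    conn.commit()

def report(conn, status):
    """Online step: a cheap indexed lookup against the pre-processed table."""
    rows = conn.execute(
        "SELECT path FROM hits WHERE status = ? ORDER BY path", (status,)
    )
    return [row[0] for row in rows]

# Hypothetical sample records: method, path, status code.
sample = ["GET /index.html 200", "GET /missing 404", "GET /gone 404", "garbage"]

conn = sqlite3.connect(":memory:")
preprocess(conn, sample)
missing = report(conn, 404)
conn.close()
```

Only `report` would need to run inside the web request; `preprocess` is the part that belongs in the crontab.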