Forum Moderators: coopster

Message Too Old, No Replies

Manipulating Large Volumes of Data

Is mySQL/PHP the wrong tools for the job?

         

Markos

5:24 pm on Sep 4, 2007 (gmt 0)

10+ Year Member



Hi,

I'm importing CSV files into a table in a mySQL database using PHP, with the number of rows being anything in the region of a seven hundred thousand.

Another PHP script reads through each row in the newly created table and applys some manipulation to the data and adds it to a "master table".

Now, this is all running locally on an XP installation of Apache 2 and these scripts are taking 20 plus minutes to run in some cases.

Do I have any other alteratives to using PHP to manipulate the data on mySQL? Is there some setting I be looking at for when large volumes of data are manipulated in mySQL/PHP?

I've had to repeatedly increase the timeout figure (in php.ini) to way beyond it's default amount - this would never run on a commerical host, they'd cut my fingers off for using up so much CPU time!

henry0

7:01 pm on Sep 4, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You might want looking into "index/es"
Add an index to any field that could be queried using:
WHERE
OR
AND
JOIN etc..

An index allows you to specify an index for a table thus speeding up your queries.
Primary keys are automatically indexed
however do not add an index to each and every col :)

LifeinAsia

7:11 pm on Sep 4, 2007 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Another PHP script reads through each row in the newly created table and applys some manipulation to the data and adds it to a "master table".

It depends on the actual manipulation being performed. In many (most?) cases, letting the DB do the work instead of PHP going row by row should result in quite a bit of savings.

For example, if all you're doing is adding 2 fields, the following would be lots faster than using PHP to loop through each row, perform the calculation, then insert 1 row to the DB:
INSERT MasterTable
SELECT Field1, Field2+Field3
FROM NewTable

Again, it all depends on the actual manipulation being performed.

borntobeweb

8:28 pm on Sep 4, 2007 (gmt 0)

10+ Year Member



Markos, do you have indexes defined on the tables you're inserting into? One easy way to speed up your process is to disable indexes on the table before you insert a lot of records into it, then enable it again at the end. This makes MySQL build the index in one giant step instead of little steps that may involve rebuilds.

But as henry0 mentioned, create/keep indexes for those table that you query...

There are other ways to optimize your process, including locking your table, doing the inserts in bulk, etc, have a look at Optimaztion: Speed of insert Statements [dev.mysql.com].