Forum Moderators: coopster


PHP/MYSQL with 3 million rows

         

Imaster

9:20 pm on Nov 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I plan to use several million rows (to be more specific, around 3 million) in mysql. The script that would be used would be coded in php.

The end queries won't be very complex; they'd be normal queries. To check the speed, I wrote a script that pages through all the results, 50 per page. When I tested it with 100,000 rows (10 columns each), the speed was phenomenal. But with 3 million rows and the same number of columns, simply printing all results, 50 per page, took a staggering 30 seconds and caused a huge load on the server.

Any suggestions on how to tweak the performance? I expect lots of queries to the database per minute, and it is very important that the final result is fast and as light on resources as possible. I have seen many people do this with different setups, so any suggestions are welcome. I know a lot can be gained from a SCSI drive + lots of RAM + a good CPU; currently I am on an EIDE drive with 512 MB RAM.

... Did it take a long time because of the way the PHP script was coded?

... Is the MySQL/PHP combination good enough to handle this much data?

... If not, what frontend should I use?

If anyone has experienced anything similar, it would be great to learn a few things.

jatar_k

2:40 am on Nov 26, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I have worked with a few db's that were upwards of 2 million rows and haven't had much trouble. I admit though that I haven't tried displaying them all at once either.

Something to try might be PHP's Output Control Functions [ca.php.net], so the script isn't sending output while it is still processing. That may speed it up.
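A minimal sketch of what that looks like in PHP; the row-printing loop here is just a stand-in for wherever the real query results get echoed:

```php
<?php
// Buffer all output instead of sending it piece by piece as each row prints.
ob_start();

// ... the query and row-printing loop would go here; stand-in loop for illustration:
for ($i = 1; $i <= 50; $i++) {
    echo "<tr><td>row $i</td></tr>\n";
}

// Send the whole page to the client in one go.
$bytes_sent = ob_get_length();  // how much was buffered before flushing
ob_end_flush();
```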

print all results, with 50 results on 1 page

You wouldn't need to select more than 50 rows then if you are only showing 50. Take a look at MySQL SELECT Syntax [mysql.com] and look specifically at LIMIT.

If that is what you are doing, I can give you some examples of how to use LIMIT with an offset to find the proper rows and only ever select/process the rows you need.
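For example, something like this (the table name `articles`, the column `id`, and the `page` parameter are invented for illustration):

```php
<?php
// Page number comes from the query string; default to page 1.
$page     = isset($_GET['page']) ? max(1, (int)$_GET['page']) : 1;
$per_page = 50;

// LIMIT takes an offset and a row count: this fetches rows
// (offset+1) through (offset+per_page) and nothing else.
$offset = ($page - 1) * $per_page;

// Only 50 rows are ever selected, no matter how big the table is.
$sql = "SELECT * FROM articles ORDER BY id LIMIT $offset, $per_page";
```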

DrDoc

3:10 am on Nov 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, LIMIT is great. That way you can grab only the first 50 (LIMIT 0, 50) or 50 rows starting from any offset (LIMIT 856344, 50) :)

dmorison

7:08 am on Nov 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You haven't mentioned indexes in your post - make sure that any field used in the "WHERE" clause of your query is indexed.
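For example (table, column, and index names are made up for illustration):

```sql
-- Index the column(s) used in WHERE so MySQL can avoid a full table scan:
ALTER TABLE articles ADD INDEX idx_category (category);

-- EXPLAIN shows whether the index is actually being used:
EXPLAIN SELECT * FROM articles WHERE category = 'news' LIMIT 0, 50;
```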

Also consider building a caching mechanism. If the data isn't changing all the time, you can create static HTML files of rendered output and include these rather than hit the database each time. You can invalidate the cache either on a time basis (is the cache more than 10 minutes old? If so, rebuild it) or by deleting the cache file whenever another part of your application updates information in the database that would make the cache stale.
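A rough sketch of the time-based variant in PHP; the file name and the 10-minute TTL are arbitrary choices:

```php
<?php
// Serve a static cache file if it's fresh; otherwise rebuild and save it.
$cache_file = sys_get_temp_dir() . '/results_page1.html';
$max_age    = 600; // seconds (10 minutes)

if (is_file($cache_file) && (time() - filemtime($cache_file)) < $max_age) {
    // Cache hit: serve the stored HTML, no database work at all.
    $html = file_get_contents($cache_file);
} else {
    // Cache miss: rebuild the page (the real query would go here) and store it.
    $html = "<table><!-- 50 rows --></table>";
    file_put_contents($cache_file, $html);
}
echo $html;
```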

daisho

8:02 pm on Nov 26, 2003 (gmt 0)

10+ Year Member



What about some type of dynamic caching? Cache the results in some type of temp table or a text file based on the search.

i.e. in the past I've created a table similar to the following:

sess_id varchar(100)
table_key bigint

In sess_id I stored the PHP session ID, and in table_key I stored the primary key value for each match.

This works well when you have to search a subset of 3M rows and then iterate through them. The first page does the intensive query in the form of INSERT INTO tblcache SELECT ...

Then you just join your cache table with your primary table based on your session ID.

This is a little complicated, but I've run into a situation where queries were very, very slow because of the WHERE clause and a large amount of data, and indexing didn't speed things up enough to make the system "peppy". I did this type of solution and things got better, since each additional request from the same client uses a much simpler join rather than a complicated WHERE clause.
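Roughly, the SQL looks like this (tblcache, sess_id, and table_key are from above; the articles table, its columns, and the session ID value are made up):

```sql
-- Cache table: one row per (session, matching primary key).
CREATE TABLE tblcache (
    sess_id   VARCHAR(100),
    table_key BIGINT,
    INDEX (sess_id)
);

-- First page: run the expensive WHERE once, store only the matching keys.
INSERT INTO tblcache (sess_id, table_key)
SELECT 'abc123', id
FROM articles
WHERE category = 'news';  -- stand-in for the expensive conditions

-- Later pages: a cheap indexed join on the session ID instead of the big WHERE.
SELECT a.*
FROM tblcache c
JOIN articles a ON a.id = c.table_key
WHERE c.sess_id = 'abc123'
LIMIT 0, 50;
```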

daisho