Forum Moderators: open
I’ve got a problem dealing with massive databases in MySQL (currently, it has more than 50K records/20 fields and it’s continually increasing).
The requirement is to build a web based report generator that allows staff to query record base on specific fields’ value. However, when the query result is larger than 30K records, the processing time becomes quite slow. Especially so when I use PHP to export the database into CSV format – the process takes around 10 minutes, which could be affecting the system performance. (This database is still collecting the data in real-time).
The only solution that I could think of is to import the database to the local server and let staff use a database client to generate report locally. However, the data will not be in real time and I need to import it manually.
I don’t have a lot of experience in database area. Your help or pointers would be much appreciated.
Your problem could be:
lack of memory / swap filing
record locking
inefficient code
What else is going on in this box? What O/S? How much ram? What type of disks? Is the overall box old technology?
RAM is dirt cheap these days. Consider adding more but make sure you tune the my.cnf file.
Run some MySQL tuning performance checks on it. They will tell you what to optimise in the my.cnf file
I'd check the programming first,, then go add the indexes. Reason being, if the tables aren't indexed and you add indexes to the fields, it will speed it up, giving you the impression it's fixed. When it gets to 100K in a few weeks, it will resurface again.
After heading around this with my poor skill for long, your guys advises are really help me out of this hole.
Database
-----------
The problem was that PK and indexes from one reference table was not set.
After indexes are set, the query result is amazingly quick. It took only - 0.86 sec from command line (>10 min before).
I use MyISAM engine so I could not able to define FK. I'm not sure if it has any risks to alter engine to InnoDB and I'm not sure if there are any advantage of defining FK.
In addition, I'm still calculate AGE from date of birth in the query string (and put it to another field).
I'm not sure if this is a good practice or not.
By the way, the result on mysql command line is now really acceptable.
Coding
----------
I still have problem with exporting CSV using PHP.
There are 2 main loops that may effect the system.
First loop go through every records from mysql_query (40K)
Second loop go through every columns of each record to reform it to csv format.
Here is my code, $arrs is a set of mysql_result in array
function getCSV($arrs, $filename = "export.csv")
{
$csv_terminated = "\n"; //new line
$csv_separator = ","; // new column
$csv_enclosed = '"'; //quatation
$csv_escaped = "\\"; //escape quatation
$csv_content = "";
$columns = array_keys($arrs[0]);
// Put columns header, less than 20 times, I thinks it's not an issue
foreach($columns as $col) $csv_content .= $csv_enclosed . str_replace($csv_enclosed, $csv_escaped . $csv_enclosed, stripslashes($col)) . $csv_enclosed . $csv_separator; // create csv row, add seperator at the end.
$csv_content = trim(substr($csv_content, 0, -1)); // remove last seperater, then remove white space
$csv_content .= $csv_terminated; // new line
foreach($arrs as $arr):// First, loop through every records
foreach($columns as $col)// Second, loop through every columns
$csv_content .= $csv_enclosed . str_replace($csv_enclosed, $csv_escaped . $csv_enclosed, stripslashes($arr[$col])) . $csv_enclosed . $csv_separator;
$csv_content = trim(substr($csv_content, 0, -1));
$csv_content .= $csv_terminated; // new line
endforeach;
// export
header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
header("Content-Length: " . strlen($csv_content));
header("Content-type: text/x-csv");
header("Content-Disposition: attachment; filename=$filename");
echo $csv_content;
exit;
}
Server
-------------------
I think the server is fine. BTW, here is server reference:
top - 13:24:25 up 20 days, 12:46, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 120 total, 1 running, 118 sleeping, 0 stopped, 1 zombie
Cpu(s): 0.6%us, 0.4%sy, 0.0%ni, 98.2%id, 0.2%wa, 0.3%hi, 0.3%si, 0.0%st
Mem: 4055304k total, 3222268k used, 833036k free, 263384k buffers
Swap: 524280k total, 0k used, 524280k free, 2396792k cached
i.e.
Instead of
foreach($a as $b):
$output .= ...
endforeach;
echo $output;
ob_start();
foreach($a as $b):
echo ...;
endforeach;
ob_end_flush();