Forum Moderators: coopster
My problem: i need to run a script to create some statistical numbers for products, from a database with more than 3.5 million records.
I have created 1 table (statistical) where i have inserted all the unique ids of the products (about 1500). This table has 4 more colums - each will hold a number for my stats.
My approach so far (sketch):
ob_start()
select id from statistical table table
{
check how many instances of this id are found in the large table (with any restrictions)
update statistical table with result
ob_flush();
flush();
ob_start();
}
ob_end_flush();
All selects have the fields names (no select * here)
The large table has 4 indexes of the fields i want to take my measurments.
yet the script takes like 7 hours to complete.
What can i do to improve this?
i have php 4.3 and mysql 4.x in server installed
table 2 (large one, 3.5M rows)
b_id // unique, auto-incr
a_id // the one from table 1
field1
field2
field3
field4
field1-4 and a_id are indexed in table2
one script is transfering all the a_id from table1 to a temp talbe
problematic script is running only in table 2, seeking instances of a_id and doing some calculations.
I have improved performance by removing the ob functions go figure!
Now script needs 45 min to be executed.
$starttime = microtime();
$startarray = explode(" ", $starttime);
$starttime = $startarray[1] + $startarray[0];
include "dbconnect.php";
$cur_year = date("Y");
$first_day_year = mktime(0, 0, 0, 1, 1, $cur_year);
$last_day_year = mktime(23, 59, 59, 12, 31, $cur_year);
$query2 = "select a_id from temp_table WHERE date_year='$cur_year' ";
$result2 = mysql_query($query2) or die(mysql_error().'<p>'.$query2.'</p>');
while ($myrow2 = mysql_fetch_array($result2))
{
$temp_a_id = $myrow2['a_id'];
$total_records = 0;
$query3 = "select a_id from table2 WHERE a_id='$temp_a_id' AND site='".$_POST['site']."' AND page_name='".$_POST['page_name']."' AND date_click BETWEEN '$first_day_year' AND '$last_day_year' ";
$result3 = mysql_query($query3) or die(mysql_error().'<p>'.$query3.'</p>');
$total_records = mysql_num_rows($result3);
$query4 = "UPDATE temp_table SET index_views='$total_records', site='".$_POST['site']."' WHERE a_id='$temp_a_id' AND date_year='$cur_year'";
$result4 = mysql_query($query4) or die(mysql_error().'<p>'.$query4.'</p>');
}
$endtime = microtime();
$endarray = explode(" ", $endtime);
$endtime = $endarray[1] + $endarray[0];
$totaltime = $endtime - $starttime;
$totaltime = round($totaltime,5);
echo "This page loaded in $totaltime seconds.";
Now this part will be in a loop and be executed 12 times (combinations of $_POST['site'] and $_POST['page_name'] variables.
2. I assume following index (1 index containing 4 fields) exists for table2: (a_id, site, page_name, date_click)
3. Not sure how good is BETWEEN optimized in MySql. On some RDBMS it is faster to run 2 queries instead:
- select count(*) .... AND date_click >= '$first_day_year'
- select count(*) .... AND date_click <= '$last_day_year'
4. I assume following index exists for temp_table (a_id, date_year)
table2
------------------------------------
id int(11) PRI NULL auto_increment
a_id int(11) MUL
site varchar(50) MUL
page_name varchar(100) MUL
date_click int(15)
user_ip varchar(20) MUL