Forum Moderators: coopster

Message Too Old, No Replies

speed up a script

         

omoutop

6:19 am on Jul 25, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi all.

My problem: i need to run a script to create some statistical numbers for products, from a database with more than 3.5 million records.

I have created 1 table (statistical) where i have inserted all the unique ids of the products (about 1500). This table has 4 more colums - each will hold a number for my stats.

My approach so far (sketch):

ob_start()
select id from statistical table table
{
check how many instances of this id are found in the large table (with any restrictions)
update statistical table with result
ob_flush();
flush();
ob_start();
}
ob_end_flush();

All selects have the fields names (no select * here)
The large table has 4 indexes of the fields i want to take my measurments.

yet the script takes like 7 hours to complete.

What can i do to improve this?
i have php 4.3 and mysql 4.x in server installed

d40sithui

11:56 am on Jul 25, 2007 (gmt 0)

10+ Year Member



wow 7 hours. thats crazy. so you have two tables that this thing is running. i jsut want to clarify this is what u have.

<table1>
id, //primary key
field1
field2
field3
...

<table2> //3.5 million rows
id, //not unique, will be used to match w/ table1
field1
field2
field3
...

omoutop

12:48 pm on Jul 25, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



table 1 (small one 1500 rows)
a_id // unique, auto-incr

table 2 (large one, 3.5M rows)
b_id // unique, auto-incr
a_id // the one from table 1
field1
field2
field3
field4

field1-4 and a_id are indexed in table2

one script is transfering all the a_id from table1 to a temp talbe

problematic script is running only in table 2, seeking instances of a_id and doing some calculations.

I have improved performance by removing the ob functions go figure!
Now script needs 45 min to be executed.

omoutop

12:54 pm on Jul 25, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



and yes i know its crazy, so here a small part of the script (cleaned up)

$starttime = microtime();
$startarray = explode(" ", $starttime);
$starttime = $startarray[1] + $startarray[0];

include "dbconnect.php";

$cur_year = date("Y");
$first_day_year = mktime(0, 0, 0, 1, 1, $cur_year);
$last_day_year = mktime(23, 59, 59, 12, 31, $cur_year);

$query2 = "select a_id from temp_table WHERE date_year='$cur_year' ";
$result2 = mysql_query($query2) or die(mysql_error().'<p>'.$query2.'</p>');
while ($myrow2 = mysql_fetch_array($result2))
{
$temp_a_id = $myrow2['a_id'];
$total_records = 0;

$query3 = "select a_id from table2 WHERE a_id='$temp_a_id' AND site='".$_POST['site']."' AND page_name='".$_POST['page_name']."' AND date_click BETWEEN '$first_day_year' AND '$last_day_year' ";
$result3 = mysql_query($query3) or die(mysql_error().'<p>'.$query3.'</p>');
$total_records = mysql_num_rows($result3);

$query4 = "UPDATE temp_table SET index_views='$total_records', site='".$_POST['site']."' WHERE a_id='$temp_a_id' AND date_year='$cur_year'";
$result4 = mysql_query($query4) or die(mysql_error().'<p>'.$query4.'</p>');
}

$endtime = microtime();
$endarray = explode(" ", $endtime);
$endtime = $endarray[1] + $endarray[0];
$totaltime = $endtime - $starttime;
$totaltime = round($totaltime,5);
echo "This page loaded in $totaltime seconds.";

Now this part will be in a loop and be executed 12 times (combinations of $_POST['site'] and $_POST['page_name'] variables.

joelgreen

3:13 pm on Jul 25, 2007 (gmt 0)

10+ Year Member



1. Why do you select all the records if you need count only?
I hope this would work faster (not sure):
$query3 = "select count(*) from table2 WHERE a_id='$temp_a_id' AND site='".$_POST['site']."' AND page_name='".$_POST['page_name']."' AND date_click BETWEEN '$first_day_year' AND '$last_day_year' ";
this will return records count ($result[0])

2. I assume following index (1 index containing 4 fields) exists for table2: (a_id, site, page_name, date_click)

3. Not sure how good is BETWEEN optimized in MySql. On some RDBMS it is faster to run 2 queries instead:
- select count(*) .... AND date_click >= '$first_day_year'
- select count(*) .... AND date_click <= '$last_day_year'

4. I assume following index exists for temp_table (a_id, date_year)

kaleb

6:34 pm on Jul 25, 2007 (gmt 0)

10+ Year Member



Can you please post the output of the following queries..

describe temp_table;
describe table2;

omoutop

1:48 pm on Jul 26, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



temp_table
---------------------------------
id int(11) PRI NULL auto_increment
a_id int(11)
index_views int(11)
site varchar(50)
date_year int(4)
order_app int(11)

table2
------------------------------------
id int(11) PRI NULL auto_increment
a_id int(11) MUL
site varchar(50) MUL
page_name varchar(100) MUL
date_click int(15)
user_ip varchar(20) MUL