Forum Moderators: open

Message Too Old, No Replies

Large database and speed issues

         

sfishel87

2:12 pm on Apr 2, 2009 (gmt 0)

10+ Year Member



So I built a member management system for the company I work for. Right now we have 32,493 members with 69,269 sales to 89 different vendors (all this data and MUCH more is in the database). Now we have phone numbers, fax numbers, and emailaddresses per contact and each member has multiple contacts. We have different identifier numbers that we can use to match our member id's to the member id's that our vendors give us in reports of sales. We also have addresses, zip codes, cities, states to keep track of where our members are from, and we also keep track of every modification to a member's information with the help of a changes table that is growing really quick. Now the problem with all this is that when I built the system, we put in about 5000 records to use as testing, and once the whole system was built and tested it all worked great and pretty fast. But after we dropped the rest of all the data in, it has slowed down alot. It may take 30 seconds for some of the bigger searches and queries to return. We have kinda just been dealing with it but now I have a problem. I have to run a file that is retroactively generating a report from July 1st 1999 to June 30th 2008. The problem is that report has to run once for every single day (thats 6359 days) Thus I get to run a SQL statement that normally takes about 25 seconds to return by itself, 6359 times. The SQL statement is as followes:

SELECT
members.Member_ID, companies.CompanyName, vendors.VendorName, sales.SalesAmount
FROM members
Inner Join companies ON members.Company_ID=companies.Company_ID
Inner Join sales ON members.Member_ID=sales.Member_ID
Inner Join vendors ON sales.Vendor_ID=vendors.Vendor_ID
WHERE (SELECT MAX(SalesDate) FROM sales WHERE Member_ID=members.Member_ID)='".$date."'
GROUP BY members.Member_ID;

"$date" is my variable for the current day that the loop is on

any have any ideas on what I should look up or research to speed this process up or to speed the database up?

///////////////////////////////////////////

I tested the last Select statement, it took 91 seconds to run through Navicat. I then tested it on a duplicate of our database that is on my personal server (took 13 seconds). The difference you ask? The server my company uses sucks and I have had plenty of experience with them sucking but I can't change that. However, after cutting the statement down, I got it to run on my companies server in 7.5 seconds: the new select statement:

SELECT
members.Member_ID
FROM members
WHERE (SELECT MAX(SalesDate) FROM sales WHERE Member_ID=members.Member_ID)='2009-02-03'
GROUP BY members.Member_ID;

coopster

2:47 pm on Apr 2, 2009 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Welcome to WebmasterWorld, sfishel87.

Proper indexing is the key. Most RDBMS come with some type of query analysis tool that will help you optimize the query. Usually this means adding index(es). Analyze the query and go from there.

LifeinAsia

3:29 pm on Apr 2, 2009 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I strongly second the index comment.

Throwing more memory at the server may also help (what is it now?). These days, memory is so cheap that your tightwad company may even go for it. :)

The problem is that report has to run once for every single day.

If you're not already doing it, you may want to run the query for each day and save the results to another table.

sfishel87

4:21 pm on Apr 2, 2009 (gmt 0)

10+ Year Member



I've got every column that is searched against (anything we reference to try to match it against another value) indexed already. We are running the process to catch up to current time so from here on out the process just runs once a day and only collects date for today

CrustyAdmin

3:12 am on Apr 4, 2009 (gmt 0)

10+ Year Member



What rdbms are you using?

If Postgres, lookup vacuum
If Oracle or Sql Server lookup statistics
If MySql look up Optimize table

What others have said about indexes. Also if you have big queries doing joins, see if you can copy that data to a denormalized table and query it from there. Joins can get expensive.

Can you tell if you are i/o bound, if so that may point to a memory issue, as in not enough. Do you have any stats on if it's finding these things in working memory, or if it's hitting disk. Disk is dog slow compared to memory.

Can you turn your queries into a stored procedure, or plsql or the like?

eeek

1:14 am on Apr 7, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If Postgres, lookup vacuum

That can be important but using EXPLAIN should be the first step.

piatkow

3:09 pm on Apr 7, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Well I wouldn't call those numbers large by commercial database standards.

Its over 10 years since my DBA days when I was debugging issues like this several times a week but that sub-select in the query makes me twitch. Step 1 is always to use EXPLAIN or whatever local equivalent is available. That will track down the sort or table scan that is causing the problem.

An additional index may well be the answer. The other solution that we found was to split a complex query and insert the results of the first step into a temporary table to be read in step 2.

sfishel87

3:13 pm on Apr 7, 2009 (gmt 0)

10+ Year Member



thanks for all the help guys

eeek

11:14 pm on Apr 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've got every column that is searched against (anything we reference to try to match it against another value) indexed already.

And is the database system using even one of those indexes? You may need an index over more than one column. What does the planner say it's doing?

An additional index may well be the answer.

But don't just go throwing in another index without looking at the query plan you are trying to fix.

Rufal

10:56 pm on Apr 30, 2009 (gmt 0)

10+ Year Member



All RDBMS's have one rule, an access through a table will use one index and only one.

Adding multiple indices will not help queries with complex where criteria. You need to create joined indices (i.. index like col1, col10, col5). the rdbms will use the colums in the correct order (so a select for only col1 will use the index and a select for only for col15 will not)

Its one of the biggest misunderstanding in database usage that an RDBMS will use more then one index for a complex query.

Just a kind FYI

[edited by: Rufal at 10:57 pm (utc) on April 30, 2009]