
Large database and speed issues

     
2:12 pm on Apr 2, 2009 (gmt 0)

New User

5+ Year Member

joined:Apr 2, 2009
posts: 3
votes: 0


So I built a member management system for the company I work for. Right now we have 32,493 members with 69,269 sales to 89 different vendors (all this data and MUCH more is in the database). We have phone numbers, fax numbers, and email addresses per contact, and each member has multiple contacts. We have different identifier numbers that we can use to match our member IDs to the member IDs that our vendors give us in their sales reports. We also have addresses, zip codes, cities, and states to keep track of where our members are from, and we track every modification to a member's information with the help of a changes table that is growing really quickly.

The problem with all this is that when I built the system, we put in about 5,000 records for testing, and once the whole system was built and tested it all worked great and pretty fast. But after we dropped in the rest of the data, it slowed down a lot. Some of the bigger searches and queries can take 30 seconds to return. We had kind of just been dealing with it, but now I have a real problem: I have to run a file that retroactively generates a report for every day from July 1st, 1999 to June 30th, 2008. That report has to run once for every single day (that's 6,359 days), so I get to run a SQL statement that normally takes about 25 seconds to return by itself 6,359 times. The SQL statement is as follows:

SELECT
    members.Member_ID, companies.CompanyName, vendors.VendorName, sales.SalesAmount
FROM members
INNER JOIN companies ON members.Company_ID = companies.Company_ID
INNER JOIN sales     ON members.Member_ID  = sales.Member_ID
INNER JOIN vendors   ON sales.Vendor_ID    = vendors.Vendor_ID
WHERE (SELECT MAX(SalesDate) FROM sales WHERE Member_ID = members.Member_ID) = '".$date."'
GROUP BY members.Member_ID;

"$date" is my variable holding the day the loop is currently on.

Does anyone have ideas on what I should look up or research to speed up this process, or the database in general?
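One option worth considering is restructuring the job itself: instead of running the per-day query 6,359 times, compute every member's last sale date once and bucket the report rows by that date. Below is a minimal sketch of that idea using Python's sqlite3 as a stand-in RDBMS; the schema and sample rows are made up, mirroring the names in the query above.

```python
import sqlite3

# Hypothetical miniature schema mirroring the thread's tables, for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE members   (Member_ID INTEGER PRIMARY KEY, Company_ID INTEGER);
CREATE TABLE companies (Company_ID INTEGER PRIMARY KEY, CompanyName TEXT);
CREATE TABLE sales     (Member_ID INTEGER, Vendor_ID INTEGER,
                        SalesAmount REAL, SalesDate TEXT);
INSERT INTO companies VALUES (1, 'Acme Co');
INSERT INTO members   VALUES (10, 1), (11, 1);
INSERT INTO sales VALUES (10, 1, 100.0, '2008-06-30'),
                         (10, 1,  50.0, '2008-06-29'),
                         (11, 1,  75.0, '1999-07-01');
""")

# Compute each member's last sale date ONCE, then key the report rows by
# that date -- instead of re-running a correlated subquery 6,359 times.
rows = con.execute("""
    SELECT last.LastDate, m.Member_ID, c.CompanyName
    FROM (SELECT Member_ID, MAX(SalesDate) AS LastDate
          FROM sales GROUP BY Member_ID) AS last
    JOIN members   m ON m.Member_ID  = last.Member_ID
    JOIN companies c ON c.Company_ID = m.Company_ID
    ORDER BY last.LastDate
""").fetchall()
print(rows)  # [('1999-07-01', 11, 'Acme Co'), ('2008-06-30', 10, 'Acme Co')]
```

The loop over dates then becomes a walk over an already-sorted result set rather than 6,359 separate round trips.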

///////////////////////////////////////////

I tested the last SELECT statement; it took 91 seconds to run through Navicat. I then tested it on a duplicate of our database on my personal server (it took 13 seconds). The difference, you ask? The server my company uses sucks, and I have had plenty of experience with it sucking, but I can't change that. However, after cutting the statement down, I got it to run on my company's server in 7.5 seconds. The new SELECT statement:

SELECT
members.Member_ID
FROM members
WHERE (SELECT MAX(SalesDate) FROM sales WHERE Member_ID=members.Member_ID)='2009-02-03'
GROUP BY members.Member_ID;

2:47 pm on Apr 2, 2009 (gmt 0)

Administrator

WebmasterWorld Administrator coopster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:July 31, 2003
posts:12548
votes: 2


Welcome to WebmasterWorld, sfishel87.

Proper indexing is the key. Most RDBMSs come with some type of query analysis tool that will help you optimize the query. Usually this means adding one or more indexes. Analyze the query and go from there.
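As an illustration of the kind of index that helps with the query above (a sketch using sqlite3; the index name is made up): a composite index on sales (Member_ID, SalesDate) lets the planner answer the MAX(SalesDate)-per-member subquery straight from the index, without touching the table.

```python
import sqlite3

# Hypothetical sketch in sqlite3: a composite index on (Member_ID, SalesDate)
# lets the MAX(SalesDate) subquery be answered from the index alone.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales (Member_ID INTEGER, Vendor_ID INTEGER,
                    SalesAmount REAL, SalesDate TEXT);
CREATE INDEX idx_sales_member_date ON sales (Member_ID, SalesDate);
""")
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT MAX(SalesDate) FROM sales WHERE Member_ID = ?",
    (10,),
).fetchall()
print(plan[0][3])  # e.g. a SEARCH using the covering index idx_sales_member_date
```

The exact plan wording differs per engine and version, but the principle (a seek on the index instead of a table scan) carries over to MySQL and friends.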

3:29 pm on Apr 2, 2009 (gmt 0)

Moderator from US 

WebmasterWorld Administrator lifeinasia is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 10, 2005
posts:5753
votes: 121


I strongly second the index comment.

Throwing more memory at the server may also help (what is it now?). These days, memory is so cheap that your tightwad company may even go for it. :)

The problem is that report has to run once for every single day.

If you're not already doing it, you may want to run the query for each day and save the results to another table.
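The save-the-results idea can be sketched like this (sqlite3 as a stand-in; the daily_report table name and sample data are made up): run the per-day query once, append its result to a small summary table, and let the report read that table from then on.

```python
import sqlite3

# Sketch: persist each day's result so the expensive query runs once per
# day, not once per report view. Table names and rows are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales (Member_ID INTEGER, SalesDate TEXT);
INSERT INTO sales VALUES (10, '2009-02-02'), (10, '2009-02-03'),
                         (11, '2009-02-03'), (12, '2009-02-02');
CREATE TABLE daily_report (ReportDate TEXT, Member_ID INTEGER);
""")

report_date = '2009-02-03'
# Members whose most recent sale fell on the report date, saved once.
con.execute("""
    INSERT INTO daily_report (ReportDate, Member_ID)
    SELECT ?, Member_ID FROM sales
    GROUP BY Member_ID HAVING MAX(SalesDate) = ?
""", (report_date, report_date))

saved = con.execute(
    "SELECT Member_ID FROM daily_report ORDER BY Member_ID").fetchall()
print(saved)  # [(10,), (11,)] -- member 12's last sale was earlier
```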

4:21 pm on Apr 2, 2009 (gmt 0)

New User

5+ Year Member

joined:Apr 2, 2009
posts:3
votes: 0


I've got every column that is searched against (anything we reference to try to match it against another value) indexed already. We are running the process to catch up to the current time, so from here on out the process just runs once a day and only collects data for today.

3:12 am on Apr 4, 2009 (gmt 0)

Junior Member

10+ Year Member

joined:July 28, 2007
posts:125
votes: 0


What RDBMS are you using?

If PostgreSQL, look up VACUUM.
If Oracle or SQL Server, look up updating statistics.
If MySQL, look up OPTIMIZE TABLE.

I second what others have said about indexes. Also, if you have big queries doing joins, see if you can copy that data to a denormalized table and query it from there. Joins can get expensive.
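The denormalization idea in miniature (sqlite3 sketch; the sales_flat table name, schema, and rows are made up): copy the join result into one wide table so the reporting queries skip the joins entirely.

```python
import sqlite3

# Sketch: flatten the three-way join once into a denormalized table.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE members   (Member_ID INTEGER PRIMARY KEY, Company_ID INTEGER);
CREATE TABLE companies (Company_ID INTEGER PRIMARY KEY, CompanyName TEXT);
CREATE TABLE sales     (Member_ID INTEGER, SalesAmount REAL, SalesDate TEXT);
INSERT INTO companies VALUES (1, 'Acme Co');
INSERT INTO members   VALUES (10, 1);
INSERT INTO sales     VALUES (10, 100.0, '2009-02-03');

CREATE TABLE sales_flat AS
SELECT s.Member_ID, c.CompanyName, s.SalesAmount, s.SalesDate
FROM sales s
JOIN members   m ON m.Member_ID  = s.Member_ID
JOIN companies c ON c.Company_ID = m.Company_ID;
""")

# Reporting now reads a single table -- no joins at query time.
flat = con.execute("SELECT * FROM sales_flat").fetchall()
print(flat)  # [(10, 'Acme Co', 100.0, '2009-02-03')]
```

The trade-off is that the flat table must be refreshed (or incrementally updated) whenever the source tables change.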

Can you tell if you are I/O-bound? If so, that may point to a memory issue, as in not enough of it. Do you have any stats on whether it's finding these things in working memory or hitting disk? Disk is dog slow compared to memory.

Can you turn your queries into a stored procedure, PL/SQL, or the like?

1:14 am on Apr 7, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 20, 2004
posts:882
votes: 2


If PostgreSQL, look up VACUUM

That can be important, but using EXPLAIN should be the first step.
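The EXPLAIN workflow looks like this in miniature (sketched with sqlite3's EXPLAIN QUERY PLAN; the index name is made up, and the exact output wording varies by engine and version):

```python
import sqlite3

# Sketch: read the plan before and after adding an index.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (Member_ID INTEGER, SalesDate TEXT)")

query = "EXPLAIN QUERY PLAN SELECT * FROM sales WHERE Member_ID = 10"
before = con.execute(query).fetchall()[0][3]
print(before)   # a full table SCAN -- the slow case

con.execute("CREATE INDEX idx_member ON sales (Member_ID)")
after = con.execute(query).fetchall()[0][3]
print(after)    # a SEARCH using idx_member -- the fix
```

In MySQL the equivalent is `EXPLAIN SELECT ...`, where a `type` of ALL signals the same full-table-scan problem.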

3:09 pm on Apr 7, 2009 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member piatkow is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 5, 2006
posts:3440
votes: 52


Well, I wouldn't call those numbers large by commercial database standards.

It's over 10 years since my DBA days, when I was debugging issues like this several times a week, but that sub-select in the query makes me twitch. Step 1 is always to use EXPLAIN, or whatever local equivalent is available. That will track down the sort or table scan that is causing the problem.

An additional index may well be the answer. The other solution that we found was to split a complex query and insert the results of the first step into a temporary table to be read in step 2.
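The split-into-a-temporary-table approach can be sketched like this (sqlite3; the last_sale temp table name and data are made up): step 1 materializes the expensive aggregate once, step 2 reads it cheaply.

```python
import sqlite3

# Sketch of the two-step split for the thread's correlated subquery.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales (Member_ID INTEGER, SalesDate TEXT);
INSERT INTO sales VALUES (10, '2009-02-03'), (10, '2009-02-02'),
                         (11, '2009-02-01');
""")

# Step 1: one grouped scan instead of a correlated subquery per member.
con.execute("""
    CREATE TEMP TABLE last_sale AS
    SELECT Member_ID, MAX(SalesDate) AS LastDate
    FROM sales GROUP BY Member_ID
""")

# Step 2: the outer query becomes a trivial lookup on the small temp table.
rows = con.execute(
    "SELECT Member_ID FROM last_sale WHERE LastDate = '2009-02-03'"
).fetchall()
print(rows)  # [(10,)]
```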

3:13 pm on Apr 7, 2009 (gmt 0)

New User

5+ Year Member

joined:Apr 2, 2009
posts:3
votes: 0


Thanks for all the help, guys.

11:14 pm on Apr 21, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 20, 2004
posts:882
votes: 2


I've got every column that is searched against (anything we reference to try to match it against another value) indexed already.

And is the database system using even one of those indexes? You may need an index over more than one column. What does the planner say it's doing?

An additional index may well be the answer.

But don't just go throwing in another index without looking at the query plan you are trying to fix.

10:56 pm on Apr 30, 2009 (gmt 0)

New User

10+ Year Member

joined:Feb 26, 2006
posts: 39
votes: 0


All RDBMSs follow one rule: a single access through a table will use one index, and only one.

Adding multiple single-column indices will not help queries with complex WHERE criteria. You need to create joined indices (i.e., an index like (col1, col10, col5)). The RDBMS will use the columns in left-to-right order, so a SELECT filtering only on col1 will use the index, but a SELECT filtering only on col5 will not.

It's one of the biggest misunderstandings in database usage that an RDBMS will use more than one index for a complex query.

Just a kind FYI
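The leftmost-prefix behavior described above can be demonstrated directly (sqlite3 sketch; table, column, and index names are made up):

```python
import sqlite3

# Sketch: with a joined index on (col1, col10, col5), filtering on col1
# can seek into the index, while filtering only on col5 cannot.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (col1 INTEGER, col10 INTEGER, col5 INTEGER);
CREATE INDEX idx_joined ON t (col1, col10, col5);
""")

lead = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE col1 = 1").fetchall()[0][3]
tail = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE col5 = 1").fetchall()[0][3]
print(lead)  # SEARCH via idx_joined: col1 is the index's leading column
print(tail)  # SCAN: col5 alone cannot seek the index
```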

[edited by: Rufal at 10:57 pm (utc) on April 30, 2009]

 
