homepage Welcome to WebmasterWorld Guest from 54.161.236.92
register, free tools, login, search, pro membership, help, library, announcements, recent posts, open posts,
Become a Pro Member
Home / Forums Index / Code, Content, and Presentation / Databases
Forum Library, Charter, Moderators: physics

Databases Forum

    
Count across tables
Using MySQL to sum across similar tables
Moosetick




msg:3647359
 3:57 pm on May 11, 2008 (gmt 0)

I've seen posts with similar questions, but none that get me where I need.

I'm got tables created for each day, 20080501,20080502,20080503,etc. All contain 3 columns with the same names. I want to count the total number of occurances across multiple tables and get the top results.

20080501 20080502 20080503
-------- -------- --------
c1 c2 c3¦¦¦¦c1 c2 c3 ¦¦¦¦c1 c2 c3
1 a red ¦¦¦¦1 d green¦¦¦¦1 a yellow
2 b blue¦¦¦¦2 e red ¦¦¦¦2 c red
3 c red ¦¦¦¦3 f blue ¦¦¦¦3 f red

So I want a result like:
c1 c2
red 5
blue 2
green 1
yellow 1

I've tried something like...

select column3, COUNT(*) as Number from '20080501` GROUP BY column3 union
select column3, COUNT(*) as Number from `20080502` GROUP BY column3 union
select column3, COUNT(*) as Number from '20080503` GROUP BY column3 ORDER by Number desc limit 20;

But that doesn't combine the totals. Can anyone shed some light?

 

Gorilla




msg:3647406
 5:07 pm on May 11, 2008 (gmt 0)

Your data model is wrong. You should have data form all days in one table. Why the split into a table per day? You need a better solution to the problem that resulted in per day data split in individual tables.

Moosetick




msg:3647416
 5:11 pm on May 11, 2008 (gmt 0)

This is still in devopment so I can change, but I thought I could get faster queries by breaking the data up into daily tables.

Each table will have about 50000 entries and I want to keep 6 months worth of data. Can MySQL handle a table with 9 million rows well?

ZydoSEO




msg:3647443
 5:58 pm on May 11, 2008 (gmt 0)

I agree with Gorilla that your data model is incorrect. There is not really a good way to sum up those values across multiple tables. Plus the number of days in a month vary, so you are going to continue to run into complications when querying your dataset due to the fact that it's spread across all of these daily tables.

I'm a SQL*Server guy, but if MySQL can't handle 9 million rows then a) you're using the wrong DB or b) you might consider trimming back the amount of data you keep or archive off old data monthly to another instance of MySQL.

If you insist on using daily tables as you've described above, you might consider having a summary table with a transaction date, a value for field c3, and a count of how many rows appeared in the daily table for field c3. You could schedule a job to run after midnight to summarize the previous day's data and append the daily results to the summary table. So in the above example one 20080504 the summary table would look like:

TranDate¦Value.¦Count
20080501¦Red...¦2
20080501¦Blue..¦1
20080502¦Green.¦1
20080502¦Red...¦1
20080502¦Blue..¦1
20080503¦Yellow¦1
20080503¦Red...¦2

Now you have summary counts by color by day and everything is in a single table. Using this same table you can get them by week, month, or any date range you'd like by querying one table with a simple query.

[edited by: ZydoSEO at 6:00 pm (utc) on May 11, 2008]

Gorilla




msg:3647452
 6:13 pm on May 11, 2008 (gmt 0)

MySQL can handle the number of rows you are looking for, but you will probably want to create one more more indexes to support the queries you need to perform well.

I suggest you create a small script to generate a test dataset with the max number of records you expect. Then play around with queries and observe performance. The explain command should help you understand what indexes you need.

[edited by: Gorilla at 6:13 pm (utc) on May 11, 2008]

Moosetick




msg:3647469
 6:52 pm on May 11, 2008 (gmt 0)

OK, I followed Gorilla's advice. I created a 4th column with the data. It was performing well with 10 days worth of data without any indexes. I have also gone ahead and indexed 2 columns and hopefully that will offset any slowness issues.

If the queries start to take too long I may just create temp tables with daily stats.

Thanks for all your help.

Global Options:
 top home search open messages active posts  
 

Home / Forums Index / Code, Content, and Presentation / Databases
rss feed

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
Home ¦ Free Tools ¦ Terms of Service ¦ Privacy Policy ¦ Report Problem ¦ About ¦ Library ¦ Newsletter
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved