Forum Moderators: open

Message Too Old, No Replies

Database Design

         

petachok

1:50 pm on May 23, 2008 (gmt 0)

10+ Year Member



Hello,

I was wondering if someone could help me with a design of a database. Everyday I run certain scripts that update the database with certain information. This information is recorded in my tables with a "date" attached to every entry. The problem with this is that this information is usually the same from day to day, and is just copied in my tables with a new "date" field.

Although this is working for me right now, as you may imagine, the database fills up relatively quickly. I was wondering if it is possible to "track" the changes to the current data everyday, without having to copy the non-changing data everytime.

I, however, want to be able to run queries on the data at any point in history. For example, I want to be able to view the "status" of the database in say Dec 1st, 2005.

If you guys have any suggestions please let me know,

Thanks

physics

4:06 pm on May 23, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It sounds like what you want is a bit of database normalization.

Say your tables are now:

locationanddate
-----------
place,date
-----------
us,20080520
us,20080521
uk,20080522

A normalized version of this would be:

location
--------
id_location,place
1,us
2,uk

and a second table

location_dates
----------------
id_location,date
----------------
1,20080520
1,20080521
2,20080522

In this example it looks trivial, but if you have a lot of information rather than just us/uk then it will save space, etc.

Use joins to get at the data.

[databasejournal.com...]

[dev.mysql.com...]

petachok

4:51 pm on May 23, 2008 (gmt 0)

10+ Year Member



Hey,

I think I am normalizing the tables, but the problem is slightly different. Take this for example:

user_info table:
user_id
name
info
etc. etc...

location table:
location_id
location

user_location:
user_id
location_id
date

for the user_location table I need to keep history of the locations of the user for each date, and need to be able to query it at any time.

Since there are a lot of users that I update everyday, that table grows very very quickly.

Thanks for the help

petachok

4:53 pm on May 23, 2008 (gmt 0)

10+ Year Member



Oh and just to clarify, for the last table, the locations for each user don't change often on a daily basis. Which is why I mean a lot of the info is copied.

physics

5:36 pm on May 23, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



petachok, that looks like a good structure to me :)

petachok

5:49 pm on May 23, 2008 (gmt 0)

10+ Year Member



Yes that's what I thought too at first. However that table grows by 3000 entries a day. Right now, querying that table is taking forever.

Perhaps I am getting the primary key / index structure for the tables all wrong. Would you be able to suggest to me which fields to index, which to make primary / foreign keys, etc.

Maybe that will make my queries faster.

Thanks for the help

physics

7:18 pm on May 23, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What SQL query are you running right now? Can you paste it here?

petachok

8:03 pm on May 23, 2008 (gmt 0)

10+ Year Member



That is not my exact table structure, but I am running something similar to:

SELECT name, location FROM user_info a, location_table b, user_location c WHERE a.user_id = c.user_id AND b.location_id = c.location_id AND date = 'somedate' AND name='somename'

leadegroot

11:43 pm on May 23, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That should be fairly quick - the queries shouldn't take a while to run, although, yes, disk space could start to be an issue.
If performance were the main worry, I would examine the indexes of the tables to make sure the queries are efficient. Have you run an EXPLAIN on the query?

To address the storage size issue -
You could make the user_location table hold columns:
user_id
location_id
from_date
to_date
but it would make your queries more complex and I have had query performance issues with this sort of structure before

petachok

4:24 pm on May 25, 2008 (gmt 0)

10+ Year Member



Thanks for the help guys.

I also came across this idea:

Have a base table with the "current" data, and a history table that logs all the changes. For example:

Base Table:
pk
data

History Table:
pk
timestamp
oldvalue
change (update, delete, or insert)

When you update, delete, or insert records in the base table, the history table is modified.

The problem I see with this is querying the data. Say you want to query data that is 3 years old. Would it be possible to restore the base table (or create a temp table) to the same status it was 3 years ago and then run queries against it?

Would this idea be inefficient?

Any suggestions are appreciated, Thanks.