Forum Moderators: open

Message Too Old, No Replies

Issues surrounding large sets of data

How big is too big?

         

musicales

5:21 pm on Jan 30, 2006 (gmt 0)

10+ Year Member



I find in-depth manipulation of stats packages quite a challenge, so I started trying to collect the data in a sql table - the fields are just integers containing session id, page id. Of course adding data to the table each time a page is viewed means the table is filling up fairly quickly. There's over a million rows already, and if I leave it collecting data for some time it will be into the 10s of millions.

What is the point at which this becomes unmanageable (I'm on sql server 2000, but I guess the question applies to other systems too). Given the data is numeric only, it seems to be manipulateable fairly fast even with this size of data, but could I end up with 40 million rows, 100 million rows? Does anyone have any experience of this kind of size?

aspdaddy

6:45 pm on Jan 30, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There is no upper limit, it depends on the indexing and query optimisation.

Something I learned when writing these kind of programs is to use the datamart model (fact table & star schema) instead of normalisation.

musicales

8:06 am on Jan 31, 2006 (gmt 0)

10+ Year Member



aspdaddy - thanks. Can you point me to any good introductions / resources about datamart?

aspdaddy

5:48 pm on Jan 31, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Check out articles by Ralph Kimball and if you can get hold of it his book titled the datawebhouse toolkit its worth a read.

The second-normal-form star schema vs third-normal-form snowflake schema is the database theory best for data analysis.

There is also ETL software to extract , transform and load data from relational databases,I wont say more else I violate the TOS if you follow :)

syber

7:05 pm on Jan 31, 2006 (gmt 0)

10+ Year Member



If you are on SQL Server 2000 you have access to Analysis Server.

Analysis Server will allow you to take your SQL database and covert it into a datamart cube using a star schema. There is a tutorial and cube wizard to help you get started. You will be able to easily handle many billions of rows this way.

musicales

7:30 pm on Jan 31, 2006 (gmt 0)

10+ Year Member



are these cubes then accessible to scripting languages in the same way as other sql tables?

aspdaddy

8:07 pm on Jan 31, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sure, there are also SQL extentions you can use specifically for dimensional models. Check out ADOMD & OLAP programming.