Welcome to WebmasterWorld Guest from 188.8.131.52
Forum Moderators: open
I'm familiar with MySQL, but am willing to use Oracle or whatever is necessary. It may be unrealistic, but I'd like it to be responsive enough so that I can have a web based front end that can do dynamic queries on the data.
I've never had to deal with this much data. Can anyone provide suggestions for the best way to handle/organize the data?
Of course, that is based on being busy 24 hours.
If you want to actually have a period where you can do analysis, then you might want a bulk dataload that takes less than say, 12 hours. Now, MS SQL server can bulk load from text files at much greater rates than 350/second.
The advantage of MS SQL is that, at least the enterprise editions, allow you to mix and match between operational and data warehousing models.
A years worth of rows is about 11 billion rows. Big, but not huge. You just need to plan on the disk space in advance. Upsizing disk space for db's is not fun. Let's say 1GB stores 1 million rows with indexes. That becomes 11,000GB, or 11TB. With fudge factor, 20TB.
If you want fast access, I think you are going to be using prebuilt data cubes.
mysql can't do much of the above. MS SQL can. At a cost. And no, the Express Edition won't cut it here. You will need whichever edition supports the sizes you want, and the OLAP capabilities.
Your budget is not going to be for the faint of heart.
For one thing, you won't be doing this on a bunch of sata drives. For another thing, the license is not going to be cheap. Finally, sql of any flavour loves cpu and memory. As much as you can afford.
Another alternative is to use an ISAM system. This will take a *lot* of bare metal programming. The limits of the ISAM I use is 16TB per db, but multiple db's are supported. In the right circumstances, it can be an order of magnitude faster.
Now, if you don't need the raw data, summarisation will make the whole thing much easier and faster. But, first you have to load the raw data. You might want to keep 90 days of raw data and summarise the rest.
For web analytics, the raw data is seldom useful after some period of time. The trending numbers are more important. And relationships are important.
Now, if you are doing stock quote analysis, text files can actually be blindingly fast. You just put one security in each file and append the daily numbers.