I have been mulling over the best way to manage a large set of user preferences efficiently over a cluster. The site will handle about 3 million users, around 3-4k of them concurrent, each with a large, variable-length set of preferences.
I was thinking I might use the database to hold a text field of preferences in an INI format. But I was wondering if I would gain anything by doing it this way rather than just keeping everything separated in database tables, and running joins and such as needed.
I can assume that the data will be queried frequently, and a lot of content presentation is determined by these preferences, so they need to be accessed in a hurry if the site is to perform well.
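To make the two options concrete, here's roughly what I'm picturing (PHP/MySQL; all table and column names are made up for illustration):

<?php
// Rough sketch of the two options I'm weighing.
$pdo = new PDO('mysql:host=localhost;dbname=example', 'dbuser', 'dbpass');

// Option 1: one text field per user holding the whole INI blob.
$pdo->exec('CREATE TABLE user_prefs (
    user_id   INT UNSIGNED NOT NULL PRIMARY KEY,
    prefs_ini TEXT NOT NULL
)');

// Option 2: fully normalized, one row per preference, joined as needed.
$pdo->exec('CREATE TABLE preferences (
    user_id INT UNSIGNED NOT NULL,
    section VARCHAR(64)  NOT NULL,
    name    VARCHAR(64)  NOT NULL,
    value   VARCHAR(255) NOT NULL,
    UNIQUE KEY uk_user_pref (user_id, section, name)
)');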
Any ideas, advice, references, or stories of experience in this area are much appreciated.
-Greg
The thing to remember is to keep indexes on any and all columns you're going to be using within WHERE and JOIN clauses. Autoincrement fields are indexed automatically, I believe, but you may not wish to use those in every case.
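For instance, something like this (just a sketch in PHP/MySQL; the users and preferences tables and their columns are assumptions, not your actual schema):

<?php
$pdo = new PDO('mysql:host=localhost;dbname=example', 'dbuser', 'dbpass');

// Index the foreign key used in JOINs...
// (the primary key / autoincrement column is already indexed)
$pdo->exec('ALTER TABLE preferences ADD INDEX idx_user_id (user_id)');

// ...and any column that appears in WHERE clauses.
$pdo->exec('ALTER TABLE users ADD INDEX idx_username (username)');

// EXPLAIN tells you whether a given query actually uses the indexes.
$stmt = $pdo->prepare('EXPLAIN SELECT p.name, p.value
                       FROM users u
                       JOIN preferences p ON p.user_id = u.id
                       WHERE u.username = ?');
$stmt->execute(array('greg'));
print_r($stmt->fetchAll());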
In one case, a person I've done some work for told me about a client who hired him to fix his site. The client had a cron job set up to run once an hour to rebuild a directory page. The table behind it held several hundred thousand records, many with massive quantities of data requiring a fair amount of parsing. This cron job had grown larger and larger until it took an hour and 15 minutes to complete... which, as you'll recall, was set to run once an hour. Not a good situation.
So, my friend went into the database and, to his horror, found that not only was every row referenced by a varchar name field, but there were no indexes anywhere in the database!
Half an hour and a liberal application of indexing later, the cron job was run again and completed in a client-astounding 6 minutes (still slow, you say? Keep in mind that each of these records was, in essence, used to build an entirely unique mini-website with several pages). The only downside was about 6-10 MB of database storage used for something other than data.
The moral of the story: indexes can literally save your bacon when you're dealing with large amounts of data. Never create a database without them. :)
I think I understand you, but I wonder if more facts would change your advice.
The way I think my site architecture would turn out, it would be either a single query that returns the entire profile as an INI blob to parse in PHP, or several different queries over the life of the request to pull the info as needed.
A single query yields only one database call, yet returns more information than is needed for any single request.
Several queries yield many database calls, but only the information needed for the request.
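For the single-query option, I'm picturing something along these lines (just a sketch; user_prefs and the column names are illustrative, and parse_ini_string() needs PHP 5.3+):

<?php
$pdo = new PDO('mysql:host=localhost;dbname=example', 'dbuser', 'dbpass');
$userId = 12345; // hypothetical logged-in user

// One round trip fetches the whole profile...
$stmt = $pdo->prepare('SELECT prefs_ini FROM user_prefs WHERE user_id = ?');
$stmt->execute(array($userId));
$ini = $stmt->fetchColumn();

// ...and one call parses it into a nested array (true keeps [sections]).
// On PHP < 5.3 you'd write the text to a temp file and use parse_ini_file().
$prefs = parse_ini_string($ini, true);

echo $prefs['display']['theme']; // e.g. a hypothetical [display] theme = "blue"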
As far as running a cron job to keep the INI fields in sync goes: I was planning on keeping all the data normalized in the database for data mining, but whenever a user changed a preference, I would trigger a routine to rewrite the INI field in the quick-retrieval database.
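The trigger routine I have in mind would look something like this (again a sketch with made-up names; the INI escaping here is crude, and real code would need to be more careful):

<?php
function save_preference(PDO $pdo, $userId, $section, $name, $value) {
    // 1. Update the normalized row (the source of truth, kept for mining).
    //    REPLACE assumes a unique key on (user_id, section, name).
    $stmt = $pdo->prepare(
        'REPLACE INTO preferences (user_id, section, name, value)
         VALUES (?, ?, ?, ?)');
    $stmt->execute(array($userId, $section, $name, $value));

    // 2. Rebuild the INI text from the normalized rows.
    $stmt = $pdo->prepare(
        'SELECT section, name, value FROM preferences
         WHERE user_id = ? ORDER BY section, name');
    $stmt->execute(array($userId));

    $ini = '';
    $current = null;
    foreach ($stmt as $row) {
        if ($row['section'] !== $current) {
            $current = $row['section'];
            $ini .= "[$current]\n";
        }
        $ini .= $row['name'] . ' = "' . addslashes($row['value']) . "\"\n";
    }

    // 3. Write the rebuilt blob to the quick-retrieval table.
    $stmt = $pdo->prepare(
        'REPLACE INTO user_prefs (user_id, prefs_ini) VALUES (?, ?)');
    $stmt->execute(array($userId, $ini));
}

// e.g. save_preference($pdo, 12345, 'display', 'theme', 'blue');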
Which approach would you say is more efficient in terms of scalability, as well as maintenance and performance?
Again, I appreciate all the feedback I can get on this.
-Greg