Forum Moderators: phranque
I am writing a content management system that will allow me to develop new websites more quickly and efficiently. This is new to me, and I'm a little worried about how well it will perform in the long run.
I am trying to decide between using an included file that contains associative arrays versus using a MySQL database. Every page on my site is built from the same template, which contains no page-specific information, just CSS-positioning structure and PHP variables (e.g. <? echo $keyword ?>, <? echo $description ?>, etc.). Each page contains a unique key variable that I then use to fill in the correct content from a central location.
At the moment, I am simply including an associative array inside an .inc file that holds the information for the entire site. The array contains all the specific information for each page on the site.
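A minimal sketch of what that array-in-an-include-file setup might look like (the file name, keys, and page data here are all made up for the example):

```php
<?php
// content.inc — hypothetical central content store, one entry per page key
$pages = [
    'home' => [
        'keyword'     => 'widgets, gadgets',
        'description' => 'The home page of the widget store.',
        'title'       => 'Welcome',
    ],
    'about' => [
        'keyword'     => 'about us',
        'description' => 'Who we are and what we sell.',
        'title'       => 'About',
    ],
];

// The shared template would then do something like:
//   include 'content.inc';
//   $page = $pages[$key];        // $key is the page's unique identifier
//   echo $page['description'];
```

Every page supplies only its `$key`; everything else comes from the one central file.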
Would it be better to use an SQL table to hold the relevant content? Certainly it would make maintenance much easier. Assuming the site starts getting hit with large amounts of traffic, which system will hold up better? That is, I need to choose between thousands of simultaneous requests for the same file versus the equivalent number of simultaneous MySQL queries.
Thanks,
Chris
I wanted to build a templated website so that I can make all the changes from one central location. Namely, instead of modifying individual HTML or PHP files, I wanted to include everything through an administration file. That way I can change shared resources (e.g. link tables, copyright notices, etc.) across the whole site while inserting dynamic content where applicable. It's basically just content management tailored to my own needs.
What I was concerned about was how to do this most efficiently. In the interim since I made that post, I've learned that database-driven sites are generally preferred over file-driven ones: they are more popular, so support is more widely available; they are easier to maintain than hand-edited arrays; and they can handle heavy loads on a properly configured server.
In terms of what has been suggested to me, I was told to use MySQL to start. Queries reportedly take only about 0.1 seconds, so lower-traffic sites should handle this easily. What kills MySQL in the long run is opening and closing the connection to the database (which can take up to 1 second). As a result, the most efficient use of SQL is through persistent connections, where an already-open connection is kept and reused across requests instead of reconnecting for every page.
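A minimal sketch of a persistent connection, assuming PDO. SQLite's in-memory driver is used here purely so the snippet runs anywhere; for MySQL the DSN would be something like `'mysql:host=localhost;dbname=site'` plus a username and password (all hypothetical names):

```php
<?php
// With PDO::ATTR_PERSISTENT, PHP reuses an already-open handle from an
// earlier request instead of paying the connect cost on every page.
$db = new PDO('sqlite::memory:', null, null, [
    PDO::ATTR_PERSISTENT => true,               // reuse the connection across requests
    PDO::ATTR_ERRMODE    => PDO::ERRMODE_EXCEPTION,
]);

$value = $db->query('SELECT 1 + 1')->fetchColumn();
echo $value; // 2
```

Note that persistent connections are pooled per server process, not tied to an individual visitor's browsing session.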
"Connection pooling" was also mentioned to improve efficiency but I haven't had time to look into yet. I'm guessing this has to do with using multiple databases although I'm not sure.
Last point: once the site outgrows MySQL, which of course is my ultimate goal, you can move to an Oracle or SQL Server solution.
From the profiling I've done, opening the database connection has never taken longer than about 0.05 seconds (50 ms), so it has never been a problem.
It sounds like you are using a TCP/IP connection to a remote server. Try running the webserver on the same machine as the MySQL server and use the local Unix socket for the connection.
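For what it's worth, with mysqli (or the old mysql_connect) the host string you pass decides the transport; the credentials and socket path below are hypothetical placeholders:

```php
<?php
// '127.0.0.1' forces TCP/IP even on the same machine;
// 'localhost' makes the client library switch to the local Unix socket.
$viaTcp    = '127.0.0.1';
$viaSocket = 'localhost';

// e.g.  new mysqli($viaSocket, 'user', 'pass', 'site');
// The socket path can also be given explicitly as the sixth argument:
//   new mysqli('localhost', 'user', 'pass', 'site', 3306,
//              '/var/run/mysqld/mysqld.sock');
```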
SN
Somebody will tell you about using a cache to avoid hitting the db. Yes, it adds an extra layer to your system, but it will save you time and money in the long term. It will also give you more flexibility in your layout and page design, as well as the UI.
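A minimal file-cache sketch along those lines (the helper names, cache path, and TTL are made up; a real setup would also handle locking and invalidation):

```php
<?php
// Serve a saved copy of the rendered page if it is fresh enough,
// otherwise rebuild it (the expensive db work) and save the result.
function cache_get(string $file, int $ttl): ?string {
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return file_get_contents($file);
    }
    return null;                                 // missing or stale
}

function cache_put(string $file, string $html): void {
    file_put_contents($file, $html);
}

$cacheFile = sys_get_temp_dir() . '/page_home.html';
$html = cache_get($cacheFile, 300);              // 5-minute TTL
if ($html === null) {
    $html = '<h1>Home</h1>';                     // expensive build / db queries go here
    cache_put($cacheFile, $html);
}
echo $html;
```

Most requests are then served straight from disk, and the database is only touched when the cached copy expires.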
Do you know where I can find any statistics comparing the speed of in-script lookups versus DB queries? For instance, how long would it take to run through an associative array of 1000 items versus the equivalent SELECT statement in SQL?
It wouldn't take much more programming to simply output the SQL results into the file that's going to include the relevant page information. Is it really worth doing, though? What are the largest database-backed sites doing, e.g. eBay and others?
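The array-versus-SELECT question can be roughed out with a quick micro-benchmark. This sketch uses an in-memory SQLite table standing in for MySQL, so the absolute numbers are only indicative; the point is the relative gap:

```php
<?php
// 1000 lookups in a PHP associative array vs 1000 prepared SELECTs.
$items = [];
for ($i = 0; $i < 1000; $i++) {
    $items["key$i"] = "value$i";
}

$db = new PDO('sqlite::memory:');
$db->exec('CREATE TABLE items (k TEXT PRIMARY KEY, v TEXT)');
$ins = $db->prepare('INSERT INTO items VALUES (?, ?)');
foreach ($items as $k => $v) {
    $ins->execute([$k, $v]);
}

$t = microtime(true);
for ($i = 0; $i < 1000; $i++) {
    $x = $items["key$i"];                        // plain hash lookup
}
$arrayTime = microtime(true) - $t;

$sel = $db->prepare('SELECT v FROM items WHERE k = ?');
$t = microtime(true);
for ($i = 0; $i < 1000; $i++) {
    $sel->execute(["key$i"]);                    // round-trip through the db layer
    $x = $sel->fetchColumn();
}
$dbTime = microtime(true) - $t;

printf("array: %.6fs  db: %.6fs\n", $arrayTime, $dbTime);
```

On any setup, the array lookups should come out well ahead, since each SELECT pays query-parsing and driver overhead that a hash lookup simply doesn't have.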
Additionally, you are adding more resource usage. An associative array is almost just a big blob of memory, which is about as fast as you can get. A db is an enormous resource with lots of complicated data structures; it may use network bandwidth (if on a separate box) and is a single point of failure for distributed websites.
The way most big sites work is:
Personalization data lives in a db of some kind, highly optimized and often requiring lots of programmers. Common stuff, such as catalog listings, message boards, etc., is usually spit out by a cron job, message queue, or equivalent into static pages.
Sites that require real-time updates, such as E*Trade, will build hugely complicated db/caching schemes to make sure they don't give up too much performance.
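The cron-job approach above could be sketched like this: a script run on a schedule reads the shared content from the database and writes plain static files, so the web server never touches the db at request time. SQLite stands in for MySQL so the script is self-contained, and the table and column names are invented:

```php
<?php
// Hypothetical publish script, run from cron every few minutes.
$db = new PDO('sqlite::memory:');
$db->exec('CREATE TABLE pages (slug TEXT, title TEXT, body TEXT)');
$db->exec("INSERT INTO pages VALUES ('home', 'Welcome', 'Hello world')");

$outDir = sys_get_temp_dir();                    // would be the web root in practice
foreach ($db->query('SELECT slug, title, body FROM pages') as $row) {
    $html = "<html><head><title>{$row['title']}</title></head>"
          . "<body>{$row['body']}</body></html>";
    file_put_contents("$outDir/{$row['slug']}.html", $html);
}
echo "generated\n";
```

Visitors are then served ordinary static HTML, and the db is only queried once per cron run instead of once per hit.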
Rule of thumb: dbs are slow. For a scalable site, don't use them unless you have to. What works for 1,000 visitors a day won't work for 100,000.