| 12:11 pm on Feb 19, 2010 (gmt 0)|
Are you asking this question from a developer's point of view or from the user's point of view?
If it's from a developer's point of view, my personal opinion is that if you are going to use flat files, the coding becomes that much more involved.
| 1:27 pm on Feb 19, 2010 (gmt 0)|
Hello, I would like to know from both a server administrator's and a developer's point of view. I would like to know whether a site based on flat files (fairly static content) could perform better than one driven by a DB. Thanks.
| 6:14 pm on Feb 20, 2010 (gmt 0)|
Yes, it could be. But a database could give better performance too.
Basically you are in one of three situations:
1. You're redesigning a site that already has lots of traffic and you need to be able to handle that traffic from day one. In that case, identify your peak load, try out a CMS with representative dummy data (many CMSs have features or add-ons for just this purpose), then stress test it with something like Apache Bench or equivalent and see if you can handle the load and what sort of optimizations you'll need.
2. You're building a new site and you're worried about scale even though you don't actually even have traffic. In that case, I strongly recommend you look around the web for what Joel Spolsky and Guy Kawasaki have to say about building for scale before you need to (in brief: don't).
3. You're redesigning a site that's existed for years, gets 30,000 page views per month spread out pretty evenly, and has had steady traffic for two years. In that case, you don't really care one way or the other, because the average shared hosting account will serve this up without issue whether it's static HTML pages or a relatively slow CMS.
Personally, assuming you're not talking about a site with tons of real-time traffic like Twitter or Google Wave, I think you can get most of the advantages of the flexibility a DB provides with most of the speed advantages static files provide using a DB-driven site that caches static versions of pages or page components.
As for the original question, think of it this way, in order of increasing complexity:
1. Each page is designed separately and there's no dynamic component at all. In this case, there's not much need for a CMS at all.
2. You want basic templating, but you're not really managing content via your site. In this case, you're just looking at some server-side includes in the language of your choice (or just using the Apache module for SSI). Clearly, flat files will win - no need to even create a DB connection.
3. You want something dynamic, but few complex relationships. So you want to grab an article and grab all comments related to that article, but without grabbing profile information for each author. In the fastest possible implementations, I think flat files would win. WebmasterWorld works a bit like this - each thread is just a file. I don't know the implementation details, such as how it grabs your post counts, but AFAIK it does not, for example, create a DB record for each post in a thread; it just appends the post to the thread file.
4. You want more complex relationships. For example, let's say that for each user who comments on a thread, you want to show all the other commenters who have also commented on threads that user has commented on. You could do this with tons of text processing, but it makes the head spin. It would likely be a fairly costly query for the DB too, but much faster than crunching text files, and vastly simpler to design and maintain.
Also, a well-designed (i.e. normalized) database can handle queries that you never thought of at design time without any change to the DB. The same is true for a flat-file system, of course, but if you're asking for something that wasn't built in and foreseen, it might take hours to crunch through all the data.
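To make point 4 concrete, here's a rough sketch of that "other commenters on my threads" query, using Python with SQLite purely for illustration (the schema, table name, and data are all made up - the point is the self-join, which is a one-liner in SQL but a headache with flat files):

```python
import sqlite3

# Hypothetical schema: one comments table linking users to threads.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (user_id INTEGER, thread_id INTEGER);
    INSERT INTO comments VALUES
        (1, 10), (1, 20),   -- me: commented on threads 10 and 20
        (2, 10), (2, 30),   -- user 2 overlaps with me on thread 10
        (3, 40);            -- user 3 never overlaps with me
""")

ME = 1
# All other users who commented on any thread I commented on.
rows = conn.execute("""
    SELECT DISTINCT c2.user_id
    FROM comments c1
    JOIN comments c2 ON c1.thread_id = c2.thread_id
    WHERE c1.user_id = ? AND c2.user_id != ?
""", (ME, ME)).fetchall()
print(sorted(r[0] for r in rows))  # prints [2]
```

With flat thread files, answering this means parsing every thread on disk; with a normalized table, the same query keeps working even as you dream up new questions later.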
| 7:11 am on Feb 22, 2010 (gmt 0)|
Hello ergophobe, thanks once again for your extended reply :-) Your replies are very useful.
Have a nice day, kgibbo
| 7:30 pm on Feb 22, 2010 (gmt 0)|
You're welcome. In looking over it again, I think the very last point may be the most important one. If you've got the data and it's normalized, you can process it in ways you may not even think of until three years down the road.
| 7:51 pm on Feb 22, 2010 (gmt 0)|
Thanks, my case is actually "2. You're building a new site". I am studying the MS SQL database, and I will also do some stress tests as you advised.
Thanks for your time
| 8:21 pm on Feb 22, 2010 (gmt 0)|
Actually, my advice for #2 was to just build it and worry about scale later. To quote Alistair Cockburn from Agile Software Development: "Before you have users, it's a waste of time ensuring that they can always get to the service".
My advice in #2 was build, see if you get any traffic and hope and pray that down the line, scaling to demand becomes a "problem" for you.
| 8:56 pm on Feb 22, 2010 (gmt 0)|
>My advice in #2 was build, see if you get any traffic and hope and pray that down the line, scaling to demand becomes a "problem" for you.
Yeah, you haven't lived until you're serving ads at $50 CPM, you make big press, and your DB goes down.
| 10:23 pm on Feb 22, 2010 (gmt 0)|
>>you're serving ads at $50 CPM,
Well, that's a special case. I did almost say that if you were planning to buy a lot of ads and go for a big launch, that would be different.
I guess if you know your ad spend and have a clue what your CTR will be, you'll have an estimate of traffic, and you would stress test to... uh... some multiple of that value.
I'm just thinking that I've seen people on these forums delay launch for 2 years while figuring out which solution will scale for the massive traffic they're going to get. They finally launch and a year after launch they have a few thousand visitors per month.
| 11:21 pm on Feb 22, 2010 (gmt 0)|
Static files (and cached DB content in static files) have one advantage: Independence of the SQL server.
With a DB-driven site, if the SQL server goes down, so does your site.
This is especially relevant on shared hosting, where lots of sites share the same server. Invariably, at some point, someone is going to overload that server, even if you don't.
| 7:03 am on Feb 23, 2010 (gmt 0)|
Hello, regarding "independence of the SQL server": could this be achieved using a static page cache, for example in the ASP.NET framework? But I suppose this can take a lot of RAM, am I correct? Thanks guys.
| 3:58 pm on Feb 23, 2010 (gmt 0)|
I don't know enough about ASP, but AFAIK the static page cache is likely to have relatively aggressive garbage collection. It's not going to keep your cached copy in memory for months, is it? I've more typically seen cache expiry times of one hour. So if you get a Digg or something, it will keep the page in memory while the traffic spike lasts, but not longer.
Claus and I are talking about actually capturing the page output when a page is created or first requested and just making a copy on the file system. You set up your rewrite rules to look first for the static copy and if the file exists, you just serve it up.
Depending on your site, you may or may not need garbage collection. I have a very heavy front page on one site, but it's utterly static until I change it, so I just refresh the cached version when it changes. If you have user-generated content, you'll need a cron job (or the MS equivalent) to check your cache expiry and expire or refresh the pages outside the limit.
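That nightly expiry job could be sketched roughly like this - Python here instead of a real cron script or the MS equivalent, and the one-day TTL and flat cache directory are just assumptions for illustration:

```python
import os
import time

CACHE_TTL = 24 * 3600  # assumption: cached pages expire after one day


def expire_cache(cache_dir, ttl=CACHE_TTL, now=None):
    """Delete cached files older than ttl seconds; return their names."""
    now = now if now is not None else time.time()
    removed = []
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        # Use the file's modification time as its "cached at" timestamp.
        if os.path.isfile(path) and now - os.path.getmtime(path) > ttl:
            os.remove(path)
            removed.append(name)
    return removed
```

You'd schedule that to run nightly; the next request for an expired page then regenerates it and drops a fresh copy back in the cache.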
| 4:42 pm on Feb 23, 2010 (gmt 0)|
Yes, ASP.NET 3.5 has cache functionality similar to a Unix cron job; garbage collection can be triggered based on time or on changes to other linked elements.
For my solution, it would make sense to update the cache once per day.
Thanks once again for your help.
| 4:42 pm on Feb 23, 2010 (gmt 0)|
PS, could you post some links about caching, even if they're not ASP-specific, to help me better understand the concepts and strategies, if you have any... thanks! :-)
| 5:47 pm on Feb 23, 2010 (gmt 0)|
Actually, I can't think of anything right now. For my roll-your-own caching, I just used PHP output buffering [php.net]:
- grab the request URL
- check whether /cache/requested_url exists as a file
- if not, start page generation
- invoke output buffering and capture output to the buffer, rather than outputting as it's created
- when page generation is complete, output to the user
- grab the buffer contents and save them as /cache/requested_url
- flush the buffer
- every night, run a cron job that looks for expired files and expires the cache (i.e. deletes the files)
Also, you need to set it up so that when you update a page, it looks for a file in the cache and clears it (or overwrites it, creating a new cached copy).
I can't say I thought real hard about this setup or read up on caching or anything like that. I just knew that output buffering was available and decided to use it like that because it was the first and simplest way that occurred to me. I'm sure there are better ways.
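The steps above could be sketched like this - in Python rather than PHP, and the cache directory name and URL-hashing scheme are my own assumptions, not necessarily how the original PHP version works:

```python
import hashlib
import os

CACHE_DIR = "cache"  # assumption: rewrite rules would check this dir first


def cache_path(url):
    # Hash the URL so any request path maps to a safe, flat file name.
    return os.path.join(CACHE_DIR, hashlib.sha1(url.encode()).hexdigest() + ".html")


def serve(url, generate_page):
    """Return the cached copy if present; otherwise generate, cache, return.

    generate_page stands in for the expensive dynamic step (DB queries,
    templating) - the Python analogue of capturing PHP's output buffer.
    """
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = cache_path(url)
    if os.path.exists(path):        # cache hit: no DB work at all
        with open(path) as f:
            return f.read()
    html = generate_page(url)       # "buffer" the generated output
    with open(path, "w") as f:      # save a static copy for next time
        f.write(html)
    return html
```

The nightly cron job and the clear-on-update step would then just delete the relevant files under CACHE_DIR.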
| 9:52 am on Feb 24, 2010 (gmt 0)|
Thanks once again for the info :-)