| This 41 message thread spans 2 pages: 41 (  2 ) > > || |
|Security: flat files VS databases|
Vulnerabilities to hacking
The first thing most people think of when talking about flat files VS databases is a ghost discussion about speed & features (mysql, postgreSQL, etc). The intention of this thread is not that, it's security. The main difference about flat file databases is one has to build the system and features, but let's focus on security.
Mysql/Postgresql-Databases and alike:
- Code and data are in separate places, it's too much bad luck to have them both exposed.
- It's not like "set up and go" you need to secure both areas.
- You might have great code security while having your database failing on that, or just exposed.
- It can be the opposite: having your data altered while your code is intact.
- It sucks trying to work with git or whatever version management, you have full control of your code but data needs extra steps.
- And how much time you need to recover from failure (backups and restoring it).
Flat file databases:
- Code and data are files you can see, copy, etc. You can have them on the same place, disk or wherever you want to.
- Amazingly nice to manage versions and backups.
- So, the same file access & security implementation works on both areas (code and data). No need to secure extra areas.
Let's not forget the many CMS and commercial software solutions relying on databases (like WP, Joomla, etc.) being hacked. Drupal (my fav choice) has better history and reputation because at the end a lot depends on code implementations, anyway if your file security is not good, even Drupal will fail.
I coded my own CMS (several) and have been quite happy on security and specially performance. Critical applications were in "competition" against WP and other custom enterprise solutions, I didn't have any problems while can't say the same about the other options. Over the years I think it all comes down to how secure your server is, file permissions, shared hosting and your code implementations (filtering code, cleaning ALL data input, etc).
While I'm happy with my CMS's I'm now working on my full Perl framework focusing on the key points and challenges I see at work every day. This will solve a lot of problems at the office in no time. Some frameworks allow you changing one line of preffs to switch between mysql and postgresql. I will try to do the same but aiming database and flat file databases.
Anyway I'm researching on security, any comments will be appreciated. Been doing fine with security for years regarding websites 100% in Perl and getting 4K daily uniques visits.
Thanks in advance.
(if any doubts about speed, my systems have won the race over the other options being used, so it's not on the table for discussion).
I don't know about Postgres, but every other DB I've ever worked with does have files for the databases- files that you can see, copy, etc.
|data are files you can see, copy, etc. You can have them on the same place, disk or wherever you want to. |
One thing that flat files do NOT have is transactional logs. That means if your system crashes and you lose your flat file, you've lost all the changes since your last backup. Similarly, to make a backup, you have to backup the entire file.
For DBs, a system crash means you only lose the data since your last transaction log backup (which is commonly done much more often than a full backup). And transaction log backups are much faster than full backups, taking less time and consuming fewer resources.
Although this is not security-related in the sense of being backed, it IS security-related in the sense of securing your data (and recovering your data in the event of being hacked).
I'm not an expert in databases, so please correct me if I'm missing something. My understanding is that hackers are looking for specific signatures that flag a site as having a particular vulnerability. It can be a file, it can be "powered by" link. Because hacking is often a push button affair of scale, eliminating signatures can foil them (that's a band aid, I know). But more relevant, rolling your own CMS will create an essentially anonymous presence that may not have a signature that is being crawled for by a button pusher. So am I correct in asserting that the advantage doesn't lie in the file being flat or otherwise but in the quality of being idiosyncratic?
I agree with martinibuster, a custom CMS does give you a huge advantage against automated attacks, less if you are targeted.
CMSs like Wordpress and Joomla are not really designed to be secure: this is not just a matter of the standard install being secure, but also providing a base that makes plugins etc. easy to secure. Drupal will be much better once the new version using Symfony is released.
A common failure is a DB abstraction layer that does not impose/encourage security. Things like escaping strings used in queries should be done by default, not reliable on the developer remembering to do it every time.
You can have RDBMs data in the same location as the code: its a matter of configuring the database - or just use SQLite (you did say we are not considering performance and features?).
You can (and usually should) configure your database so it cannot be accessed directly over the internet: so it can only be attacked through another process on your web server, which means its attack surface is no larger than that of flat files.
Unless your application is running as a different user from the web server your flat files will be open to attack through anything else you have running as that user: so, for example, if you have apache and two sites running your custom CMS all running as www-data a flaw in apache or either site will allow your data to be tampered with. Yes, good configuration fixes this, but that is extra work (and not possible on shared hosting).
|if any doubts about speed, my systems have won the race over the other options being used, so it's not on the table for discussion |
Its faster than things like Wordpress? You would have to try quite hard to write a custom CMS that is slower than Wordpress!
|It sucks trying to work with git or whatever version management, you have full control of your code but data needs extra steps. |
I usually do not want the data version controlled, and if you do write a dump/reload script, or use a framework that has one. I do want the schema version controlled, but that is easy with a modern framework with some sort of support for migrations.
|The main difference about flat file databases is one has to build the system and features |
The problem with that is that you will NEVER match the features and reliability of a database. how ACID is your flat file database? My experience of converting from flat file to RDBMS is that I found a lot of errors that using a database would have stopped - e.g. e.g duplicates in lists of similar pages which a unique index would have prevented. What about querying as flexibly as SQL?
To a large extent you are asking "if we ignore the biggest advantages of using a database, then how does an RDBMs compare to flat files"?
Thanks for the updates.
|LifeinAsia: I don't know about Postgres, but every other DB I've ever worked with does have files for the databases- files that you can see, copy, etc. |
True! I would just add that not all the developer teams have access to all the areas of the server or use git (or alike) altogether.
|Martinibuster: So am I correct in asserting that the advantage doesn't lie in the file being flat or otherwise but in the quality of being idiosyncratic? |
I believe that's part of the barrier/benefit, any custom CMS will have that advantage. I remember an old thread about this (related) and Brett explained how WebmasterWorld was built and how the speed was achieved using flat files.
Yes hackers look for specific vulnerabilities:
- CMS: brand or tool footprints
- Server: file permissions
- Code: careless data input
- DB: injection opportunities
- DB config: open ports, flaws on DB versions
graeme: absolutely agree on security flaws many cms have, Drupal will indeed get better with SF (even that I'm not a big fan of it but it has saved me on getting things done several times)
Yes I also think DB leave space for vulnerabilities. Yes the databases can be put near the code but this is new to me.
Performance and features? I have achieved amazing speed on low resources so features are the only area left to consider.
YES! the file access is an open window (if left open of course) but once your code/files have been compromised is very easy to get the username and pass of the database if one knows the system, per example WP, even Drupal and Symfony because that data is indeed stored on a config file.
Yes shared hosting IS in fact a security risk., leaving file permissions aside I found out (by accident) I could see/read the file structure of neighbors using simple Perl scripts involving ../../ and nothing else (going up and up and up). A lot of files were readable so that was it.
Ha ha yes WP is a turtle.
|graeme: I usually do not want the data version controlled, and if you do write a dump/reload script, or use a framework that has one. I do want the schema version controlled, but that is easy with a modern framework with some sort of support for migrations. |
Oh yes but... working with teams makes it very, very difficult. Symfony allows exporting to fixtures but it has been very difficult (not to say impossible) to make others adopt Symfony. It's easier to adopt a database than a framework, and many times it's just not possible due to the company guidelines.
ACIDITI, pretty well. It's up to good practices, but not saying this or that one-man-band-tool is superior to msyql or alike. It's just that good practices over time (and being organized) end up with developers arranging their own frameworks (reusing code wisely). I used to see a lot of challenges but over time I got better code, faster code implementations.
|graeme: To a large extent you are asking "if we ignore the biggest advantages of using a database, then how does an RDBMs compare to flat files"? |
Nope. Apps depend on tools, code and implementations, a lot of apps won't use the whole set of features a database is offering, many apps can be designed to be fast and reliable (faster than common options).
The context I'm curious about is thoughts about security. Why? because (perhaps answering my own question) is: all systems have files and code so we are vulnerable to the same basically (the code).
Thanks for the comments, there are things I haven't considered.
Another thing DB systems are great for is granular security. With a flat file, a person has access to the file or doesn't.
In a DB, person A can read TableA, but not TableB. Or everyone can see ColA, ColB, and ColC of a table, but only certainly people can see all the columns (including ColD, ColE, etc.). Or anyone can write to a table, but only certain people can read (good for credit card info).
|once your code/files have been compromised is very easy to get the username and pass of the database if one knows the system, per example WP, even Drupal and Symfony because that data is indeed stored on a config file. |
That is no worse than flat files. You can also make life more difficult for the attacker by storing passwords in environment variables that are read into the running process - that way they need to be able to do more than just read files to get passwords.
|Nope. Apps depend on tools, code and implementations, a lot of apps won't use the whole set of features a database is offering |
I disagree. Unless a site is very simple indeed it will benefit from consistency guarantees and atomic transactions. Durability and isolation may be smaller benefits unless the system is large or busy.
The other feature is flexible querying - most apps change at some point.
|It's up to good practices, but not saying this or that one-man-band-tool is superior to msyql or alike |
MySQL was not a real RDBMS until recently - its default storage engine was MyIASM which did not support either transactions or foreign key constraints.
The problem is that if your code really ensures ACID you have effectively developed your own DBMS, rather than using something off the shelf that is well tested and widely used.
|It's easier to adopt a database than a framework, and many times it's just not possible due to the company guidelines. |
Whether your company guidelines allow you to use a framework or not is a different question from whether it is the best tool for the job.
Personally, I have no doubt that a database (usually an RDBMS) and a framework are the best approach for most web development. It works well for fairly small sites I have developed, and it scales all the way to the likes of Disqus and Pinterest.
After playing around with Flat-File DB EveryAuction-based Software for more than a decade, I figured that it would be Time to 'move up', by getting involved, using SQL-DB driven Opencart Shop Software.
To make it short, it has been a very painfull experience to find out, that the (popular) Opencart Software, by use of a MySql-Database, would not even be able to handle a few hundreds of Products without Delays of 30 Seconds and more to just produce one single Category-Browser Page.
Spoiled by superfast Server-Power and Masses of RAM, Programmers seem to have forgotten all existing laws of programming. As a result of such doings, Users see themselfs confronted with unhappy Hosters, advising them either to FIX their software or then risking of getting 'shut off' from Server Service, because of using to much bandwidth.
CMS, Frameworks, MySql-DB's, all this sounds good. But when it comes to speed, those 'things' may work on bug Systems, but not on small shared Servers. What counts there first , is lean and speedy Software, nothing else. And for such tasks, FlatFile DB still belongs to the best possible Solutions. There is nothing better or more secure.
You all have a good time
[edited by: phranque at 11:11 pm (utc) on Jul 14, 2014]
[edit reason] no personal urls please [/edit]
Erine, I have several database driven sites on cheap hosting without any issues. One of them even uses notoriously slow and inefficient Wordpress.
You CAN write slow systems using a database - you can write slow systems using flat files too. You can write fast, scalable systems using databases AND your data is a lot safer.
|For DBs, a system crash means you only lose the data since your last transaction log backup (which is commonly done much more often than a full backup). And transaction log backups are much faster than full backups, taking less time and consuming fewer resources. |
Reading this thread with a lot of amusement as WebmasterWorld is 100% flat file so I'll use WebmasterWorld as an example of the differences.
I've actually written databases from scratch, Btree* indexing I can code from memory, and have implemented a SQL on top of a dBase style database, so I've got a unique perspective on all of this.
Databases vs. flat files for speed is a silly argument because at the core of all databases is, guess what, FLAT FILES!
It's like arrays, you can have a flat index or a hashed index and both are better for different things.
Where the database wins hands down is if you need to do lots of ad hoc searches on very large sets of data.
However, you can always add an index or a search engine on top of flat files too, or load a file and sort it in memory, or shell the Linux SORT command to do it in the OS. Adding an index is how WebmasterWorld does it's forum index and active list, it indexes the flat files that are the threads. Also, the grep command can search 20K files in a second or two, so there are ways of quickly searching lots of flat files.
There are 100's of thousands of flat files here and the only time it slows down is during backups, which is the same as the database.
FWIW, if WebmasterWorld ever crashes, we don't really need to do a restore unless it's a hard disk failure so the rollback comparison is apples and oranges. When WebmasterWorld crashes we just reindex it, it doesn't need to be restored as all the files are still there intact.
As far as granularity locking, you can restrict parts of a file, and we do that. Whether it's a databse or a flat file, it's the software that provides that granularity of security. There are things only mods and admins can see and do per thread vs. everyone else so we have security and granularity.
FWIW, most people forget that flat files can be used like a database if you use fixed length records. Everyone thinks of flat files as just sequential read and sequential write, which is how most of them work.
So in the end, my only question when I'm building something is;
1. How big is the data going to be?
2. Do I need to do a lot of searches and sorts?
3. Will is be single or multi-user adding data?
5. Do I need open source that requires a DB?
For a CMS, using flat files is a great way to go IMO except you can't leverage a lot of 3rd party open source because everyone else uses that old MySQL crutch.
Seriously folks, until 1995, we didn't have anything like MySQL around for free. The alternatives were all paid memory hogs which we tended not to use and rolled our own. The other stuff was all dBase based and it stunk.
Really there's nothing wrong with flat files, it's a personal preference but if you're not familiar with multi-user programming, file locking, and all that, maybe you should avoid them as MySQL takes care of all those details for you.
Just make sure you've scoped the project properly and flat files truly give you everything you need, and you have the skills to implement it properly, before you proceed.
In other words, MySQL is the least path of resistance to getting it done quickly without worrying about all the file locking issues. However, with MySQL you better know how to use prepared statements and sanitize all your inputs properly or some hacker will make you his bitch.
Firstly, why do you pick the weakest database to compare against? MySQL is better than it used to be, but until very recently it was not a real RDBMS: as long as MyIASM was the default storage engine it did not support foreign keys and or transactions. It still does not support schema changes in a transaction. I rarely use MySQL.
What you are advocating is roll your own, so you put a lot of effort into writing your own database that is less well tested, less well designed, and has a smaller features set than one you can install for free. Why not write your own web-server, scripting language and OS while you are about it?
I rarely need to think about prepared statements and sanitising inputs: you should use a library or framework that does it for you. I also think it very likely that a home grown database has some parallel weakness to SQL injection - although it would benefit from security by obscurity.
I am not particularly impressed that WW uses flat files and has a subset of RDBS funcionality: what does it do better than an RDBMS based forum would have? I love WW content and community, but I have no reason to take it as a model of software development - especially as its developers cannot get some simple things right (like recognising that urls in a url tag can start with https:// ).
|ust make sure you've scoped the project properly and flat files truly give you everything you need |
So will an RDBMS, with a lot less effort on your part, and it will be a lot better tested.
We have not even discussed NOSQL databases yet....
|Firstly, why do you pick the weakest database to compare against? MySQL is better than it used to be |
It's so weak that AdSense, Facebook, etc. all use it, almost EVERYONE uses it, that's why it's called LAMP, but that's not the topic whatsoever.
The topic is Database vs. Flat files of why MySQL is a database.
|We have not even discussed NOSQL databases yet.... |
Why at the core is an index on a bunch of flat files. Joy.
A lot of big sites, especially newer ones do not use MySQL: Disqus, Instagram, IMDB, Skype, etc. I do not think they use the rest of the LAMP stack either.
Facebook and Adsense are living with technology choices made a long time ago. LAMP has worked so well for Facebook that they have developed their own version of PHP and are involved in developing a fork of MySQL.
Getting back to the topic, one comment and one question:
All of the above do use RDBMSs, supplemented by caching and NoSQL databases are appropriate. They do NOT use flat files.
The question is, what is the advantage of using flat files? What is the trade-off for the extra work and risk?
When you use a CMS, you get to take advantage of the code that's already written for that CMS. The down side is that security issues tend to be well known, and you'll be targeted. Also, many modules are written by novice developers who don't understand basic security precautions.
When you use a database, like MySQL, you again get the advantage that a lot of code has been written for you to handle relational data, etc. The same goes for NoSQL engines like Redis. However, you open yourself up to security hazards related to the way these databases work, which tend to be well known (SQL injection).
If you're a knowledgeable, security conscious developer, and you write your own cms and flat-file based data storage mechanism, the downside is that you have to do everything yourself. The upside is, you don't expose yourself to well known security holes (other than the ones in your web server and operating system). On the other hand, even very supposedly well written and battle-hardened code (see Heartbleed) can be hacked. However, if you've been careful the likelihood of you being hacked via the web is, imo, much lower than with a CMS based on a well known database.
With a CMS/Database, you need to keep up with updates for security and because stuff will break when you install/update another module or the core. If you wrote the code yourself and wrote it well, you probably don't need to make updates unless something breaks. On the other hand, you're the only one responsible for doing updates, you get no free lunch from open source devs on the web.
The other big issue is speed. Many sites don't need all of the capability a modern RDBMS provides. Depending on how you implement things, flat files can be insanely fast compared to a RDBMS. We're talking about being able to host a popular site on one or two servers as opposed to sixteen.
A downside to roll-you-own code is, especially if you're paying someone like an employee to do it for you, is that any new hires won't be able to apply their previous knowledge with "whatever" CMS and will have to learn how your previous/lead deveoper does things. They might end up deciding they'd rather pour hot lead onto their eyeballs than work on the insane thing this dev came up with. So there's that.
It's really a tough decision, but I think flat files win for security and speed, but maybe they lose for all of the other reasons.
|A downside to roll-you-own code is, especially if you're paying someone like an employee to do it for you, is that any new hires won't be able to apply their previous knowledge with "whatever" CMS and will have to learn how your previous/lead developer does things. |
Unless you are supporting the same plugin or modification on each site, I doubt if familiarity makes much difference. Like when we hire coders to upgrade or revise a WordPress plugin, it is usually done by someone different each time and in each case I am sure that they are merely dealing with the task at hand, ie: look through the code to see how that function works and then devise some code to modify it. The coders that are most efficient are simply that... most efficient and know their stuff.
Even with a CMS that I spent almost 12 months full time developing, I have absolutely no idea where to look to modify anything... always have to start afresh and see what is there. I have since used that CMS on many different client sites and customised it for each one. But nothing has changed... still no familiarity.
Now if I could just remember one of those URLs I might be able to give an example.
Coolest thing about text based flat files? You can load 'em up in your favorite text editor for quick edits/updates/fixes. You can reroll fields, expand/shrink fields, strip data, search & replace, count occurrences, and even sort the data in ways you would only dream of in a sql'like db.
|The question is, what is the advantage of using flat files? What is the trade-off for the extra work and risk? |
In some cases there's no trade-off, extra work or risk, in fact there's often less of all of the above.
If you only need to make a file that stores setup date for instance, putting it in a database takes longer to build the code than simply writing an .ini file in a text editor, saving it to the disk, and loading it and putting it into a multi-dimensional hashed array with a few lines of code.
Also, the OS provides file caching, and boatloads of it when available. For instance when I'm searching for stuff in my many megabytes of bot blocker log files grep takes a bit on the first pass as it's being cached, just like any database would do, but subsequent grep's immediately following are screaming fast for full text searches.
Basically, I don't like people spreading lies that I hear and read all the time about flat files that are either bold faced lies or flat our ignorance from people that don't know how everything works.
The real problem is some apache servers and web account permissions are set up wacky and you have set the subfolder rights to "777" just so your script can even write a damn file. Lot's of servers are set up where PHP is run as Apache instead of the account holder, which is a crock as I wrote a binary ecommerce package in C++ that could get the proper access information from the account and then write files as the account holder so why PHP does this stupid crap is unfathomable. The result is you end up with a bunch of files owned by "apache:apache" which you can't delete, upload over, or directly edit except by a PHP script using the "apache:apache" rights. This means using FTP or TELNET/SSH can't alter any of the flat files created unless they have ROOT access.
If FastCGI is available and used then the PHP code runs using the correct access rights, not "apache:apache" which creates other issues but at least everything is owned by the account holder and can be deleted, replaced (uploaded) or edited by normal means, such as FTP.
Because of all this crap, many people don't use flat files because the Apache/PHP configuration creates a ton of problems just to do simple things like writing and editing the files, so frustrated people use MySQL which doesn't have that particular set of problems and suffer through creating databases for every little thing needlessly.
The big problem is most script programmers don't know what in the hell they're doing and make the server vulnerable to SQL INJECTION attacks because they don't sanitize inputs or use prepared statements.
Still, the part that irks me is all the lies about flat file speed and all the other misinformation being spread when at the core everything uses them, EVERYTHING. Your binary files, your PHP scripts, the operating system and all the OS commands themselves, all flat files.
Someone I worked with said something similar that make me almost fall off my chair about compiled code being slower than scripts at which point I started yelling that the scripts were RUN by compiled code!
Where do people come up with this garbage?
I just tell them it's all magic.
[edited by: incrediBILL at 6:43 am (utc) on Aug 16, 2014]
|brotherhood of LAN|
>the OS provides file caching
MySQL's MyISAM tables rely on the file cache too. InnoDB takes care of its own caching though.
As iBill points out, I think the language you're using to communicate with your datastore can affect what'll be the best answer.
For fixed length stuff, flatfiles all the way, for me. I usually put a UNIX socket in between the web stuff and datastores.
|If you only need to make a file that stores setup date for instance, putting it in a database takes longer to build the code than simply writing an .ini file in a text editor, saving it to the disk, and loading it and putting it into a multi-dimensional hashed array with a few lines of code. |
This is a straw man. No one is advocating putting configuration files in the database. You can put everything in a database (even your code!), that would only be the right approach in the right circumstances: in general the configuration for database driven sites in in config files.
You have also rebutted a number of other arguments that no one put forward.
What you have not done is put forward any reason to use flat files.
|Where do people come up with this garbage? |
NO idea: I have never heard any of it before.
|Because of all this crap, many people don't use flat files because the Apache/PHP configuration creates a ton of problems just to do simple things like writing and editing the files |
ON the other hand many of us use RDBMSs for entirely different reasons. Every site I have worked on on the last five years, and all my own sites (except my blog) either run on hosting that lets me configure my own web server instance (sometimes in its own VM or cloud hosting, sometimes on the better shared hosting services).
|Coolest thing about text based flat files? You can load 'em up in your favorite text editor for quick edits/updates/fixes. |
There are plenty of front ends for editing data in an database AND a properly designed database will reduce the change of human error when editing (e.g. because foreign key constraints, uniqueness, data types etc. are enforced.
|You can reroll fields, expand/shrink fields, strip data, search & replace, count occurrences |
All of which you can do in a database.
|sort the data in ways you would only dream of in a sql'like db. |
An example please?
|Many sites don't need all of the capability a modern RDBMS provides. Depending on how you implement things, flat files can be insanely fast compared to a RDBMS. We're talking about being able to host a popular site on one or two servers as opposed to sixteen. |
Evidence? I think the condition you need for this to be true is that the site is very simple. I do not think many sites need ALL the capabilities of modern RDBMS, but most require enough of those capabilities to make an RDBMS the best choice. If you have performance problems, cache. In other cases a NoSQL database maybe a good choice.
Many large sites use more than one database: Facebook uses a several databases, including MySQL and Cassandra. Even on small sites I store text in the database but media in flat files (the database contains file paths).
[edited by: incrediBILL at 6:44 am (utc) on Aug 16, 2014]
[edit reason] fixed auto-correct typo [/edit]
To summarize my unanswered questions: what would I gain if I started developing using flat files instead of PostgreSQL?
|To summarize my unanswered questions: what would I gain if I started developing using flat files instead of PostgreSQL? |
Never used PostgreSQL, so I'm not qualified to answer that specific question. We use MySQL and MongoDB, I've used other stuff but not PostgreSQL.
Besides, It's not a simple situation where you can go one way vs. the other and I gave you a straight up example on the .INI file vs. a database above.
For a quick one-shot set of program preferences I could spend 30m to an hour or more building a database w/queries or about 5-10 minutes doing it flat file and edit using a text editor.
Like I said, it all depends on the application, data size, access requirements, whether it's a single installation or distributed software, etc.
but the first thing you gain is the inability for a hacker to find a vulernability in the SQL server or the code accessing it.
It's all on a case by case basis.
Simple as that.
|What you have not done is put forward any reason to use flat files. |
I beg to differ, but it was dismissively rejected a straw man when in fact it's an accurate reason to use flat files for the purpose specified.
I even included the fact that WebmasterWorld itself uses flat files and uses the OS as an indexing system and it runs faster than hell with millions of records in over a half-million files which could easily tank a SQL server if the database was configured improperly.
It's a per project decisions, nothing more, nothing less.
I also explained why many default to using LAMP just because writing flat files is problematic on many Apache installations.
[edited by: incrediBILL at 12:56 am (utc) on Aug 18, 2014]
>There are plenty of front ends for editing data in an database AND a properly designed database will reduce
Show me a decent one for mysql under unix and one under windows. (phpmyadmin and the php ilk are crash prone db corrupting jokes). Which is ultimately the best reason for using flat files - no third party tools needed.
OK, here's a real world example between flat files and a database that I recently coded on this very website:
Took me several days to implement the hot topics code using flat files and building new indexes:
Not only did I have to write the code to display the hot topics, but I had to create a cron jobs to build the daily index files that are used to generate those pages, and a cron job running all day long to update the "daily" hot topics.
Had the WebmasterWorld software been completely implemented in a SQL server, for both the threads and the log files like WordPress, it would've required taking a few hours tops. A couple of query statements, a little HTML and CSS work, a little code, very little by comparison, to display it all. Done.
Based on comparable SQL code I've written, the hot topics would've probably taken about 25% of the time required to implement using flat files, maybe less, and about 25% of the amount of code.
Even more critically, most people simply do not possess the programming chops required to execute that kind of code from scratch which is why they use SQL servers in the first place.
While flat vs db is definitely a design choice, it's also a skills choice as many modern programmers are clueless to the methods and techniques employed to implement such things as they've only been taught SQL with the exception or reading and writing sequential files, that's a problem.
|What you have not done is put forward any reason to use flat files. |
I'm pretty sure I said speed and ease of implementation, esp. for things such as a config file, takes about 5 min. vs and hour+, which you called a "straw man" when faced with a straightforward reason.
Above I gave the downside the large scale flat files usages.
Another good reason is the flat files on WebmasterWorld live with VERY LIMITED MEMORY USAGE vs the SQL servers that can, and do, choke and kill a server with a data set of this size. I've seen a SQL server upchuck and die trying to add a new field vs. flat files where you just start writing an extra field, no fuss no muss.
It's all trade-offs, which you're more comfortable with, and which best suits the job.
In this specific case, I believe WebmasterWorld smokes most comparable sites using SQL for storing threads and the amount of system requirements it takes to run are less.
Also, my bot blocker runs 100% flat files so a DDOS attack on the site won't crash the server by overloading the SQL, which I've seen happen to other bot blockers that use SQL. Then the whole site is down when the bot blocker databases crash, which mine do not do.
Likewise WebmasterWorld also doesn't crash in the traditional sense of a database as there is no rollback required. If a crash occurs, only the single file that crashed is lost, if any. Worse case we just reindex the whole site and all is 100% back, in a couple of minutes tops.
With a database, fixing a crash might involve rollbacks, database restored from backups, all sorts of nonsense.
|I'm pretty sure I said speed and ease of implementation, esp. for things such as a config file, takes about 5 min. vs and hour+, which you called a "straw man" when faced with a straightforward reason. |
An example of something that hardly anyone uses a database for, as an example of something you should not use a database for is a straw man. No one in the discussion advocated putting config in a database, we are advocating putting text site content, user info, media metadata etc. in a database. I have not heard a convincing argument against it.
The only real world example you have given was, by your own admission, one in which the use of flat files was a lot more work than using SQL.
|I even included the fact that WebmasterWorld itself uses flat files and uses the OS as an indexing system and it runs faster than hell with millions of records in over a half-million files which could easily tank a SQL server if the database was configured improperly. |
It would have to be very badly configured or designed, or under extremely high load to have a problem with a few million records. How much traffic does WW get?
I have a client's development database running perfectly well on either the free of cheapest Amazon AWS RDS package that has just gone past 10 million records. OK, many of the records are pretty small (including tags, short snippets of text, etc.)
|Even more critically, most people simply do not possess the programming chops required to execute that kind of code from scratch which is why they use SQL servers in the first place. |
Great, we are moving from straw man arguments to ad hominem ones. Most people use flat files because they do not understand SQL well enough to use it efficiently.
It also sounds like the "its better to do it the hard way because we are clever", a common developer mindset. I diagnose that with possible associated NIH.
|In this specific case, I believe WebmasterWorld smokes most comparable sites using SQL for storing threads and the amount of system requirements it takes to run are less. |
Memory is cheap. Developer time is not.
I believe it is not likely to be any better than a site using SQL with decent database design, appropriate indexes and proper caching.
If a crash occurs, only the single file that crashed is lost, if any. Worse case we just reindex the whole site and all is 100% back, in a couple of minutes tops.
With a database, fixing a crash might involve rollbacks, database restored from backups, all sorts of nonsense.
Rollbacks happen to ensure consistency. If you do not have an equivalent then your data is not safe. Have you measured the time roll backs take against the time your re-indexing takes?
A database should not need a restore from backup unless there is physical damage unless the file system is corrupted or the disk is physically damaged - in which case flat files will need a restore from backup as well. That is what the "D" in "ACID" is about.
@Brett, pdAdmin works well: certainly better for editing structured data than flat files. If MySQL does not have an eually good tool, that is a reason for not using MySQL, not for not using RDBMSs in general.
In practice, most site I develop have an admin that allows anything that is likely to need editing to be editing in the admin, so I rarely use pgAdmin (I do use it for looking at data, for which it is far superior to a text editor as it its easier to query for what I want to look at).
Try moving data from flat files into a properly designed database: you will probably find a lot of duplicates, inconsistencies and bad data.
|Try moving data from flat files into a properly designed database: you will probably find a lot of duplicates, inconsistencies and bad data. |
Where did that FUD come from?
Not understanding the technology is no reason to attempt to make it sound like it's highly unstable.
CVS files from Excel are flat files, so you're saying Excel files are bad data?
WORD writes documents as flat files, are all Word files inconsistent duplicates?
WordPress caching plugin simply writes every page as a flat file because, get this, it's faster to read when requested. Are you saying those cache files are bad data?
Every file the browser saves as cache and cookies are also a flat file so I suppose the browser is a completely unreliable flawed piece of technology?
BOTTOM LINE - flat files were used FOREVER before actual databases were available on early PCs and if they didn't work and weren't reliable then personal computers wouldn't be here today as nobody would've bothered using them!
Let's keep the discussion serious and professional and not make wild claims that can't be supported please, especially when all evidence is to the contrary.
|Where did that FUD come from? |
|CVS files from Excel are flat files, so you're saying Excel files are bad data? |
When people use Excel as a database they are, when they use it as a spread sheet, no.
|Let's keep the discussion serious and professional and not make wild claims that can't be supported please |
If you want a professional discussion rebut what people say, not a parody of what they said. The thread is about the appropriate data store for web site content, not config files, or MS Word.
|WordPress caching plugin simply writes every page as a flat file because, get this, it's faster to read when requested. Are you saying those cache files are bad data? |
No, I advocated caching in me last comment. It proves my point: you can get the speed for flat files with the safety of a database, even with as inefficient a design as Wordpress.
|BOTTOM LINE - flat files were used FOREVER before actual databases were available on early PCs and if they didn't work and weren't reliable then personal computers wouldn't be here today as nobody would've bothered using them! |
Running CMS based websites was not a typical use of early PCs!
If you really think flat files are so superior, would you be happy if you bank stopped using an RDBMS and stored all their data in flat files?
BOTTOM LINE - no one advocating flat files has managed to come up with an answer to a the key question: What would I gain if I stopped using PostgreSQL and used flat files instead?
If you want to discuss a website we all know, tell me what Disqus would gain if they dropped databases and stored everything in flat files? Or take your pick of well known websites or CMSs.
About WW... I've been noticing on and off for the past year or so that a new post will take sometimes up to 2 minutes to process. Other times it processes quickly. I haven't paid enough attention to see a pattern but it does happen often enough to rule out networking issues. Happens on a variety of devices and browsers, definitely server side. Of course I don't know if it has something to do with the data storage scheme or if it's just a bug.
As far as flat files versus databases, I think everyone can agree on one fact: A flat file system cannot compete with a database unless you've also coded in all the reasons why databases exist: indexes, memory caching (the very good reason that MySQL et al use a lot of memory), atomization, etc..
In which case you've essentially written your own database system. Seems like a lot of reinventing the wheel to me, but I wouldn't call it wrong.
@OP there is nothing inherently more or less secure in using databases or flat files, except maybe in a shared hosting environment where your flat files have to be stored in your user directory in order to be accessed whereas database files are stored in slightly harder to reach areas of the file system.
Thanks, some posts are WAY off the road. There are many valuable things to quote but I won't because they refer to questions nobody asked and I see myself wondering "why would anyone do that".
|Graeme: If you really think flat files are so superior, would you be happy if you bank stopped using an RDBMS and stored all their data in flat files? |
Honestly I can barely recognize the content of your post here. Have very little to say about it.
@Physics: you covered a lot of the points I see.
Thanks Incredibill I agree with your opinion, specially being a design choice and that some people don't know what to do with flat files or how to work with them. I'm no expert but I've been seeing an advantage on my "style" of work vs the new standard of mysql. I believe you nailed it on... well people are working with files! so while I "understand" some of what people say, the same things turn against them, the intention is evident.
The intention of this thread is not that, it's security. The main difference about flat file databases is one has to build the system and features, but let's focus on security. Rollbacks, I have that covered already.
The main thing (to me) is I have designed and developed big sites with lots of traffic working just fine with no security issues. Happy? yes. But I'm facing a new challenge that stresses me and I need to put my work to the test in terms of security and data load. So I had my CMS dealing with 14K daily uniques... so what? from my perspective it's something but the new webapp will deal with more so I'm wondering... and wondering.
Brett I couldn't agree more with you.
I guess in some way a lot of webmasters go 100% sure with eyes closed about security because they use frameworks or SQL stuff, while others no matter what we use we will always wonder about security be it frameworks or hand made code. Guess who has less trouble? the ones who puts more effort on challenging the code and files.
I also guess... at the end the thread should be more focused on sanitizing data input and file permissions.
@explorador, that was meant to be a parody of IncrediBill's argument I was facing: of course IcrediBill is not proposing that banks should use flat files - but neither was anyone at all proposing the MS Word should store all data in a database, or that config . It is not a question of which you should store ALL data in, but whether or not a CMS should use an RDBMs.
I have not really heard a good reason to prefer flat files. With either you would be more secure in an environment you control: a dedicated server is better than a VM, is better than shared hosting with your own server instances, which is better than shared hosting with shared server instances.
Not to me!
|I guess in some way a lot of webmasters go 100% sure with eyes closed about security because they use frameworks or SQL stuff |
Frameworks do make security a lot easier and more reliable. The potential pitfalls are usually well documented and are usually obviously risky if not done right (e.g. adding raw SQL to an ORM generated query).
|I also guess... at the end the thread should be more focused on sanitizing data input and file permissions. |
What you can do with file permission depends on how much control you have. If you have a dedicated server or a VM only your code runs on the web server, and the web server user can only read or write the files it needs to.
As for sanitizing input, use a library or an ORM, preferably one that is widely used and well tested. I was initially resistant to the idea of ORMs (because they layer a high level abstraction on top of SQL which is already a high level abstraction), but one of the things that changed my mind was the security advantages of SQL generated by well tested code. There have been some holes, but they have been few and rapidly fixed with mitigation advice available very fast: there are a lot of advantages to many eyes looking at the code and testing it.
@IanKelley, I think reinventing the wheel is usually a bad thing: the code is never going to be as well tested or have the same features as something that is widely used. The exceptions I can think of are:
1) You have very specialised needs
2) You have huge resources and can devote some to maintaining an extra product
3) It is appealing enough that other people will use it (and either help fund it, or buy licences). NoSQL databases are probably good examples of this.
[edited by: graeme_p at 12:20 pm (utc) on Aug 20, 2014]
@explorador, to secure a database, a few things you can do come to mind:
* It should be firewalled not to accept requests except from your web app or localhost. To manage/edit it manually you use ssh (an ssh tunnel if you want to use a GUI to manage it).
* The web server should not be able to read the files containing the database login. Also place them outside the server document root.
* Everything I said earlier about avoiding SQL injection.
| This 41 message thread spans 2 pages: 41 (  2 ) > > |