Welcome to WebmasterWorld Guest from

Forum Moderators: open

Message Too Old, No Replies


An Overview

6:35 pm on Sep 24, 2005 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Nov 25, 2003
votes: 169

Most (all?) Database texts and courses say that a Relational DB should be normalised to at least third normal form and preferably fifth. They might hark back to the old days of slow CPUs and expensive memory noting these as historic reasons for de-normalisation that no longer apply. Hogwash..

* The only iron-clad Relational Database requirement is that relations be in at least first normal form.
* normalisation is a part of the analysis stage, not the design stage, because normalisation increases the complexity of the database and consequently the number of table joins required to provide requested information.
* there is a significant difference between the application stresses of applications with few or many users and few or many tables and small or large tables.

So, stop worrying about "theory" and actually analyse the information and update requirements. The information requirements will tell what data are needed together (eg. customer name, address, telephone number, etc.) and how often. The update requirements will tell which data are changed and how often.

Plot the resulting patterns showing which data are usually required together and also if they are updated together. These patterns indicate which entities would best be combined in a single table.

There are 3 main ways to de-normalise (but all are variations of minimising the number of joins in a query):
* One: Minimise/Eliminate Joins
Eg. two tables: Employee and Department. The Employee table typically would include a Department table id which on query requires a join between the two tables to translate that id into an actual name. Unless department names are often changed (in which case update time could eliminate query time savings) the redundent addition of department name to employee table will speed query time. And if all that is in Department is id and name the entire table can be eliminated.

* Two: Add Child Detail To A Parent Table
Eg. typically when required to store both past and present customers' addresses, all customer addresses would be "normalised out" from the customer table into a customer addresses table. If, as likely, previous addresses would be rarely queried adding the current address to the customer table will, while being redundent, likely speed queries.

* Three: Add Summary Tables
Eg. sales tracking by product, department, month, quarter, and year-to-date is required. Typically summary displays are generated by various queries summing the parameter transactions via multiple joins. The more people more frequently use more varied queries the slower the system will perform. Adding a "summary" table that lists each product with columns for each month plus quarter and year-to-date would provide much faster access.

All that said, there is a big disadvantage to de-normalisation: when the data changes in any one table, all instances in the many also must be updated. Do not de-normalise if it fails to increase performance or if the application would be too complex.

If de-normalising do so only with data that never/rarely changes, i.e. city name, and on high frequency queries that join multiple tables, i.e. customer list including shipping addresses.

The overhead of de-normalisation is the increased storage requirements (minor given todays storage costs) and maintaining data accuracy across records. This last is critical. It can be done via program code, batch process, or database trigger (best). Keeping de-normalised data valid adds time and potential coding errors which must be factored into the costing of de-normalisation.

"normalize until it hurts, denormalize until it works"
Jason Couchman, "Oracle, Certfied Professional, DBA Certification Exam Guide"