Normal and Abnormal are also terms used to describe the creation of a relational database. A database is said to be normal when all traces of redundant data are eliminated.
Putting the R back in RDBMS
The R stands for Relational, as in relative and relates. How many times are we introduced as "Bill's Son" or "Sally's Husband" or "Arnold's Grandson?" Those references to YOU are using a relational model.
Your name might be John, but at school your dad is well known - you are Bill's son. Someone made a relationship between you and your dad.
In a database you might have a table of names. But you also want to store information about those people - their relatives, their addresses, hair color, etc. Some data will be unique to that name, but some might not be. Let's take a basic example: names and phone numbers.
name, phoneno, type
John Smith, 212-555-1212, home
John Smith, 212-555-1234, cell
John Smith, 212-555-9898, office
Fred Jones, 718-555-1212, home
Fred Jones, 718-222-2222, fax
Fred Jones, 718-111-1111, cell
Fred Jones, 718-333-3333, pager
In that simple example above, you can see that there is redundant data - the name of the person. It would be better to split this into two tables using some type of relation between the tables to identify the data. I'm going to use an ID number for each name so when I have to find the match later, I can do it by numbers rather than letters. Databases prefer numbers.
id, name
101, John Smith
102, Fred Jones
id, number, type
101, 212-555-1212, home
101, 212-555-1234, cell
101, 212-555-9898, office
102, 718-555-1212, home
102, 718-222-2222, fax
102, 718-111-1111, cell
102, 718-333-3333, pager
Better, but we still have a little redundant data here - the phone type, spelled out in full on every row. So, we could add a third table of type codes:

code, type
h, home
c, cell
o, office
f, fax
p, pager

and store just the code in the phone table:
id, number, type
101, 212-555-1212, h
101, 212-555-1234, c
101, 212-555-9898, o
102, 718-555-1212, h
102, 718-222-2222, f
102, 718-111-1111, c
102, 718-333-3333, p
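To make the split concrete, here's a small sketch of the three tables above in SQLite (the table and column names are just made up to match the example; a real schema would add proper constraints). The join at the end shows how the original flat "name, number, type" list is reconstructed from the normalized tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Three normalized tables: people, phone type codes, and phone numbers.
cur.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE phone_types (code TEXT PRIMARY KEY, type TEXT)")
cur.execute("""CREATE TABLE phones (
    person_id INTEGER REFERENCES people(id),
    number TEXT,
    type_code TEXT REFERENCES phone_types(code))""")

cur.executemany("INSERT INTO people VALUES (?, ?)",
                [(101, "John Smith"), (102, "Fred Jones")])
cur.executemany("INSERT INTO phone_types VALUES (?, ?)",
                [("h", "home"), ("c", "cell"), ("o", "office"),
                 ("f", "fax"), ("p", "pager")])
cur.executemany("INSERT INTO phones VALUES (?, ?, ?)",
                [(101, "212-555-1212", "h"), (101, "212-555-1234", "c"),
                 (101, "212-555-9898", "o"), (102, "718-555-1212", "h"),
                 (102, "718-222-2222", "f"), (102, "718-111-1111", "c"),
                 (102, "718-333-3333", "p")])

# Join the three tables back into the original flat list.
rows = cur.execute("""
    SELECT p.name, ph.number, t.type
    FROM phones ph
    JOIN people p ON p.id = ph.person_id
    JOIN phone_types t ON t.code = ph.type_code
    ORDER BY p.id, ph.number""").fetchall()

for name, number, kind in rows:
    print(name, number, kind)
```

Notice that each name and each type string is now stored exactly once; only the small ID and code values are repeated.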
Anything else that can be simplified? Or normalized? Perhaps. But I'd be happy with this level of normalization. I can add new types easily. When I add a new person I'm not creating empty space in the first table for the phones and types.
Normalization goes way beyond this simple example. Database Administrators get into fist fights over the level of normalization required. But it is valuable when planning any database project. Planning to be normal from the start will save lots of grief in the future (I speak from personal experience....)
It's not always the right thing to do, but here's a great example from a real site:
Normalized DB, indexed properly, lots of records. When the site got traffic it choked. The problem revolved around three types of queries: the first would just run forever (had to kill them manually), the second would take a full 5 minutes to run, and the third about a full minute. The site was effectively dead.
After restructuring the db, ALL the queries completed within milliseconds. Might have been an extreme case, but regardless that's a pretty dramatic difference.
This was MySQL 3.x, and the WHERE clauses were searching across multiple tables (i.e. WHERE table1.field >= '5' AND table2.field = 'x' AND table3.field = 'y', etc.).
That doesn't mean denormalization is always necessary, though. There are other options: optimizing the queries, helping the optimizer, or even buying a RAID controller (more RAM won't help, because the bottleneck is I/O, not memory or CPU). Fast SCSI disks are preferable in my opinion, as long as I can keep my code and data clean.
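Re: helping the optimizer - often the fix is just an index on the column the WHERE clause filters on, so the planner can seek instead of scanning the whole table. A tiny sketch (SQLite here rather than MySQL 3.x, and a made-up orders table, but the same principle applies to each table in a multi-table WHERE):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hypothetical table standing in for one of the tables in the slow WHERE.
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL)")
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(i, "open" if i % 10 == 0 else "closed", i * 1.5)
                 for i in range(1000)])

query = "SELECT id FROM orders WHERE status = 'open'"

# Without an index, the planner has no choice but a full table scan.
plan_before = cur.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# "Helping the optimizer": index the column the WHERE clause filters on.
cur.execute("CREATE INDEX idx_orders_status ON orders (status)")
plan_after = cur.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(plan_before[0][-1])  # e.g. a SCAN of the whole table
print(plan_after[0][-1])   # e.g. a SEARCH using idx_orders_status
```

Same data, same query - only the access path changes, which is exactly the kind of difference that turns a minutes-long query into a milliseconds one.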
A query to display one invoice to a customer, selecting from 3 years of data (100K sales PA, 1.5 million rows in the invoice-line table after 3 years), optimised by a SQL2K compiled view - roughly:

CREATE VIEW Sales2005Q1 AS
SELECT * FROM dbo.Invoices
WHERE dbo.SalesQtr(InvoiceDate) = '2005Q1'

and then the page just runs:

SELECT * FROM Sales2005Q1
Scan time is reduced to 1/12th
I think your query is slow because the procedure dbo.SalesQtr() is invoked for every row. You might be able to speed it up, without having to use a view, by doing date comparisons instead:
SELECT * FROM dbo.Invoices
WHERE dbo.SalesQtrStartDate('2005Q1') <= dbo.Invoices.InvoiceDate AND dbo.Invoices.InvoiceDate < dbo.SalesQtrStartDate('2005Q2')
You only need to write the SalesQtrStartDate() function, which returns the date of the first day of the given quarter.
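For illustration, here's that idea as a runnable sketch (SQLite instead of SQL2K, and sales_qtr_start_date() is just a hypothetical stand-in for the SalesQtrStartDate() helper described above). The point is the half-open date range: the date column itself is never wrapped in a per-row function call, so an index on it stays usable.

```python
import sqlite3
from datetime import date

def sales_qtr_start_date(qtr: str) -> str:
    """Return the first day of a quarter given as e.g. '2005Q2'.
    (Hypothetical helper mirroring SalesQtrStartDate() from the post.)"""
    year, q = int(qtr[:4]), int(qtr[5])
    return date(year, 3 * (q - 1) + 1, 1).isoformat()

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, invoice_date TEXT)")
cur.executemany("INSERT INTO invoices VALUES (?, ?)",
                [(1, "2005-01-15"), (2, "2005-02-28"),
                 (3, "2005-04-01"), (4, "2005-12-31")])

# Half-open range [start of 2005Q1, start of 2005Q2): the quarter boundary
# dates are computed once, not recomputed per row.
rows = cur.execute(
    "SELECT id FROM invoices WHERE ? <= invoice_date AND invoice_date < ?",
    (sales_qtr_start_date("2005Q1"), sales_qtr_start_date("2005Q2"))).fetchall()
print(rows)  # invoices 1 and 2 fall in Q1
```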
It would be interesting to hear more thoughts about various strategies that can be used to overcome the performance problems in situations where it is desirable to maintain a high degree of normalization in the underlying DB.
We're still struggling with this problem. We've been experimenting with two techniques:
1. exporting a special denormalized DB to the web server which is used to run the actual website (we call it a data cube)
2. using a content management system's "static export" function to generate flat html pages.
One of these techniques - or a combination of both - seems imperative (or perhaps there is a better strategy we haven't thought of?).
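For what it's worth, technique 1 can be as simple as materializing the joins into a single flat table that the web server queries on its own. A sketch with made-up table names (a real export would also ship the resulting file to the web host):

```python
import sqlite3

# Source: the normalized master DB (hypothetical schema for illustration).
src = sqlite3.connect(":memory:")
src.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE phones (person_id INTEGER, number TEXT);
    INSERT INTO people VALUES (101, 'John Smith'), (102, 'Fred Jones');
    INSERT INTO phones VALUES (101, '212-555-1212'), (102, '718-555-1212');
""")

# Destination: the flat "data cube" the web server reads - one wide table,
# so no joins are needed at request time.
cube = sqlite3.connect(":memory:")
cube.execute("CREATE TABLE person_phone (name TEXT, number TEXT)")
rows = src.execute("""SELECT p.name, ph.number
                      FROM people p
                      JOIN phones ph ON ph.person_id = p.id""").fetchall()
cube.executemany("INSERT INTO person_phone VALUES (?, ?)", rows)

# The website now runs a trivial single-table query.
print(cube.execute(
    "SELECT number FROM person_phone WHERE name = 'John Smith'").fetchall())
```

The trade-off is the usual one: the cube is read-only and goes stale, so it has to be re-exported on whatever schedule the site can tolerate.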
We are working with a large database that really needs a high degree of normalization for efficiency and maintenance reasons, and yet the web pages we want to publish were impossibly slow when we tried to publish them directly from the DB. They required too many joins and/or stored procedures, and this couldn't be handled by the MySQL software we are using.