Normalization - A Brief Introduction - Databases forum at WebmasterWorld

In the movie Young Frankenstein, Igor is charged with getting a brain for the monster. He drops the brain he was after and brings back another one instead. After the wrong brain is "installed" and the mistake is discovered, the Doctor asks him whose brain he brought back, to which Igor replies, "Abby someone. Yes, Abby Normal."

"Normal" and "abnormal" are also terms used to describe the design of a relational database. Loosely speaking, a database is said to be normal when redundant data has been eliminated.

Putting the R back in RDBMS

The R stands for Relational, as in relative and relates. How many times are we introduced as "Bill's Son" or "Sally's Husband" or "Arnold's Grandson?" Those references to YOU are using a relational model.

Your name might be John, but at school your dad is well known - you are Bill's son. Someone made a relationship between you and your dad.

In a database you might have a table of names:

John Smith
Fred Jones
Carla Bloop

But you also want to store information about those people: their relatives, their addresses, hair color, etc. Some data will be unique to that name, but some might not be. Let's take a basic example - names and phone numbers.

NameTable
name, phoneno, type
John Smith, 212-555-1212, home
John Smith, 212-555-1234, cell
John Smith, 212-555-9898, office
Fred Jones, 718-555-1212, home
Fred Jones, 718-222-2222, fax
Fred Jones, 718-111-1111, cell
Fred Jones, 718-333-3333, pager

In the simple example above, you can see that there is redundant data - the name of the person is repeated on every row. It would be better to split this into two tables, using some type of relation between the tables to identify the data. I'm going to use an ID number for each name so that when I have to find the match later, I can do it by numbers rather than letters. Databases generally compare and index integers faster than text, and the ID stays the same even if the spelling of a name changes.

Names
id, name
101, John Smith
102, Fred Jones

Phones
id, number, type
101, 212-555-1212, home
101, 212-555-1234, cell
101, 212-555-9898, office
102, 718-555-1212, home
102, 718-222-2222, fax
102, 718-111-1111, cell
102, 718-333-3333, pager
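
Here's a rough sketch of the two-table split using Python's built-in sqlite3 module, in case it helps to see it run. A few things are my own assumptions and not part of the post: the in-memory database, the column name person_id (standing in for the shared id column in Phones), and the helper functions build_two_table_db and phones_for.

```python
import sqlite3


def build_two_table_db():
    """Create an in-memory database holding the Names and Phones tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Names (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE Phones ("
        " person_id INTEGER NOT NULL REFERENCES Names(id),"  # assumed column name
        " number TEXT NOT NULL,"
        " type TEXT NOT NULL)"
    )
    # Each name is stored exactly once.
    conn.executemany(
        "INSERT INTO Names VALUES (?, ?)",
        [(101, "John Smith"), (102, "Fred Jones")],
    )
    # Phone rows point back at a person by number, not by repeating the name.
    conn.executemany(
        "INSERT INTO Phones VALUES (?, ?, ?)",
        [
            (101, "212-555-1212", "home"),
            (101, "212-555-1234", "cell"),
            (101, "212-555-9898", "office"),
            (102, "718-555-1212", "home"),
            (102, "718-222-2222", "fax"),
            (102, "718-111-1111", "cell"),
            (102, "718-333-3333", "pager"),
        ],
    )
    return conn


def phones_for(conn, name):
    """Return (number, type) pairs for one person by joining on the ID."""
    return conn.execute(
        "SELECT p.number, p.type"
        " FROM Names n JOIN Phones p ON p.person_id = n.id"
        " WHERE n.name = ?"
        " ORDER BY p.number",
        (name,),
    ).fetchall()


conn = build_two_table_db()
for number, kind in phones_for(conn, "John Smith"):
    print(number, kind)
```

The join is the "relation" at work: the name lives in one row of Names, and the ID is what ties the phone rows back to it.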

Better, but we still have a little redundant data here - the phone type. So, we could add a third table with the type codes:

Phones
id, number, type
101, 212-555-1212, h
101, 212-555-1234, c
101, 212-555-9898, o
102, 718-555-1212, h
102, 718-222-2222, f
102, 718-111-1111, c
102, 718-333-3333, p

Types
typecode, type
h, home
c, cell
o, office
f, fax
p, pager
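
As a sketch only, here is the three-table version in Python's sqlite3 module, with a query that joins everything back into the original flat list. The column names person_id and typecode in Phones are my assumptions. Note the pager row in Types: the Phones table uses code p, so the lookup table needs a matching entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Names (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Types (typecode TEXT PRIMARY KEY, type TEXT NOT NULL);
    CREATE TABLE Phones (
        person_id INTEGER NOT NULL REFERENCES Names(id),
        number    TEXT NOT NULL,
        typecode  TEXT NOT NULL REFERENCES Types(typecode)
    );
""")

conn.executemany(
    "INSERT INTO Names VALUES (?, ?)",
    [(101, "John Smith"), (102, "Fred Jones")],
)
conn.executemany(
    "INSERT INTO Types VALUES (?, ?)",
    [("h", "home"), ("c", "cell"), ("o", "office"), ("f", "fax"), ("p", "pager")],
)
conn.executemany(
    "INSERT INTO Phones VALUES (?, ?, ?)",
    [
        (101, "212-555-1212", "h"),
        (101, "212-555-1234", "c"),
        (101, "212-555-9898", "o"),
        (102, "718-555-1212", "h"),
        (102, "718-222-2222", "f"),
        (102, "718-111-1111", "c"),
        (102, "718-333-3333", "p"),
    ],
)

# Two joins rebuild the flat name / number / type list from the three tables.
flat = conn.execute("""
    SELECT n.name, p.number, t.type
    FROM Phones p
    JOIN Names n ON n.id = p.person_id
    JOIN Types t ON t.typecode = p.typecode
    ORDER BY n.id, p.number
""").fetchall()

for row in flat:
    print(row)
```

Nothing is lost by splitting: the flat view is still available on demand, but each name and each type label is stored only once.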

Anything else that can be simplified? Or normalized? Perhaps. But I'd be happy with this level of normalization. I can add new types easily. When I add a new person I'm not creating empty space in the first table for the phones and types.
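
To make those two benefits concrete, here is a small sketch, again with Python's sqlite3 module. The schema repeats the three tables with my assumed column names (person_id, typecode). The new type code v / voip and the phone number 646-555-0100 are made up for the example. Carla Bloop comes from the list of names earlier in the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the relations
conn.executescript("""
    CREATE TABLE Names (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Types (typecode TEXT PRIMARY KEY, type TEXT NOT NULL);
    CREATE TABLE Phones (
        person_id INTEGER NOT NULL REFERENCES Names(id),
        number    TEXT NOT NULL,
        typecode  TEXT NOT NULL REFERENCES Types(typecode)
    );
    INSERT INTO Names VALUES (101, 'John Smith'), (102, 'Fred Jones');
    INSERT INTO Types VALUES ('h', 'home'), ('c', 'cell'), ('o', 'office'),
                             ('f', 'fax'), ('p', 'pager');
    INSERT INTO Phones VALUES (101, '212-555-1212', 'h'),
                              (102, '718-555-1212', 'h');
""")

# A new phone type is a single row in Types; no other table changes.
conn.execute("INSERT INTO Types VALUES ('v', 'voip')")  # hypothetical type
type_count = conn.execute("SELECT COUNT(*) FROM Types").fetchone()[0]

# A new person with no phones yet is a single row in Names.
phones_before = conn.execute("SELECT COUNT(*) FROM Phones").fetchone()[0]
conn.execute("INSERT INTO Names VALUES (103, 'Carla Bloop')")
phones_after = conn.execute("SELECT COUNT(*) FROM Phones").fetchone()[0]

# With foreign keys on, a type code that is not in Types is refused.
try:
    conn.execute("INSERT INTO Phones VALUES (103, '646-555-0100', 'x')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(type_count, phones_before, phones_after, rejected)
```

The person row takes up no phone or type columns at all, and the lookup table doubles as a guard against typos in the type codes.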

Normalization goes way beyond this simple example. Database administrators get into fist fights over the level of normalization required. But it is valuable when planning any database project. Planning to be normal from the start will save lots of grief in the future (I speak from personal experience).

databases, unlike web geeks, need to be normal

txbakers
