Normalization is a process that helps you arrange data into tables.
The goal of normalization is to reduce data redundancy and facilitate
efficient updates. In theory,
a perfectly normalized database uses data at maximum efficiency
and minimum redundancy. In practice, however, normalization does
not take into consideration the specific applications of a database.
Consequently, normalization is recommended only as a guideline for
database design.
Use normalization to get your data
into relational format (two-dimensional tables) and then alter
the table format to fit the specific needs of users and applications.
The process of normalization entails examining the data
and the table format of a database
against each of the criteria for each of the normal forms.
Although there are at least five normal forms,
most database tables need to be processed only to the second
or third normal form.
Therefore, only the first three normal forms are discussed here.
When data is normalized, each lower normal form is a prerequisite
to the next higher normal form. In other words, data must be in
first normal form before it can be put into second normal form.
First Normal Form |
 |
In first normal form, data contains rows that have
the same number of columns.
Unnormalized data is not so orderly.
For example, some purchase orders contain one order item, others
contain two or more.
Thus, if each order number from the purchase
order in Figure 2-1 “Identifying Data Categories” were to occupy a single row,
each row would have a different number of
columns.
By separating the data into three groups as in Figure 2-3 “First Normal Form: Removing Repeating Groups”,
repeated categories are removed,
leaving each row with the same number
of columns.
Second Normal Form |
 |
In second normal form, shown in Figure 2-4 “Second Normal Form: Establishing Dependency”,
all attributes are functionally
dependent on the primary key.
Attributes are functionally dependent
when their values depend directly upon the value of the primary key.
The primary key in the Order Item information is the order
and item number combination. Each such combination
represents a unique row.
Since the vendor part number and the order
quantity relate to a particular order and item number,
they are functionally dependent on the primary key.
However, the unit price depends on the order quantity
(which, in turn, depends on the order and item number), but it does
not depend directly on order or item number.
This is called a transitive dependency.
Third Normal Form |
 |
In third normal form, data does not have transitive
dependencies. In Figure 2-5 “Third Normal Form: Removing Transitive Dependencies”, unit price is removed
from the Order Items information and placed in a new set
of information called Supply Price, which is used to calculate
the price of an item dependent on the quantity ordered and date
delivered.
Third normal form relieves update problems.
For example, in second normal form, when an item's price changes,
all orders containing that item would
have to be altered. In third normal form, only
the rows for that item in the Supply Price table need to be changed.
Normalized data provides a starting point for table definition.
Note that the data in Figure 2-3 “First Normal Form: Removing Repeating Groups”, Figure 2-4 “Second Normal Form: Establishing Dependency”,
and Figure 2-5 “Third Normal Form: Removing Transitive Dependencies” is in tabular form.
The next step in table design is to take the normalized data and
fit it into ALLBASE/SQL columns, tables, views, and indexes.