Normalization and the process of Normalizatio
The Normalization
Process
Normalization
is a data analysis technique to design a database system. It allows the
database designer to understand the current data structures within an organization.
Furthermore, it aids any future changes and enhancements to the system.
The end result of normalization is a set of
entities, which removes unnecessary redundancy (i.e. duplication of data) and
avoids the anomalies discussed earlier.
Normalization follows a staged process that
obeys a set of rules.
The steps of normalization are:
- Select the data source and convert into an unnormalised table (UNF)
- Transform the unnormalised data into first normal form (1NF)
- Transform data in first normal form (1NF) into second normal form
(2NF)
- Transform data in second normal form (2NF) into third normal form
(3NF)
Occasionally, the data may still be subject
to anomalies in third normal form. In this case, we may have to perform further
transformations.
- Transform third normal form to Boyce-Codd normal form (BCNF)
- Transform Boyce-Codd normal form to fourth normal form (4NF)
- Transform fourth normal form to fifth normal form (5NF)
For this unit you only need to be aware of
the above three forms. However, the process to transform them is not assessed
and, therefore, not described.
Finally, from the normalized tables, a data
model is produced.
Earlier you saw an example of a data model
containing several entities. The normalization process helps us determine the
entities required in the system being modeled. These entities can then be
represented in a data model. Later, following the normalization process to
determine the entities, we will see how the data model itself is constructed.
First, the process of normalization (to 3NF)
is demonstrated by use of an example.
Normalization is a bottom-up technique for
database design, normally based on an existing system (which may be
paper-based). We start by analyzing the documentation, e.g. reports, screen
layouts, etc from that system.
Normalization
Example
Below
is an extract from a company report, which describes projects being worked upon
by employees . This report is to be 'normalized'. Each of the first four normalization
steps will be explained shortly.
The first step is to select the data source (i.e.
the report above) and convert into an unnormalised table (UNF). The process is
as follows:
- Create column headings for the table for each data item on the report
(ignoring any calculated fields). A calculated field is one that
can be derived from other information on the form. In this case total
staff and average hourly rate.
- Enter sample data into table (this data is not simply the data on the
report but a representative sample. In this example it shows several
employees working on several projects. The same employee can work on
different projects and at a different hourly rate).
- Identify a key for table (and underline it).
- Remove duplicate data (in this example, for the chosen key of Project
Code, the values for Project Code, Project Title, Project Manager and
Project Budget are duplicated if there are two or more employees working
on the same project).
Unnormalised Form (UNF)
Project Code chosen for the key and duplicate data, associated with each
project code, is removed. Do not confuse duplicate data with repeating
attributes which is described in the next step.
First Normal Form (1NF)
The next step is to
transform the table of unnormalised data into first normal form (1NF). The rule
is: remove any repeating
attributes to a new table. The process is as follows:
- Identify repeating attributes.
- Remove these repeating
attributes to a new table together with a copy of the key from the UNF table.
- Assign a key to the new table (and underline it). The key from the original unnormalised table always becomes part of the key of the new table. A compound key is created. The value for this key must be unique for each entity occurrence.
1NF with repeating
attributes removed
After removing the duplicate data the repeating attributes are
easily identified.
In the previous table the Employee No, Employee Name, Department
No, Department Name and Hourly Rate attributes are repeating. That is, there is
potential for more than one occurrence of these attributes for each project code.
These are the repeating attributes and have been to a new table together with a
copy of the original key (i.e. Project Code).
A key of Project Code and Employee No has been defined for this
new table. This combination is unique for each row in the table.
Second Normal Form (2NF)
The next step is to transform the data in first normal form (1NF)
into second normal form (2NF). The rule is: remove
any non-key attributes that only depend on part of the table key to a new
table. Ignore tables with (a)
a simple key or (b) with no non-key attributes (these go straight to 2NF with
no conversion). The process is as follows:
- Take each non-key attribute in
turn and ask the question: is this attribute dependent on one part of the key?
- If yes, remove attribute to
new table with a copy of the part of the key it is dependent upon.
The key it is dependent upon becomes the key in the new table. Underline
the key in this new table.
- If no, check against other
part of the key and repeat above process
- If still no, i.e. not dependent
on either part of key, keep attribute in current table.

The first table went straight to 2NF as it has a simple key
(Project Code).
Employee name, Department No and Department Name are dependent
upon Employee No only. Therefore, they were moved to a new table with Employee
No being the key.
However, Hourly Rate is dependent upon both Project Code and
Employee No as an employee may have a different hourly rate depending upon
which project they are working on. Therefore it remained in the original table.
3rd Normal Form (3NF)
The next step is to transform the data in second normal form (2NF)
into third normal form (3NF). The rule is: remove
to a new table any non-key attributes that are more dependent on other non-key
attributes than the table key. Ignore tables with zero or only one non-key
attribute (these go straight to 3NF with no conversion). The process is as
follows:
- If a non-key attribute is more
dependent on another non-key attribute than the table key:
- Move the dependent attribute, together with a copy of the non-key attribute upon
which it is dependent, to a new table.
- Make the non-key attribute,
upon which it is dependent, the key in the new table. Underline the key
in this new table.
- Leave the non-key attribute, upon
which it is dependent, in the original table and mark it a foreign key (*).

The project team table went straight from 2NF to 3NF as it only
has one non-key attribute. Department Name is more dependent upon Department No
than Employee No and therefore was moved to a new table. Department No is the
key in this new table and a foreign key in the Employee table.
The Normalization
Process
We've
now been through the complete process. Having started off with an unnormalised
table we finished with four normalized tables in 3NF. You will notice that
duplication has been removed (apart from the keys needed to establish the links
between those tables).
The process may look complicated. However, if
you follow the rules completely, and do not miss out any
steps, then you should arrive at the correct solution. If you omit a rule
there is a high probability that you will end up with too few tables or
incorrect keys.
A summary of the complete normalization
process is given on the following pages.
Step
1: UNF
Process:
- Create column headings (ignoring any calculated fields)
- Enter sample data into table
- Identify a key for table (and underline it)
- Remove duplicate data
Step
2: 1NF
Rule: Remove any repeating attributes to a
new table
Process:
- Identify repeating attributes
- Remove repeating attributes to a new table together with a copy of
the key from the UNF table
- Assign a key to the new table (and underline it). The key from the unnormalised
table always becomes part of the key of
the new table. A compound key is created. The value for
this key must be unique for each entity occurrence.
Step
3: 2NF
Rule: Remove any non-key attributes that only
depend on part of the table key to a new table
Ignore tables with (a) a simple key or ( b)
with no non-key attributes (these go straight to 2NF with no conversion)
Process:
- Take each non-key attribute in turn and ask the question
- is this attribute dependent on one part of the key?
- If yes, remove attribute to new table with a copy of
the part of the key it is dependent upon. The key it is
dependent upon becomes the key in the new table. Underline the key in this
new table.
- If no, check against other part of the key and repeat above process.
- If still no, i.e. not dependent on either part of key, keep attribute
in current table.
Step
4: 3NF
Rule: Remove to a new table any non-key
attributes that are more dependent on other non-key attributes than the table
key
Ignore tables with zero or only one non-key
attribute (these go straight to 3NF with no conversion).
Process:
- If a non-key attribute is more dependent on another non-key attribute
than the table key
- Move the dependent attribute, together with a copy of
the non-key attribute upon which it is dependent, to a new table
- Make the non-key attribute, upon which it is dependent, the key in the
new table. Underline the key in this new table.
- Leave the non-key attribute, upon which it is
dependent, in the original table and mark it a foreign key (*).
Comments
Post a Comment