Database Fundamentals/Normalization

This lesson introduces normalization.

First Normal Form (1NF)

edit

- Data is in 1NF when:

 1. Each column contains atomic (indivisible) values.
 2. Each row is unique, and no column has repeating groups.

Example: Before normalization (not in 1NF):

StudentID Name Subjects
1 Alice Math, Science
2 Bob History, Math

After normalization (in 1NF):

StudentID Name Subject
1 Alice Math
1 Alice Science
2 Bob History
2 Bob Math

Second Normal Form (2NF)

edit

- A table is in 2NF if:

 1. It is in 1NF.
 2. All non-key attributes are fully dependent on the primary key.

Example: Before normalization (not in 2NF):

OrderID ProductName Price CustomerName
1 Pen 1.00 Alice
2 Notebook 5.00 Bob

After normalization (in 2NF): Orders Table:

OrderID CustomerName
1 Alice
2 Bob

Products Table:

ProductID ProductName Price
1 Pen 1.00
2 Notebook 5.00

OrderDetails Table:

OrderID ProductID
1 1
2 2

Third Normal Form (3NF)

edit

- A table is in 3NF if:

 1. It is in 2NF.
 2. It contains no transitive dependencies (i.e., no non-key attribute depends on another non-key attribute).

Example: Before normalization (not in 3NF):

EmployeeID Name Department Manager
1 Alice Sales John
2 Bob IT Sarah

After normalization (in 3NF): Employees Table:

EmployeeID Name DepartmentID
1 Alice 101
2 Bob 102

Departments Table:

DepartmentID Department Manager
101 Sales John
102 IT Sarah

Objectives and Skills

edit

Objectives and skills for the normalization portion of Microsoft Exam 98-364 Database Fundamentals include:[1]

  • Understand data storage
    • Understand normalization
      • Understand the reasons for normalization, the five most common levels of normalization, how to normalize a database to third normal form

Readings

edit
  1. Wikipedia: Database normalization

Multimedia

edit
  1. Microsoft Virtual Academy: Relational Concepts
  2. YouTube: Normalization
  3. YouTube: Physical Database Design Methodology

Activities

edit
  1. While using the types of entities based on the College database (student, course, instructor), convert the entities to an equivalent collection of tables that are normalized and are in fourth normal form.
    • Using this information, convert the unnormalized relation to fourth normal form:
      • Create an example of a table that is in first normal form, but not in second normal form.
      • Create an example of a table that is in second normal form but not in third normal form.
      • Create an example of a table that is in third normal form but not in fourth normal form.

Lesson Summary

edit
  • Database normalization is the process of organizing the columns (attributes) and tables (relations) of a relational database to minimize data redundancy. Normalization involves decomposing a table into less redundant (and smaller) tables without losing information; defining foreign keys in the old table referencing the primary keys of the new ones. The objective is to isolate data so that additions, deletions, and modifications of an attribute can be made in just one table and then propagated through the rest of the database using the defined foreign keys. [2]
  • First normal form (1NF) is a property of a relation in a relational database. A relation is in first normal form if and only if the domain of each attribute contains only atomic (indivisible) values, and the value of each attribute contains only a single value from that domain. First normal form enforces these criteria: Eliminate repeating groups in individual tables. Create a separate table for each set of related data. Identify each set of related data with a primary key[3]
  • Second normal form (2NF) is a normal form used in database normalization. A table that is in first normal form (1NF) must meet additional criteria if it is to qualify for second normal form. A table is in 2NF if it is in 1NF and every non-prime attribute of the table is dependent on the whole of every candidate key.[4]
  • Third normal form is a normal form that is used in normalizing a database design to reduce the duplication of data and ensure referential integrity by ensuring that (1) the entity is in second normal form, and (2) all the attributes in a table are determined only by the candidate keys of that table and not by any non-prime attributes.[5]
  • Fourth normal form (4NF) is a normal form used in database normalization. Introduced by Ronald Fagin in 1977, 4NF is the next level of normalization after Boyce–Codd normal form (BCNF). Whereas the second, third, and Boyce–Codd normal forms are concerned with functional dependencies, 4NF is concerned with a more general type of dependency known as a multivalued dependency. A Table is in 4NF if and only if, for every one of its non-trivial multivalued dependencies X \twoheadrightarrow Y, X is a superkey—that is, X is either a candidate key or a superset thereof.[6]
  • Normal forms 5 and 6 are less commonly used. It is possible to design a database in 5NF from outset and adapt it into 6NF with slight tweaks.
  • Data redundancy is the existence of data that is additional to the actual data and permits correction of errors in stored or transmitted data.[7]

Key Terms

edit
fifth normal form (5NF)
first normal form (1NF)
form
fourth normal form (4NF)
normalization
normal forms (NF)
redundant data
second normal form (2NF)
third normal form (3NF)

See Also

edit

References

edit
  Type classification: this is a lesson resource.