Database Management/Normalization

This lesson introduces database normalization concepts.

Objectives and Skills edit

Objectives and skills for this lesson include:

  • Understand the purpose of database normalization.
  • Apply database normalization concepts to a given dataset.

Readings edit

  1. Wikipedia: Database normalization
  2. Wikibooks: Database Design/Normalization

Multimedia edit

  1. YouTube: Database Normalization
  2. YouTube: Normalization - 1NF, 2NF, 3NF and 4NF
  3. YouTube: Data Modeling and the ER Model

Activities edit

Design a normalized (3NF) database representing one or more of the following datasets. Each table must have multiple fields and no repeating groups, with each field dependent upon the entire primary key. This is a logical design only (do not use a DBMS). The physical design will be added in a future lesson. Use the following syntax to document your completed design. Use PascalCase for table and field names. Table names should be plural.
Table1(PrimaryKey, Field2, Field3, ...)
Table2(PrimaryKey, Field2, Field3, ...)
...

  1. A car dealership wants a database to track sales by customer, vehicle, and sales person.
  2. A college wants a database to track enrollment by student, course, and instructor.
  3. A company wants a database to track their organizational structure by department, employee, and location.
  4. A computer repair company wants a database to track repairs by customer, computer, and technician.
  5. A doctor's office wants a database to track patient visits by doctor, patient, and visit.
  6. A library wants a database to track books by title, author, and genre.
  7. A YouTube subscriber wants a database to track videos by subject, YouTuber, and playlist.
  8. Choose your own dataset similar to the above with at least three related entities.

Lesson Summary edit

  • Database normalization is a logical database design technique to structure a relational database in accordance with a series of normal forms in order to reduce data redundancy and improve data integrity.[1]
  • Normalization is a process of using normal forms to avoid logical design error occurrences within a database. Unnormalized database design will display database inconsistencies which will cause errors when inserting, deleting or updating record information.[2]
  • De-normalization is a process that changes relations from higher to lower normal forms and is done to improve the performance of retrieving relations from the database. It also reduces the cost to maintain a database because it requires less joins.[3]
  • Database normalization is a progressive process using different normal forms. A higher level of database normalization cannot be achieved unless the previous levels have been satisfied.[1] Any relation that is in BCNF satisfies 3NF. Any relation in 3NF satisfies 2NF. Any relation in 2NF satisfies 1NF.[3]
  • A fully normalized database allows its structure to be extended to accommodate new types of data with slight changes to the existing structure.[1]
  • The basic normal forms (from least normalized to most normalized) include: UNF (unnormalized form); 1NF (first normal form), 2NF (second normal form), 3NF (third normal form).[1]

Key Terms edit

alternate key
A candidate key not chosen as the primary key.[4]
anomaly
A deviation from a rule or from what is regarded as normal.[5]
Boyce-Codd normal form (BCNF)
A slightly stronger version of third normal form (3NF).[6]
candidate key
A column(s) in a table that can uniquely identify any database record without referring to any other data.[7]
decomposition
The process of breaking down a relation into smaller relations in an attempt to normalize the database.[8]
first normal form (1NF)
Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).[9]
functional dependency
A relationship between two attributes, typically between the PK and other non-key attributes within a table.[10]
normal form
Used to eliminate or reduce redundancy in database tables.[11]
primary key
A column that uniquely identifies a particular row in a table.[12]
second normal form (2NF)
Meets all the requirements of first normal form and removes subsets of data that apply to multiple rows of a table and place them in separate tables.[9]
third normal form (3NF)
Meets all the requirements of second normal form and removes columns that are not dependent upon the primary key.[9]
unnormalized relation
Contains data redundancy, where multiple values and/or complex data structures may be stored within a single field or attribute, or where fields may be replicated within a single table.[13]

See Also edit

References edit