Database normalization provides a set of easy-to-apply principles that can help the developer structuring its database and improve MySql performance.
Normalizing a database is to organize its data more efficiently thanks to normal rules or principles. For more efficiency, building tables and the relations between the tables will be the result of the normalization process. DB normalization ensures :
- an optimized structure that does not waste disk space (the same data should not be found in more than one table for obvious reasons : updating several tables for the same data is absurd)
- logical dependencies (implies storing relevant data inside or related to the right table)
Normal forms
I’ll stick here to the 3 first rules, guidelines that db-literates call Normal Forms. These rules should add enough efficiency to your database. Keep in mind you may sometimes prefer not to apply a rule for practical reasons. Indeed, normalization consists in adding relation tables, for example, that will require the developer to change his MySql queries.
Normal forms must be applied in the right order : further normalisation can only take place if the previous normalisation rule has already been applied.
For more convenience, I’ll take the example of a discography database that will store albums of famous artists.
Creating the database on the fly with no normalization experience could result in the creation of one single table :
records_table :
record_ID : 1 (defined as Primary_KEY)
artist : Fleetwood Mac is a British-American band that formed in 1967…
record_title : The pious bird of good omen
record_style : Blues, Rock, Pop
record_label : blue horizon
record_release : 1969
With this very simple table, one can easily understand that :
- a SELECT query cannot get information about the artist without a record_ID
- an UPDATE query that wants to modify the artist biography will have to do so in every row that is associated with that artist
- a DELETE query will delete the record information as well as the artist information
These limitations may produce anomalies such as data corruption, MySql resource problems (with LIKE subqueries),… So, it’s time to normalize that database and we will do it thanks to the relational theory.
Relational theory
1NF => 1st Normal Form : the atomic form
Each table field should have an atomic value, i.e. only one single value in a column of your table. In the above example, the fields ‘artist’ and ‘record_style’ contain more than one value. This is why applying 1NF will result in the creation of 2 extra tables :
records_table :
record_ID : 1 (defined as Primary_KEY)
record_title : The pious bird of good omen
record_release : 1969
and
artists_table :
artist_ID : 1 (defined as Primary Key)
artist_name : Fleetwood Mac
artist_bio : Fleetwood Mac is a British-American band that formed in 1967…
and
styles_table (with 3 rows) :
style_ID : 1 (defined as Primary Key)
style_name : blues
style_ID : 2
style_name : pop
style_ID : 3
style_name : rock
and
labels_table :
label_ID : 1 (defined as Primary Key)
label_name : Blue Horizon
You’ll easily understand it is now easy to add an artist without adding a record from that artist, you can delete a record without deleting artist information,…
Data is now separated in different tables. We still have to create relations between these tables. Database relations can be of the following types:
1 to 1 : one element of table A corresponds to one element of table B and that element from table B corresponds to only one element of table A
1 to N : a record from the records_table can have only one label but that label may have more than one records. In that case, we add a foreign key to the records table containing the ID of the label
N to N : a record from the records_table can have more than one artist and an artist from the artists_table can have more than one record. In that case, we implement a third table called records_artists_table, a relational table that will link the records_table with the artists_table
So, let’s add a foreign key to the records_table :
records_table : 1 to N relationship
record_ID : 1 (defined as Primary_KEY)
record_title : The pious bird of good omen
record_release : 1969
label_ID : 1
and create the 2 new tables :
records_artists_table : N to N relationship
record_ID : 1
artist_ID : 1
We could also create the same kind of relationship between the records and the styles of music :
records_styles_table : N to N relationship
record_ID : 1
style_ID : 1
2NF => 2nd Normal Form : the relationships to keys
This Normal Form ensures non-key columns strictly depend on the primary key. So, in case of a composite primary key (a row in the table is identified thanks to the record_ID and the author_ID), all non-key columns must depend on the 2 fields, and not one of them only. Imagine the following table structure :
records_reviews table :
record_ID : 1 (defined as primary key with the following field)
author_ID : 1 (defined as primary key with the previous field)
review : This compilation album is one of my favourites…
author_website : http://www.domain.com
In the above example, the field author_website doesn’t strictly depend on the composite key record_ID-author_ID. It only depends on the author_ID. Applying the second normal form will result in moving the author_website field to the author table.
3NF => 3rd Normal Form
labels_table :label_ID : 1 (primary key)label_name : Blue Horizonlabel_zip : 1770label_city : Sherbornlabel_state : MA
zip table :zip_ID : 1770zip_city : Sherbornzip_state : MA
labels_table :label_ID : 1 (primary key)label_name : Blue Horizonzip_ID : 1770
states_table :state_ID : 1 (primary key)state_name : MA
SELECT * FROM labels as A INNER JOIN zip as B ON B.zip_ID=A.zip_ID INNER JOIN states as C ON C.state_ID=B.state_ID WHERE A.label_ID=’1′ LIMIT 1;
MySql Introduction to db normalization