Categories
Sotfware & DevOps Tools & HowTo

Normalization of SQL Database: Remove Redundancy

Normalization of database is an essential skill to know as a database designer and developer. I still remember being intimidated by what it would bring on-- back when I took a SQL database course. It was scary hearing about it. Let me assure you: it is easier than you might imagine. What is important though, is that you understand the structure of a database quite well before beginning with this article. The result of a normalized database is an accessible retrieval of desired or suitable attributes by removing redundancy. However, take care as it can backfire if care is not taken. For this article, you will first be introduced to some key basic concepts that are fundamental to your understanding of normalization. One such concept is Functional Dependency.

DB normalization

Functional Dependency

Functional Dependency describes the relationship between different attributes and it is a property of schema-- intension.. It's a crucial concept to understand before proceeding with normalization. It has a one-to-one relationship between the attributes, and it holds for all instances.

"If A and B are attributes of relation R, B is functionally dependent on A (denoted A → B), if each value of A is associated with exactly one value of B."

Chapter 14.

The attribute(s) on the left side are the determinants.

For two attributes-- employee id and employee name-- we may conclude that employee id functionally determines employee name, if the sample represents all instances of that data.

Transitive Dependency

Transitive Dependency relies on more than two attributes to existing. In other words, transitive dependency exists only if there are more than two attributes. Then, one attribute indirectly functionally determines another attribute-- using a common attribute. The formal definition is:

A condition where A, B, and C are attributes of a relation such that dependency if A → B and B → C, then C is transitively dependent on A via B.

Full functional dependency – Excelling Paper.

In other words, if employee id functionally determines both branch number and branch address, it would be an issue. Why?

The short explanation is that branch address does not directly depend on the emp id in any way, so we cannot associate it as functionally dependent on emp id. Therefore, a better solution is to create a new relationship where branch id functionally determines branch address.

Introduction to SQL Normalization: Removing Redundancy

Normalization refers to a process for improving data integrity and minimizing redundancy. It provides us with a desired set of relations. The main goal is:

  • minimize redundancy
  • eliminate undesirable characteristics like Insertion, Update and Deletion Anomalies in SQL databases.

There are five forms of normalization-- six to be exact.

However, we mostly normalize our data up to BCNF only.

Don't worry if you don't understand these terms yet; you will know everything by the end of this article.

First Normal Form: Remove Redundancy

First Normal Form refers to removing duplicate values. As such, the goal is to ensure that every intersection of row and column contains only one data entry.

One approach is to create a new relation for repeating groups(columns with multiple values for each instance); the other one is to fill in the missing information on non-repeating groups.

Second Normal Form: Remove Partial Dependency

For a relation to be in 2NF, it must have no partial dependencies. A partial dependency violates a condition of Functional Dependency-- the minimal number of attributes necessary to maintain the relationship.

Copy the partially dependent attributes along with the primary key into a new relation.

Third Normal Form: Remove Transitive Dependency

The aim of 3NF is to remove transitive dependencies. A relation in Third Normal Form is also in 2NF and 1NF. This highlights the hierarchical structure of SQL Normalization. Moreover, a relation in 3NF has attributes that are not transitively dependent on each other.

Remove the transitively dependent attributes by copying of the primary key to a new relation.

Boyce Codd Normal Form

Boyce Codd Normal Form, or BCNF, is an improvement on Third Normal Form. Consequently, it adds to the condition of the Third Normal Form. Every determinant must be a candidate key for a table in BCNF.

Fourth Normal Form

The Fourth Normal Form consists of dealing with Multivalued Dependency. In other words, it exists when an A has a set of values for B and C.

Fifth Normal Form: Remove Join Dependency

A table is said to be in the Fifth Normal Form if it is without join dependency.

Example:

normalization of database table
Sample table to normalize

Attached above is a sample table of data that we will try to normalize.

For such examples, the key is to spend a few minutes reading and understanding the table and data requirements. It is key for any progress in real-life situations.

Remember: Functional Dependency is a one-to-one relationship that holds for all data instances with a few dependent attributes.

Using this example, I have broken down the normalization process in a SQL database in ten steps. I have made it so that it is easier for you to follow little steps rather than skimming over some details. Hopefully-- as this is my aim--, you will be able to understand the way of going about for such a problem in SQL. Therefore, following these steps below will allow you to fundamentally proceed with solving SQL Normalization problems for your use-cases.

10 Steps For Normalization Problems in SQL Database

First Step: Find Functional Dependencies

Firstly, find the functional dependences. This is important information regarding such an example. After that, you can better understand the problem. As explained above, quite a few standard forms are based on violating a kind of Functional Dependency.

It's essential to understand the data:

  • Functional Dependency 1: National Identity Number (NIN) determines the Employee Name.
  • FD 2: Moreover, NIN and Contract Number, combined, determine the number of hours.
  • FD 3: Furthermore, the Contract Number determines both the House Number and House Location.

How do you figure this information out? Try and see what remains the same for an attribute for a duplicate value. For instance, Contract Number C1024 repeats twice, while both the House Number and House Location remain the same.

  • Functional Dependency 4: Lastly, House Number determines House Location.

Second Step: Determine Primary Keys

After that, it's the ideal time to find out the Primary Key(s) for this relation. An easy way to do this is to see if a single attribute functionally determines the remaining attributes. If not, then try different combinations that determine the remaining attributes. For example, NIN and Contract Number form a Composite Primary Key above.

functional dependency and first normal form normalization

Third Step: Check for Unnormalized Form and First Normal Form

Moreover, check if this relation is is Unnormalized Form or 1NF in the next step,

It's quite easy to determine: each row and column intersection contains only one data instance.

Therefore, to conclude, this relation is already in the First Normal Form.

Note: Is it clear how to establish if a table or relation is already in 1NF? Will you be able to do it on your own following these key points?

Fourth Step: Is Second Normal Form Applicable?

Firstly, does Second Normal Form apply to this relation? Yes, because there is a composite Primary Key. Is the existing relation also in Second Normal Form? Well, no.

Let's explore why. The way to do this is to look for partial dependencies, i.e. if a non-primary key is wholly dependent on a composite primary key or not.

Hence, in this example, it's clear that FD 2 and 3 violate this. In FD 2, a partial primary key (NIN) determines a non-primary key (Employee Name).

Similarly, in FD 3, the partial primary key (Contract Number) determines two non-primary keys (House Number and House Location).

Therefore, we must normalize this table upto the 2NF.

how to create 2NF or Second Normal Form?

Sixth Step: Normalization of 1NF to 2NF

How do you get this relation or table in 2NF? To normalize the relation into Second Normal Form requires the creation of two new relations.

This means that the non-primary-key attributes are removed along with a copy of the part of the primary key that fully functionally determines them. The number of tables in our database have increased.

Seventh Step: Check if Third Normal Form Applicable

Great! You have made excellent progress already.

Now... Ask yourself: Is the relation in 3NF? Not quite.

Why? Transitive Dependency exists in this table. FD 4 is an example as House Location is transitively dependent on Contact Number.

3NF or Third Normal Form which is final form for this example of normalization

Eight Step: Normalization of 2NF to 3NF

To transform this relation into 3NF, create a new relation-- so that no non-primary key is transitively dependent on the primary key.

Therefore, a new relation with House Number and House Location is created. As such, our number of tables have increased again.

Note: Can you see how the increasing number of tables might be a problem? The number of joins will increase. Therefore, the time taken for a query to run may be longer. This might not be a major issue for simple databases with limited data, but it is a massive one complex databases.

Ninth Step: Check if BCNF Applicable

Check: does Boyce-Codd Normal Form need to apply here? No, because no non-candidate key is a determinant.

Hence, this relation is already in BCNF. Great news, right? Yes! After that, there's only more check or step to make.

Tenth Step: 4NF and 5NF

This relation is also in Fourth Normal Form as there are no non-trivial multivalued dependencies. Furthermore, there is no join dependency so it is also in Fifth Normal Form.

That's all! You made it to the end! In conclusion, you have taken a problem and normalized it.

Conclusion:

In conclusion, the steps listed above will help you solve any real-life example. Similarly, this example helps demonstrate the technique or way of solving such a question effectively. Moreover, I hope you understood that a complex problem can also still be solved using this same architecture of steps. Don't be fearful of such problems; stick to what you know-- the basics. I also want to ensure that your learning process is enhanced by breaking it down into simpler terms. At the end, the crux still remains same.

Can you now answer the question: "What is Normalization?". If the answer is yes, then well done-- you made it! If not, don't worry, re-read the post and check another reference material. It's okay if you're struggling. It's a big ask to grasp suddenly. Only with practice and implementation can you get better.

I hope you liked this post. Please drop a like or comment below with your thoughts. Feel free to let me know below if something was unclear or if you want help with a matter in the chat box below. Check more of our articles if you are interested in topics like SQL Database Flavors or CRUD operations in Python.

Stay safe. Ciao!