Categories
Sotfware & Developers & DevOps Tools & How-Tos

Normalization of SQL Database: Remove Redundancy

Normalization of database is an essential skill to know as a database designer and developer. I still remember being intimidated by what it would bring on– back when I took a SQL database course. It was scary hearing about it. Let me assure you: it is easier than you might imagine. What is important though, is that you understand the structure of a database quite well before beginning with this article. The result of a normalized database is an accessible retrieval of desired or suitable attributes by removing redundancy. However, take care as it can backfire if care is not taken. For this article, you will first be introduced to some key basic concepts that are fundamental to your understanding of normalization. One such concept is Functional Dependency.

DB normalization

Functional Dependency

Functional Dependency describes the relationship between different attributes and it is a property of schema– intension.. It’s a crucial concept to understand before proceeding with normalization. It has a one-to-one relationship between the attributes, and it holds for all instances.

“If A and B are attributes of relation R, B is functionally dependent on A (denoted A → B), if each value of A is associated with exactly one value of B.”

Chapter 14.

The attribute(s) on the left side are the determinants.

For two attributes– employee id and employee name– we may conclude that employee id functionally determines employee name, if the sample represents all instances of that data.

Transitive Dependency

Transitive Dependency relies on more than two attributes to existing. In other words, transitive dependency exists only if there are more than two attributes. Then, one attribute indirectly functionally determines another attribute– using a common attribute. The formal definition is:

A condition where A, B, and C are attributes of a relation such that dependency if A → B and B → C, then C is transitively dependent on A via B.

Full functional dependency – Excelling Paper.

In other words, if employee id functionally determines both branch number and branch address, it would be an issue. Why?

The short explanation is that branch address does not directly depend on the emp id in any way, so we cannot associate it as functionally dependent on emp id. Therefore, a better solution is to create a new relationship where branch id functionally determines branch address.

Introduction to SQL Normalization: Removing Redundancy

Normalization refers to a process for improving data integrity and minimizing redundancy. It provides us with a desired set of relations. The main goal is:

  • minimize redundancy
  • eliminate undesirable characteristics like Insertion, Update and Deletion Anomalies in SQL databases.

There are five forms of normalization– six to be exact.

However, we mostly normalize our data up to BCNF only.

Don’t worry if you don’t understand these terms yet; you will know everything by the end of this article.

First Normal Form: Remove Redundancy

First Normal Form refers to removing duplicate values. As such, the goal is to ensure that every intersection of row and column contains only one data entry.

One approach is to create a new relation for repeating groups(columns with multiple values for each instance); the other one is to fill in the missing information on non-repeating groups.

Second Normal Form: Remove Partial Dependency

For a relation to be in 2NF, it must have no partial dependencies. A partial dependency violates a condition of Functional Dependency– the minimal number of attributes necessary to maintain the relationship.

Copy the partially dependent attributes along with the primary key into a new relation.

Third Normal Form: Remove Transitive Dependency

The aim of 3NF is to remove transitive dependencies. A relation in Third Normal Form is also in 2NF and 1NF. This highlights the hierarchical structure of SQL Normalization. Moreover, a relation in 3NF has attributes that are not transitively dependent on each other.

Remove the transitively dependent attributes by copying of the primary key to a new relation.

Boyce Codd Normal Form

Boyce Codd Normal Form, or BCNF, is an improvement on Third Normal Form. Consequently, it adds to the condition of the Third Normal Form. Every determinant must be a candidate key for a table in BCNF.

Fourth Normal Form

The Fourth Normal Form consists of dealing with Multivalued Dependency. In other words, it exists when an A has a set of values for B and C.

Fifth Normal Form: Remove Join Dependency

A table is said to be in the Fifth Normal Form if it is without join dependency.

Example:

normalization of database table
Sample table to normalize

Attached above is a sample table of data that we will try to normalize.

For such examples, the key is to spend a few minutes reading and understanding the table and data requirements. It is key for any progress in real-life situations.

Remember: Functional Dependency is a one-to-one relationship that holds for all data instances with a few dependent attributes.

Using this example, I have broken down the normalization process in a SQL database in ten steps. I have made it so that it is easier for you to follow little steps rather than skimming over some details. Hopefully– as this is my aim–, you will be able to understand the way of going about for such a problem in SQL. Therefore, following these steps below will allow you to fundamentally proceed with solving SQL Normalization problems for your use-cases.

10 Steps For Normalization Problems in SQL Database

First Step: Find Functional Dependencies

Firstly, find the functional dependences. This is important information regarding such an example. After that, you can better understand the problem. As explained above, quite a few standard forms are based on violating a kind of Functional Dependency.

It’s essential to understand the data:

  • Functional Dependency 1: National Identity Number (NIN) determines the Employee Name.
  • FD 2: Moreover, NIN and Contract Number, combined, determine the number of hours.
  • FD 3: Furthermore, the Contract Number determines both the House Number and House Location.

How do you figure this information out? Try and see what remains the same for an attribute for a duplicate value. For instance, Contract Number C1024 repeats twice, while both the House Number and House Location remain the same.

  • Functional Dependency 4: Lastly, House Number determines House Location.

Second Step: Determine Primary Keys

After that, it’s the ideal time to find out the Primary Key(s) for this relation. An easy way to do this is to see if a single attribute functionally determines the remaining attributes. If not, then try different combinations that determine the remaining attributes. For example, NIN and Contract Number form a Composite Primary Key above.

functional dependency and first normal form normalization

Third Step: Check for Unnormalized Form and First Normal Form

Moreover, check if this relation is is Unnormalized Form or 1NF in the next step,

It’s quite easy to determine: each row and column intersection contains only one data instance.

Therefore, to conclude, this relation is already in the First Normal Form.

Note: Is it clear how to establish if a table or relation is already in 1NF? Will you be able to do it on your own following these key points?

Fourth Step: Is Second Normal Form Applicable?

Firstly, does Second Normal Form apply to this relation? Yes, because there is a composite Primary Key. Is the existing relation also in Second Normal Form? Well, no.

Let’s explore why. The way to do this is to look for partial dependencies, i.e. if a non-primary key is wholly dependent on a composite primary key or not.

Hence, in this example, it’s clear that FD 2 and 3 violate this. In FD 2, a partial primary key (NIN) determines a non-primary key (Employee Name).

Similarly, in FD 3, the partial primary key (Contract Number) determines two non-primary keys (House Number and House Location).

Therefore, we must normalize this table upto the 2NF.

how to create 2NF or Second Normal Form?

Sixth Step: Normalization of 1NF to 2NF

How do you get this relation or table in 2NF? To normalize the relation into Second Normal Form requires the creation of two new relations.

This means that the non-primary-key attributes are removed along with a copy of the part of the primary key that fully functionally determines them. The number of tables in our database have increased.

Seventh Step: Check if Third Normal Form Applicable

Great! You have made excellent progress already.

Now… Ask yourself: Is the relation in 3NF? Not quite.

Why? Transitive Dependency exists in this table. FD 4 is an example as House Location is transitively dependent on Contact Number.

3NF or Third Normal Form which is final form for this example of normalization

Eight Step: Normalization of 2NF to 3NF

To transform this relation into 3NF, create a new relation– so that no non-primary key is transitively dependent on the primary key.

Therefore, a new relation with House Number and House Location is created. As such, our number of tables have increased again.

Note: Can you see how the increasing number of tables might be a problem? The number of joins will increase. Therefore, the time taken for a query to run may be longer. This might not be a major issue for simple databases with limited data, but it is a massive one complex databases.

Ninth Step: Check if BCNF Applicable

Check: does Boyce-Codd Normal Form need to apply here? No, because no non-candidate key is a determinant.

Hence, this relation is already in BCNF. Great news, right? Yes! After that, there’s only more check or step to make.

Tenth Step: 4NF and 5NF

This relation is also in Fourth Normal Form as there are no non-trivial multivalued dependencies. Furthermore, there is no join dependency so it is also in Fifth Normal Form.

That’s all! You made it to the end! In conclusion, you have taken a problem and normalized it.

Conclusion:

In conclusion, the steps listed above will help you solve any real-life example. Similarly, this example helps demonstrate the technique or way of solving such a question effectively. Moreover, I hope you understood that a complex problem can also still be solved using this same architecture of steps. Don’t be fearful of such problems; stick to what you know– the basics. I also want to ensure that your learning process is enhanced by breaking it down into simpler terms. At the end, the crux still remains same.

Can you now answer the question: “What is Normalization?”. If the answer is yes, then well done– you made it! If not, don’t worry, re-read the post and check another reference material. It’s okay if you’re struggling. It’s a big ask to grasp suddenly. Only with practice and implementation can you get better.

I hope you liked this post. Please drop a like or comment below with your thoughts. Feel free to let me know below if something was unclear or if you want help with a matter in the chat box below. Check more of our articles if you are interested in topics like SQL Database Flavors or CRUD operations in Python.

Stay safe. Ciao!

Hits: 13

Categories
Sotfware & Developers & DevOps Tools & How-Tos

Relational Database Design: ER Modelling

Database Design: ER Modelling
Highbrow

In this article, we’ll explore the steps of ER modelling for database design. By the end, you will know what database design is and how to create an entity-relationship diagram (ER). In addition, you will also learn how to translate an ER from a case study.

In addition, you may want to check some related posts like Picking The Right SQL Database Flavor For You. Or, maybe you want to learn about CRUD Operations: Understanding The Role in MySQL.

What is Database Design?

Database design is one of the most important skills you can own today. However, you might be lost on what it actually means.

In simple terms, it is to draw up a structure for a database dev to implement a business’s system.

What is an ER Design?

An Entity Relationship (ER) design is a top-down visual schematic of the database. In other words, it is a flowchart of entities– group of objects with similar properties– attributes and their relationships.

An ER diagram, or ERD, is part of conceptual database design. Conceptual design is free of RDBMS selection or physical constrains. This facilitates the logical and physical database design.

ERs visually explain the relationship types between entities and their constraints, using a modelling language.

UML, or Unified Modelling Language, describes ER diagrams. For instance, only a particular set of symbols represents ER diagrams.

Why Use ER Diagrams?

After that, if you’re wondering why you should use ER diagrams altogether, then let’s see some reasons below:

  • Provides a preview of how your tables connect, their keys and the attributes
  • Helps to visualize entities, attributes, relationships together
  • Serves as blueprint for conversion into relational tables

Get Started With ER Modelling?

Entity

An entity is an object or class, sharing similar properties. There are two types of entities: strong and weak entities.

  • Strong entities: has its own primary key and exists independently.
  • Weak entities: depends on a strong entity for existence.

Rectangles represent entities in UML.

Relationship

Relationship type defines the connectivity of entities with each other in an organized manner. This allows the database dev to see how entities interact. Sometimes, there are relationship type attributes as well.

There are four types of relationships:

  • 1-to-1
  • 1-To-Many
  • Many-To-1
  • Many-To-Many

A line typically represents a relationship in UML, labelled with a name and its multiplicity constraint.

Attribute

Attributes are the properties that define an entity. There are various types of attributes:

  • Single-valued: Attributes that return one instance for a tuple. For instance, first name.
  • Multivalued: Attributes that may return multiple instances for a tuple. For instance, telephone number– you may have multiple numbers.
  • Derived: Attributes that are dependent for their values. For instance, age is derived from the birthdate.
  • Simple: Non-divisible (atomic) attributes. For instance, ssn id.
  • Composite: Divisible attributes sometimes grouped together. For instance, address may be subdivided into street, city, state, country, and zip code.

Attributes are listed in the second compartment of the entity in UML.

An Example to Help Illustrate ER Design: From Zero to Database Designer

Case Study: A Video Rental Company

The case study describes the data requirements for a video rental company.

The video rental company has several branches throughout the USA. The data on each branch is the branch address made up of street, city, state, and zip code, and the telephone number. Each branch has a branch number, which is unique throughout the company. Each branch is allocated staff, which includes a Manager.

The Manager is responsible for the day-to-day running of a given branch. The data held on a member of staff is his or her name, position, and salary. Each member of staff has a staff number, which is unique throughout the company.

Each branch has a stock of videos. The data held on a video is the catalog number, video number, title, category, daily rental, cost, status, and the names of the main actors and the director. In addition, the catalog number uniquely identifies each video. However, in most cases, there are several copies of each video at a branch, and the individual copies are identified using the video number.

Video Rental Company (Continued)

Each video has a category such as Action, Adult, Children, Drama, Horror, or Sci-Fi. The status indicates whether a specific copy of a video is available for rent. Before hiring a video from the company, a customer must first register as a member of a local branch.

The data held on a member is the first and last name, address, and the date that the member registered at a branch. Each member has a member number, which is unique throughout all branches of the company. Once registered, a member is free to rent videos, up to a maximum of ten at any one time.

The data held on each video rented is the rental number, the full name and number of the member, the video number, title, and daily rental, and the dates the video is rented out and returned. Therefore, the rental number is unique throughout the company.

Designing Steps (Basic):

Scared by the length of that case study? Worry not.

The easiest way to understand and translate any case studies is to break it down. Therefore, this is what we will do below, explaining the steps you must take.

Firstly, identify the main entities from this case study for your database schema. You can do this by circling the key Nouns you find.

What are the possible entities? For instance, the first one is the Branch.

Then, you come across Staff, Member/Customer, Video. VideoRental, and Rental Agreement. After that, there’s one more entity that could be considered a weak entity here– Registration. Perhaps, you could also try an alternative. That is, to use Registration as an attribute of the relationship between branch and member.

conceptual er diagram with entity and relational database entities

After that, identify the relationship types between these entities. You can do this by establishing how one entity links to another. For example a staff member supervises the rest of the staff. Hence, this is a recursive relationship–“supervises” for the supervisor(part of staff entity) and “supervised by” for the remaining staff. Moreover, a member must “agree” to Rental Agreement. Try to determine as many links as you can.

Designing Steps (Advanced):

For example, “.. branch number is unique throughout the company” states that branch number is branch’s primary key.

The third step is to determine the primary and secondary keys in the entities. A primary key (PK) is the distinguishing attribute of an entity. You can identify them by reading the case study carefully– you may also need business logic.

Secondary keys are candidate keys that could not independently distinguish the tuples. Consequently, they were not chosen as PKs and they are also called Alternate keys.

After this, a crucial step is determining multiplicity constraints. This is a sum of cardinality and participation constraint. In simple words, multiplicity constraint identifies the involvement of an entity in a relationship.. It is read as “at least.. to as many as”.

Participation represents if occurrence of an entity in a relationship is mandatory or not. It may be 0 or 1. On the other hand, cardinality represents the maximum number of occurrences of that entity. It may be 0,1… or n.

But, how do you do this? Easier said than done, right?

You must have some business logic and know-how of an individual business to do well here. For instance, let’s suppose the relationship between branch and staff. The way to identify multiplicity constraints is to visualize the problem.

Is staff mandatory for a branch to exist? Is branch mandatory for staff to exist?
These questions result in participation (P).

For each branch, how many staff can exist? For a managerial staff, how many branches may exist?

These questions result in cardinality (C).

Multiplicity is written as “P..C”.

er model erd with relationship and multiplicity constraints for database design

Conclusion

In conclusion, I hope you were able to learn how to convert a real-life case into an ERD. The case study may have looked scary initially, but after breaking it down, it was quite simple to deal with, wasn’t it? Therefore, the key– always– is to follow a step-by-step procedure for such cases.

Please like the post if you learned something new here or appreciated the efforts. Subscribe to our blog to stay updated on new content. Lastly, let us know your thoughts about this post in the comments. Moreover, let us know if you face any issues.

Thanks for reading. Stay safe and stay healthy.

Hits: 14