Normal Forms in DBMS: Reducing Data Redundancy and Avoiding Anomalies

Updated on October 7, 2024

Article Outline

Importance and Benefits of Normalisation in DBMS Key Concepts in Database Normalisation Detailed Exploration of Normal Forms in DBMS Balancing Normalisation and Database Performance Best Practices for Implementing Normalisation in Real-World Databases Conclusion FAQs

What causes the databases you use most frequently to become filled with repeated entry information? Does changing one piece of information often result in a ‘snowball effect,’ where correcting one problem throws another one into the mix?

These are typical scenarios in the database management domain, which may result from insufficient normalisation.

Data normalisation is the key process that helps maintain a clean, efficient, and easily manageable database.

It makes a significant contribution to the normal functioning of a database through its purpose of putting all data in its rightful place and allowing for future expansion or growth without adverse outcomes. In simple terms, normal forms in DBMS are about minimising redundancy in data and avoiding problems such as insertion, deletion, and update anomalies.

But what do normal forms in DBMS mean? Now, it’s time to get deeper into details and understand why these concepts are crucial.

Importance and Benefits of Normalisation in DBMS

Now, the question that arises here is: why does normalisation matter?

Suppose you are the manager of student records in a school. Each time a student moves to a new address, you update a record for each course in which the student is enrolled. If you miss an update, you end up with confusing and often contradicting details.

This is where normal forms in DBMS come in.

By structuring the data properly, you only need to store the student’s address once, and any changes are automatically reflected everywhere.

Here’s what normalisation brings to the table:

Ensures that data are consistent across the database.
Minimises duplicate data, saving storage space and making the database more efficient.
Maintain accurate and reliable data by eliminating anomalies.

But normalisation isn’t just about reducing headaches; it’s about creating a foundation that supports future growth.

As your database grows, a well-normalised structure will handle the increasing complexity with ease.

Get curriculum highlights, career paths, industry insights and accelerate your technology journey.

Download brochure

Key Concepts in Database Normalisation

Defining Data Redundancy and Its Impact on Databases

Data redundancy occurs when the same piece of data is stored in multiple places within a database. As the database grows, redundancy can lead to serious problems.

Imagine storing the same customer’s contact details in five different tables. If the customer changes their phone number, you must update all five records.

Miss one, and you’ve got conflicting data, which can cause confusion and errors.

This is where normal forms in DBMS shine.

By organising the data to eliminate redundancy, we can ensure that each piece of data is stored only once. This makes updates simpler and prevents the inconsistencies that redundancy can cause.

Understanding Database Anomalies: Insertion, Deletion, and Update Anomalies

Anomalies are errors or inconsistencies that occur when data is added, deleted, or updated in a database.

They are the unwanted side effects of poor database design.

Insertion Anomalies:

These happen when you can’t insert data into a table because other data is missing.

For example, if you’re adding a new student to a database but can’t do so because their course information isn’t available yet, that’s an insertion anomaly.

Deletion Anomalies:

These occur when deleting data unintentionally removes other valuable data.

For instance, if deleting a course record also removes the details of the students enrolled in that course, you’ve encountered a deletion anomaly.

Update Anomalies:

These arise when updating a single piece of data, which requires multiple updates across the database.

If you change a customer’s address in one table but forget to do so in another, you’ve got conflicting information—an update anomaly.

Detailed Exploration of Normal Forms in DBMS

Most of the problems are very well prevented by knowing and applying the different normal forms in DBMS. Now, let us discuss normal forms in DBMS and understand how these actually keep our databases clean and efficient.

First Normal Form (1NF): Ensuring Atomicity of Data

First Normal Form (1NF) is all about making sure every piece of data in your table is atomic.

Atomic means that each cell in a table contains only one value. No lists, no sets, just one value.

Imagine a table storing student contact information. If a single cell holds multiple phone numbers, that’s a problem.

Input Table (Before 1NF):

StudentID	Name	PhoneNumbers
1	Rahul Sharma	9876543210, 9123456789
2	Anjali Verma	9988776655
3	Priya Singh	9876512345, 9888123456
4	Rohan Kapoor	9988123456
5	Neha Gupta	9112233445, 9223344556

Code to Apply 1NF:

CREATE TABLE Students (

StudentID INT,

Name VARCHAR(50),

PhoneNumber VARCHAR(15)

); 

-- Adding data

INSERT INTO Students (StudentID, Name, PhoneNumber)

VALUES

(1, 'Rahul Sharma', '9876543210'),

(1, 'Rahul Sharma', '9123456789'),

(2, 'Anjali Verma', '9988776655'),

(3, 'Priya Singh', '9876512345'),

(3, 'Priya Singh', '9888123456'),

(4, 'Rohan Kapoor', '9988123456'),

(5, 'Neha Gupta', '9112233445'),

(5, 'Neha Gupta', '9223344556');

Output Table (After 1NF):

StudentID	Name	PhoneNumber
1	Rahul Sharma	9876543210
1	Rahul Sharma	9123456789
2	Anjali Verma	9988776655
3	Priya Singh	9876512345
3	Priya Singh	9888123456
4	Rohan Kapoor	9988123456
5	Neha Gupta	9112233445
5	Neha Gupta	9223344556

Second Normal Form (2NF): Eliminating Partial Dependencies

Moving to the Second Normal Form (2NF), we focus on getting rid of partial dependencies. This means every non-key attribute must be fully dependent on the primary key.

Consider a table where each student’s name depends only on their StudentID, but course details are also mixed in:

Input Table (Before 2NF):

StudentID	Name	CourseID	CourseName	Instructor
1	Rahul Sharma	101	Mathematics	Dr. Mehta
1	Rahul Sharma	102	Physics	Dr. Khan
2	Anjali Verma	101	Mathematics	Dr. Mehta
3	Priya Singh	103	Chemistry	Dr. Patel
4	Rohan Kapoor	104	Biology	Dr. Iyer

Code to Apply 2NF:

CREATE TABLE Students (

StudentID INT,

Name VARCHAR(50)

); 

CREATE TABLE Courses (

CourseID INT,

CourseName VARCHAR(50),

Instructor VARCHAR(50)

); 

CREATE TABLE StudentCourses (

StudentID INT,

CourseID INT

); 

-- Adding data

INSERT INTO Students (StudentID, Name)

VALUES

(1, 'Rahul Sharma'),

(2, 'Anjali Verma'),

(3, 'Priya Singh'),

(4, 'Rohan Kapoor'); 

INSERT INTO Courses (CourseID, CourseName, Instructor)

VALUES

(101, 'Mathematics', 'Dr. Mehta'),

(102, 'Physics', 'Dr. Khan'),

(103, 'Chemistry', 'Dr. Patel'),

(104, 'Biology', 'Dr. Iyer'); 

INSERT INTO StudentCourses (StudentID, CourseID)

VALUES

(1, 101),

(1, 102),

(2, 101),

(3, 103),

(4, 104);

Output Tables (After 2NF):

Students Table:

StudentID	Name
1	Rahul Sharma
2	Anjali Verma
3	Priya Singh
4	Rohan Kapoor

Courses Table:

CourseID	CourseName	Instructor
101	Mathematics	Dr. Mehta
102	Physics	Dr. Khan
103	Chemistry	Dr. Patel
104	Biology	Dr. Iyer

StudentCourses Table:

StudentID	CourseID
1	101
1	102
2	101
3	103
4	104

Third Normal Form (3NF): Removing Transitive Dependencies

The Third Normal Form (3NF) takes it a step further. Here, we remove transitive dependencies, where non-key columns depend on other non-key columns.

Let’s say we have a table where each student’s city depends on their postal code but not directly on the StudentID:

Input Table (Before 3NF):

StudentID	Name	PostalCode	City	State
1	Rahul Sharma	110001	New Delhi	Delhi
2	Anjali Verma	400001	Mumbai	Maharashtra
3	Priya Singh	600001	Chennai	Tamil Nadu
4	Rohan Kapoor	110001	New Delhi	Delhi
5	Neha Gupta	400001	Mumbai	Maharashtra

Code to Apply 3NF:

CREATE TABLE StudentDetails (

StudentID INT,

Name VARCHAR(50),

PostalCode VARCHAR(10)

); 

CREATE TABLE PostalCodes (

PostalCode VARCHAR(10),

City VARCHAR(50),

State VARCHAR(50)
); 

-- Adding data

INSERT INTO StudentDetails (StudentID, Name, PostalCode)

VALUES

(1, 'Rahul Sharma', '110001'),

(2, 'Anjali Verma', '400001'),

(3, 'Priya Singh', '600001'),

(4, 'Rohan Kapoor', '110001'),

(5, 'Neha Gupta', '400001'); 

INSERT INTO PostalCodes (PostalCode, City, State)

VALUES

('110001', 'New Delhi', 'Delhi'),

('400001', 'Mumbai', 'Maharashtra'),

('600001', 'Chennai', 'Tamil Nadu');

Output Tables (After 3NF):

StudentDetails Table:

StudentID	Name	PostalCode
1	Rahul Sharma	110001
2	Anjali Verma	400001
3	Priya Singh	600001
4	Rohan Kapoor	110001
5	Neha Gupta	400001

PostalCodes Table:

PostalCode	City	State
110001	New Delhi	Delhi
400001	Mumbai	Maharashtra
600001	Chennai	Tamil Nadu

Boyce-Codd Normal Form (BCNF): A Stricter Version of 3NF

Boyce-Codd Normal Form (BCNF) is like 3NF but stricter. Every determinant in the table must be a superkey.

This implies that X should be a superkey for each functional relationship X → Y.

Consider a scenario where an employee’s project leader is determined by the ProjectID, but ProjectID isn’t a primary key:

Input Table (Before BCNF):

EmployeeID	Department	ProjectID	ProjectLeader
1	HR	201	Arjun Singh
2	IT	202	Rohit Mehra
3	HR	201	Arjun Singh
4	IT	203	Neha Saini
5	Finance	204	Rajesh Gupta

Code to Apply BCNF:

CREATE TABLE Projects (

ProjectID INT,

ProjectLeader VARCHAR(50)
); 

CREATE TABLE EmployeeProjects (

EmployeeID INT,

ProjectID INT,

Department VARCHAR(50)

); 

-- Adding data

INSERT INTO Projects (ProjectID, ProjectLeader)

VALUES

(201, 'Arjun Singh'),

(202, 'Rohit Mehra'),

(203, 'Neha Saini'),

(204, 'Rajesh Gupta'); 

INSERT INTO EmployeeProjects (EmployeeID, ProjectID, Department)

VALUES

(1, 201, 'HR'),

(2, 202, 'IT'),

(3, 201, 'HR'),

(4, 203, 'IT'),

(5, 204, 'Finance');

Output Tables (After BCNF):

Projects Table:

ProjectID	ProjectLeader
201	Arjun Singh
202	Rohit Mehra
203	Neha Saini
204	Rajesh Gupta

EmployeeProjects Table:

EmployeeID	ProjectID	Department
1	201	HR
2	202	IT
3	201	HR
4	203	IT
5	204	Finance

Fourth Normal Form (4NF): Handling Multi-Valued Dependencies

Fourth Normal Form (4NF) deals with multivalued dependencies. These occur when one column in a table depends on multiple values in another column.

For example, a table that lists students and their multiple hobbies:

Input Table (Before 4NF):

StudentID	Name	Hobby	Language
1	Rahul Sharma	Painting	English
1	Rahul Sharma	Cycling	Hindi
2	Anjali Verma	Singing	English
2	Anjali Verma	Dancing	Tamil
3	Priya Singh	Reading	English

Code to Apply 4NF:

CREATE TABLE Students (

StudentID INT,

Name VARCHAR(50)

); 

CREATE TABLE Hobbies (

StudentID INT,

Hobby VARCHAR(50)

); 

CREATE TABLE Languages (

StudentID INT,

Language VARCHAR(50)

); 

-- Adding data

INSERT INTO Students (StudentID, Name)

VALUES

(1, 'Rahul Sharma'),

(2, 'Anjali Verma'),

(3, 'Priya Singh'); 

INSERT INTO Hobbies (StudentID, Hobby)

VALUES

(1, 'Painting'),

(1, 'Cycling'),

(2, 'Singing'),

(2, 'Dancing'),

(3, 'Reading'); 

INSERT INTO Languages (StudentID, Language)

VALUES

(1, 'English'),

(1, 'Hindi'),

(2, 'English'),

(2, 'Tamil'),

(3, 'English');

Output Tables (After 4NF):

Students Table:

StudentID	Name
1	Rahul Sharma
2	Anjali Verma
3	Priya Singh

Hobbies Table:

StudentID	Hobby
1	Painting
1	Cycling
2	Singing
2	Dancing
3	Reading

Languages Table:

StudentID	Language
1	English
1	Hindi
2	English
2	Tamil
3	English

Fifth Normal Form (5NF): Ensuring Lossless Decomposition

Finally, the Fifth Normal Form (5NF) focuses on ensuring that data can be decomposed into smaller tables without losing any information.

This is crucial when dealing with complex relationships, such as product assemblies:

Input Table (Before 5NF):

AssemblyID	ProductID	ComponentID	SupplierID
1	301	401	501
1	301	402	502
2	302	403	501
2	302	404	503
3	303	405	504

Code to Apply 5NF:

CREATE TABLE Assemblies (
AssemblyID INT,

ProductID INT

); 

CREATE TABLE Products (

ProductID INT,

ComponentID INT

); 

CREATE TABLE Suppliers (

ComponentID INT,

SupplierID INT

); 

-- Adding data

INSERT INTO Assemblies (AssemblyID, ProductID)

VALUES

(1, 301),

(2, 302),

(3, 303); 

INSERT INTO Products (ProductID, ComponentID)

VALUES

(301, 401),

(301, 402),

(302, 403),

(302, 404),

(303, 405); 

INSERT INTO Suppliers (ComponentID, SupplierID)

VALUES

(401, 501),

(402, 502),

(403, 501),

(404, 503),

(405, 504);

Output Tables (After 5NF):

Assemblies Table:

AssemblyID	ProductID
1	301
2	302
3	303

Products Table:

ProductID	ComponentID
301	401
301	402
302	403
302	404
303	405

Suppliers Table:

ComponentID	SupplierID
401	501
402	502
403	501
404	503
405	504

Balancing Normalisation and Database Performance

Is your database running slower after applying all those normal forms?

Normalisation is great for reducing redundancy and improving data integrity, but it can sometimes slow down performance.

With all these normal forms, it’s easy to think more normalisation is always better. But the trick is to strike the right balance between normalisation and speed.

Too much normalisation can lead to complex queries requiring multiple joins. This can bog down your system, especially with large datasets.

So, what’s the solution?

Here’s how we can find the right balance:

Start Simple: Normalise up to 3NF or BCNF for most applications.
Monitor Performance: Keep an eye on query times and database performance.
Denormalize When Needed: If performance issues arise, consider denormalizing selectively.

By balancing normal forms in DBMS with practical needs, we ensure our databases are both efficient and easy to work with.

Best Practices for Implementing Normalisation in Real-World Databases

Know When to Stop:

While it’s tempting to normalise to the highest level, it’s not always necessary. Most of the time, it doesn’t go beyond BCNF or 4NF.

Test Early and Often:

After each step of normalisation, run queries to see how the database handles them. This helps catch performance issues early.

Document Your Design:

Keep a clear record of your normalisation process. It’s easier to troubleshoot and explain your design decisions if problems arise later.

Keep the Future in Mind:

Think about how your database will grow. What works for a small dataset might not scale well. Plan your normalisation with an eye on future expansion.

Conclusion

Normal forms in DBMS are important to deal with the issue of data redundancy, maintain data consistency, and avoid data anomalies.

Starting with the basic normal forms such as 1NF, 2NF, and 3NF, we bring a solid standard to the framework.

In complex data dependencies, it is necessary to move to BCNF and further. However, each level of normalisation brings an extra level of complexity and problem, which affects the performance.

This is the reason why those people working on databases should be able to know when they need to stop and when they should be denormalised with the aim of improving efficiency.

A well-designed database is both organised and, at the same time, optimised according to your application’s needs, granting both accuracy and speed in real-life situations.

FAQs

What role does normalisation play in DBMS?

Normalisation helps an organisation to reduce data redundancy and avoid anomalies in a database. This would result in an efficient database, which is easy to manage.

Why is BCNF considered to be more strict compared with 3NF?

BCNF is just more rigid than 3NF; it gets rid of any form of dependency where the left-hand side is not a superkey. This ensures no unnecessary redundancy or anomaly in the database.

Is it always necessary to normalise a database to the highest normal form?

No, it's not always necessary. Most databases work efficiently up to 3NF or BCNF. Higher forms are used in specific cases where data integrity and minimal redundancy are crucial.

What are the potential disadvantages of over-normalizing a database?

Over-normalization can lead to complex queries and slower performance due to the need for multiple joins. This can be especially problematic in large databases where query speed is critical.

Can a database be partially normalised, and what are the implications?

Yes, a database can be partially normalised. This approach can help balance data integrity with performance. However, it may introduce some redundancy, which can complicate data management if not carefully monitored.

Updated on October 7, 2024

Link

Upskill with expert articles

View all