Normal Forms in DBMS: Reducing Data Redundancy and Avoiding Anomalies

Updated on October 7, 2024

Article Outline

What causes the databases you use most frequently to become filled with repeated entry information? Does changing one piece of information often result in a ‘snowball effect,’ where correcting one problem throws another one into the mix?

 

These are typical scenarios in the database management domain, which may result from insufficient normalisation.

 

Data normalisation is the key process that helps maintain a clean, efficient, and easily manageable database.

 

It makes a significant contribution to the normal functioning of a database through its purpose of putting all data in its rightful place and allowing for future expansion or growth without adverse outcomes. In simple terms, normal forms in DBMS are about minimising redundancy in data and avoiding problems such as insertion, deletion, and update anomalies.

 

But what do normal forms in DBMS mean? Now, it’s time to get deeper into details and understand why these concepts are crucial.

Importance and Benefits of Normalisation in DBMS

Now, the question that arises here is: why does normalisation matter?

 

Suppose you are the manager of student records in a school. Each time a student moves to a new address, you update a record for each course in which the student is enrolled. If you miss an update, you end up with confusing and often contradicting details.

 

This is where normal forms in DBMS come in.

 

By structuring the data properly, you only need to store the student’s address once, and any changes are automatically reflected everywhere.

 

Here’s what normalisation brings to the table:

 

  • Ensures that data are consistent across the database.
  • Minimises duplicate data, saving storage space and making the database more efficient.
  • Maintain accurate and reliable data by eliminating anomalies.

 

But normalisation isn’t just about reducing headaches; it’s about creating a foundation that supports future growth.

 

As your database grows, a well-normalised structure will handle the increasing complexity with ease.

*Image
Get curriculum highlights, career paths, industry insights and accelerate your technology journey.
Download brochure

Key Concepts in Database Normalisation

Defining Data Redundancy and Its Impact on Databases

Data redundancy occurs when the same piece of data is stored in multiple places within a database. As the database grows, redundancy can lead to serious problems.

 

Imagine storing the same customer’s contact details in five different tables. If the customer changes their phone number, you must update all five records.

 

Miss one, and you’ve got conflicting data, which can cause confusion and errors.

 

This is where normal forms in DBMS shine.

 

By organising the data to eliminate redundancy, we can ensure that each piece of data is stored only once. This makes updates simpler and prevents the inconsistencies that redundancy can cause.

Understanding Database Anomalies: Insertion, Deletion, and Update Anomalies

Anomalies are errors or inconsistencies that occur when data is added, deleted, or updated in a database.

 

They are the unwanted side effects of poor database design.

 

Insertion Anomalies:

 

These happen when you can’t insert data into a table because other data is missing.

 

For example, if you’re adding a new student to a database but can’t do so because their course information isn’t available yet, that’s an insertion anomaly.

 

Deletion Anomalies:

 

These occur when deleting data unintentionally removes other valuable data.

 

For instance, if deleting a course record also removes the details of the students enrolled in that course, you’ve encountered a deletion anomaly.

 

Update Anomalies:

 

These arise when updating a single piece of data, which requires multiple updates across the database.

 

If you change a customer’s address in one table but forget to do so in another, you’ve got conflicting information—an update anomaly.

Detailed Exploration of Normal Forms in DBMS

Most of the problems are very well prevented by knowing and applying the different normal forms in DBMS. Now, let us discuss normal forms in DBMS and understand how these actually keep our databases clean and efficient.

First Normal Form (1NF): Ensuring Atomicity of Data

First Normal Form (1NF) is all about making sure every piece of data in your table is atomic.

 

Atomic means that each cell in a table contains only one value. No lists, no sets, just one value.

 

Imagine a table storing student contact information. If a single cell holds multiple phone numbers, that’s a problem.

 

Input Table (Before 1NF):

 

StudentID Name PhoneNumbers
1 Rahul Sharma 9876543210, 9123456789
2 Anjali Verma 9988776655
3 Priya Singh 9876512345, 9888123456
4 Rohan Kapoor 9988123456
5 Neha Gupta 9112233445, 9223344556

 

Code to Apply 1NF:

CREATE TABLE Students ( StudentID INT, Name VARCHAR(50), PhoneNumber VARCHAR(15) );  -- Adding data INSERT INTO Students (StudentID, Name, PhoneNumber) VALUES (1, 'Rahul Sharma', '9876543210'), (1, 'Rahul Sharma', '9123456789'), (2, 'Anjali Verma', '9988776655'), (3, 'Priya Singh', '9876512345'), (3, 'Priya Singh', '9888123456'), (4, 'Rohan Kapoor', '9988123456'), (5, 'Neha Gupta', '9112233445'), (5, 'Neha Gupta', '9223344556');

Output Table (After 1NF):

 

StudentID Name PhoneNumber
1 Rahul Sharma 9876543210
1 Rahul Sharma 9123456789
2 Anjali Verma 9988776655
3 Priya Singh 9876512345
3 Priya Singh 9888123456
4 Rohan Kapoor 9988123456
5 Neha Gupta 9112233445
5 Neha Gupta 9223344556

 

Second Normal Form (2NF): Eliminating Partial Dependencies

Moving to the Second Normal Form (2NF), we focus on getting rid of partial dependencies. This means every non-key attribute must be fully dependent on the primary key.

 

Consider a table where each student’s name depends only on their StudentID, but course details are also mixed in:

 

Input Table (Before 2NF):

 

StudentID Name CourseID CourseName Instructor
1 Rahul Sharma 101 Mathematics Dr. Mehta
1 Rahul Sharma 102 Physics Dr. Khan
2 Anjali Verma 101 Mathematics Dr. Mehta
3 Priya Singh 103 Chemistry Dr. Patel
4 Rohan Kapoor 104 Biology Dr. Iyer

 

Code to Apply 2NF:

CREATE TABLE Students ( StudentID INT, Name VARCHAR(50) );  CREATE TABLE Courses ( CourseID INT, CourseName VARCHAR(50), Instructor VARCHAR(50) );  CREATE TABLE StudentCourses ( StudentID INT, CourseID INT );  -- Adding data INSERT INTO Students (StudentID, Name) VALUES (1, 'Rahul Sharma'), (2, 'Anjali Verma'), (3, 'Priya Singh'), (4, 'Rohan Kapoor');  INSERT INTO Courses (CourseID, CourseName, Instructor) VALUES (101, 'Mathematics', 'Dr. Mehta'), (102, 'Physics', 'Dr. Khan'), (103, 'Chemistry', 'Dr. Patel'), (104, 'Biology', 'Dr. Iyer');  INSERT INTO StudentCourses (StudentID, CourseID) VALUES (1, 101), (1, 102), (2, 101), (3, 103), (4, 104);

Output Tables (After 2NF):

 

  • Students Table:

 

StudentID Name
1 Rahul Sharma
2 Anjali Verma
3 Priya Singh
4 Rohan Kapoor

 

  • Courses Table:

 

CourseID CourseName Instructor
101 Mathematics Dr. Mehta
102 Physics Dr. Khan
103 Chemistry Dr. Patel
104 Biology Dr. Iyer

 

  • StudentCourses Table:

 

StudentID CourseID
1 101
1 102
2 101
3 103
4 104

 

Third Normal Form (3NF): Removing Transitive Dependencies

The Third Normal Form (3NF) takes it a step further. Here, we remove transitive dependencies, where non-key columns depend on other non-key columns.

 

Let’s say we have a table where each student’s city depends on their postal code but not directly on the StudentID:

 

Input Table (Before 3NF):

 

StudentID Name PostalCode City State
1 Rahul Sharma 110001 New Delhi Delhi
2 Anjali Verma 400001 Mumbai Maharashtra
3 Priya Singh 600001 Chennai Tamil Nadu
4 Rohan Kapoor 110001 New Delhi Delhi
5 Neha Gupta 400001 Mumbai Maharashtra

 

Code to Apply 3NF:

CREATE TABLE StudentDetails ( StudentID INT, Name VARCHAR(50), PostalCode VARCHAR(10) );  CREATE TABLE PostalCodes ( PostalCode VARCHAR(10), City VARCHAR(50), State VARCHAR(50) );  -- Adding data INSERT INTO StudentDetails (StudentID, Name, PostalCode) VALUES (1, 'Rahul Sharma', '110001'), (2, 'Anjali Verma', '400001'), (3, 'Priya Singh', '600001'), (4, 'Rohan Kapoor', '110001'), (5, 'Neha Gupta', '400001');  INSERT INTO PostalCodes (PostalCode, City, State) VALUES ('110001', 'New Delhi', 'Delhi'), ('400001', 'Mumbai', 'Maharashtra'), ('600001', 'Chennai', 'Tamil Nadu');

Output Tables (After 3NF):

 

  • StudentDetails Table:

 

StudentID Name PostalCode
1 Rahul Sharma 110001
2 Anjali Verma 400001
3 Priya Singh 600001
4 Rohan Kapoor 110001
5 Neha Gupta 400001

 

  • PostalCodes Table:

 

PostalCode City State
110001 New Delhi Delhi
400001 Mumbai Maharashtra
600001 Chennai Tamil Nadu

 

Boyce-Codd Normal Form (BCNF): A Stricter Version of 3NF

Boyce-Codd Normal Form (BCNF) is like 3NF but stricter. Every determinant in the table must be a superkey.

This implies that X should be a superkey for each functional relationship X → Y.

 

Consider a scenario where an employee’s project leader is determined by the ProjectID, but ProjectID isn’t a primary key:

 

Input Table (Before BCNF):

 

EmployeeID Department ProjectID ProjectLeader
1 HR 201 Arjun Singh
2 IT 202 Rohit Mehra
3 HR 201 Arjun Singh
4 IT 203 Neha Saini
5 Finance 204 Rajesh Gupta

 

Code to Apply BCNF:

CREATE TABLE Projects ( ProjectID INT, ProjectLeader VARCHAR(50) );  CREATE TABLE EmployeeProjects ( EmployeeID INT, ProjectID INT, Department VARCHAR(50) );  -- Adding data INSERT INTO Projects (ProjectID, ProjectLeader) VALUES (201, 'Arjun Singh'), (202, 'Rohit Mehra'), (203, 'Neha Saini'), (204, 'Rajesh Gupta');  INSERT INTO EmployeeProjects (EmployeeID, ProjectID, Department) VALUES (1, 201, 'HR'), (2, 202, 'IT'), (3, 201, 'HR'), (4, 203, 'IT'), (5, 204, 'Finance');

Output Tables (After BCNF):

 

  • Projects Table:

 

ProjectID ProjectLeader
201 Arjun Singh
202 Rohit Mehra
203 Neha Saini
204 Rajesh Gupta

 

  • EmployeeProjects Table:

 

EmployeeID ProjectID Department
1 201 HR
2 202 IT
3 201 HR
4 203 IT
5 204 Finance

 

Fourth Normal Form (4NF): Handling Multi-Valued Dependencies

Fourth Normal Form (4NF) deals with multivalued dependencies. These occur when one column in a table depends on multiple values in another column.

 

For example, a table that lists students and their multiple hobbies:

 

Input Table (Before 4NF):

 

StudentID Name Hobby Language
1 Rahul Sharma Painting English
1 Rahul Sharma Cycling Hindi
2 Anjali Verma Singing English
2 Anjali Verma Dancing Tamil
3 Priya Singh Reading English

 

Code to Apply 4NF:

CREATE TABLE Students ( StudentID INT, Name VARCHAR(50) );  CREATE TABLE Hobbies ( StudentID INT, Hobby VARCHAR(50) );  CREATE TABLE Languages ( StudentID INT, Language VARCHAR(50) );  -- Adding data INSERT INTO Students (StudentID, Name) VALUES (1, 'Rahul Sharma'), (2, 'Anjali Verma'), (3, 'Priya Singh');  INSERT INTO Hobbies (StudentID, Hobby) VALUES (1, 'Painting'), (1, 'Cycling'), (2, 'Singing'), (2, 'Dancing'), (3, 'Reading');  INSERT INTO Languages (StudentID, Language) VALUES (1, 'English'), (1, 'Hindi'), (2, 'English'), (2, 'Tamil'), (3, 'English');

Output Tables (After 4NF):

 

  • Students Table:

 

StudentID Name
1 Rahul Sharma
2 Anjali Verma
3 Priya Singh

 

  • Hobbies Table:

 

StudentID Hobby
1 Painting
1 Cycling
2 Singing
2 Dancing
3 Reading

 

  • Languages Table:

 

StudentID Language
1 English
1 Hindi
2 English
2 Tamil
3 English

 

Fifth Normal Form (5NF): Ensuring Lossless Decomposition

Finally, the Fifth Normal Form (5NF) focuses on ensuring that data can be decomposed into smaller tables without losing any information.

 

This is crucial when dealing with complex relationships, such as product assemblies:

 

Input Table (Before 5NF):

 

AssemblyID ProductID ComponentID SupplierID
1 301 401 501
1 301 402 502
2 302 403 501
2 302 404 503
3 303 405 504

 

Code to Apply 5NF:

CREATE TABLE Assemblies ( AssemblyID INT, ProductID INT );  CREATE TABLE Products ( ProductID INT, ComponentID INT );  CREATE TABLE Suppliers ( ComponentID INT, SupplierID INT );  -- Adding data INSERT INTO Assemblies (AssemblyID, ProductID) VALUES (1, 301), (2, 302), (3, 303);  INSERT INTO Products (ProductID, ComponentID) VALUES (301, 401), (301, 402), (302, 403), (302, 404), (303, 405);  INSERT INTO Suppliers (ComponentID, SupplierID) VALUES (401, 501), (402, 502), (403, 501), (404, 503), (405, 504);

Output Tables (After 5NF):

 

  • Assemblies Table:

 

AssemblyID ProductID
1 301
2 302
3 303

 

  • Products Table:

 

ProductID ComponentID
301 401
301 402
302 403
302 404
303 405

 

  • Suppliers Table:

 

ComponentID SupplierID
401 501
402 502
403 501
404 503
405 504

 

Balancing Normalisation and Database Performance

Is your database running slower after applying all those normal forms?

 

Normalisation is great for reducing redundancy and improving data integrity, but it can sometimes slow down performance.

 

With all these normal forms, it’s easy to think more normalisation is always better. But the trick is to strike the right balance between normalisation and speed.

 

Too much normalisation can lead to complex queries requiring multiple joins. This can bog down your system, especially with large datasets.

 

So, what’s the solution?

 

Here’s how we can find the right balance:

 

  • Start Simple: Normalise up to 3NF or BCNF for most applications.
  • Monitor Performance: Keep an eye on query times and database performance.
  • Denormalize When Needed: If performance issues arise, consider denormalizing selectively.

 

By balancing normal forms in DBMS with practical needs, we ensure our databases are both efficient and easy to work with.

 

Also read: Database Management System?

Best Practices for Implementing Normalisation in Real-World Databases

Know When to Stop:

 

While it’s tempting to normalise to the highest level, it’s not always necessary. Most of the time, it doesn’t go beyond BCNF or 4NF.

 

Test Early and Often:

 

After each step of normalisation, run queries to see how the database handles them. This helps catch performance issues early.

 

Document Your Design:

 

Keep a clear record of your normalisation process. It’s easier to troubleshoot and explain your design decisions if problems arise later.

 

Keep the Future in Mind:

 

Think about how your database will grow. What works for a small dataset might not scale well. Plan your normalisation with an eye on future expansion.

Conclusion

Normal forms in DBMS are important to deal with the issue of data redundancy, maintain data consistency, and avoid data anomalies.

 

Starting with the basic normal forms such as 1NF, 2NF, and 3NF, we bring a solid standard to the framework.

 

In complex data dependencies, it is necessary to move to BCNF and further. However, each level of normalisation brings an extra level of complexity and problem, which affects the performance.

 

This is the reason why those people working on databases should be able to know when they need to stop and when they should be denormalised with the aim of improving efficiency.

 

A well-designed database is both organised and, at the same time, optimised according to your application’s needs, granting both accuracy and speed in real-life situations.

FAQs
Normalisation helps an organisation to reduce data redundancy and avoid anomalies in a database. This would result in an efficient database, which is easy to manage.
BCNF is just more rigid than 3NF; it gets rid of any form of dependency where the left-hand side is not a superkey. This ensures no unnecessary redundancy or anomaly in the database.
No, it's not always necessary. Most databases work efficiently up to 3NF or BCNF. Higher forms are used in specific cases where data integrity and minimal redundancy are crucial.
Over-normalization can lead to complex queries and slower performance due to the need for multiple joins. This can be especially problematic in large databases where query speed is critical.
Yes, a database can be partially normalised. This approach can help balance data integrity with performance. However, it may introduce some redundancy, which can complicate data management if not carefully monitored.

Updated on October 7, 2024

Link
left dot patternright dot pattern

Programs tailored for your success

Popular

Management

Data Science

Finance

Technology

Future Tech

Upskill with expert articles

View all
Hero Vired logo
Hero Vired is a leading LearnTech company dedicated to offering cutting-edge programs in collaboration with top-tier global institutions. As part of the esteemed Hero Group, we are committed to revolutionizing the skill development landscape in India. Our programs, delivered by industry experts, are designed to empower professionals and students with the skills they need to thrive in today’s competitive job market.
Blogs
Reviews
Events
In the News
About Us
Contact us
Learning Hub
18003093939     ·     hello@herovired.com     ·    Whatsapp
Privacy policy and Terms of use

|

Sitemap

© 2024 Hero Vired. All rights reserved