Popular
Data Science
Technology
Finance
Management
Future Tech
Ever tried to combine two or more datasets in SQL but got stuck trying to figure out how to merge them without a mess?
Set operators in SQL can be your best friend here.
They help us combine results from multiple queries into a single, clean dataset.
Whether you’re merging employee records from different departments or combining customer data from various branches, SQL set operators can save you hours of headache.
But how do we utilise them without having problems such as getting errors or confusing outcomes? That’s what I will try to address in this guide.
SQL set operators help us make comparisons and join several SELECT statements. These operators include UNION, UNION ALL, INTERSECT, and MINUS (EXCEPT)
Each has its procedures for managing data and an ability to recognise when to apply, which one can determine success.
Here’s a quick look at what each set operators in SQL do:
Suppose you have two lists of customers, one from last year and one from this year. You want to know who stayed with you both years, who’s new, and who left.
That’s where these set operators in SQL come into play. Each operator will give you a different slice of that information.
Using set operators is simple, but there are some rules we need to follow to avoid errors.
For example, if we’re combining employee data from two departments, we need to ensure both SELECT statements pull the same columns in the same order. Otherwise, SQL will throw an error, and no one likes errors.
Let’s dive into the UNION operator first. This is your go-to when you need to combine results from two or more queries and eliminate any duplicate rows.
UNION is used in cases such as when we need all the records in a particular database from various sources, and we are not concerned with duplicate records. It is so basic, sleek, extremely functional, and user-friendly.
Example: Let’s say we have two tables, department_a and department_b, both holding employee data.
department_a Table:
employee_id | employee_name |
1 | Rohan Kumar |
2 | Meera Singh |
3 | Vikram Joshi |
department_b Table:
employee_id | employee_name |
1 | Rohan Kumar |
4 | Anjali Gupta |
Here’s how we can use UNION:
Output:
employee_id | employee_name |
1 | Rohan Kumar |
2 | Meera Singh |
3 | Vikram Joshi |
4 | Anjali Gupta |
In this example, if both departments have the same employee (say Rohan Kumar), he will appear only once in the result, thanks to UNION.
What if you need every single record from multiple datasets, even if some are duplicates?
That’s where UNION ALL steps in. Unlike UNION, UNION ALL doesn’t filter out duplicates.
It simply combines all the rows from the SELECT statements, giving you a complete picture, even if that means some data appears more than once.
Why Choose UNION ALL?
Let’s look at a practical example.
Consider two tables: project_team_1 and project_team_2, each listing employees working on different projects.
project_team_1 Table:
employee_id | project_name |
1 | AI Research |
2 | Web Development |
3 | Data Analysis |
project_team_2 Table:
employee_id | project_name |
1 | Cloud Computing |
2 | Web Development |
4 | AI Research |
We want to see every employee in both teams, including those working on multiple projects.
Output:
employee_id | project_name |
1 | AI Research |
2 | Web Development |
3 | Data Analysis |
1 | Cloud Computing |
4 | AI Research |
2 | Web Development |
In this example, if an employee is working on both teams, their record will appear twice. This is exactly what we want when we need to see every instance of an employee’s involvement across projects.
Sometimes, we only care about what’s common between two datasets.
That’s where the INTERSECT operator shines. It fetches only the rows that exist in both SELECT statements.
When to Use INTERSECT?
Think about two customer lists: online_customers and store_customers.
online_customers Table:
customer_id | customer_name |
1 | Amit Desai |
2 | Priya Sharma |
3 | Kavita Patel |
5 | Ravi Patel |
store_customers Table:
customer_id | customer_name |
2 | Priya Sharma |
4 | Suresh Mehta |
5 | Ravi Patel |
6 | Neha Joshi |
We want to know which customers have shopped both online and in-store.
Output:
customer_id | customer_name |
2 | Priya Sharma |
5 | Ravi Patel |
The result shows only those customers who appear in both tables.
Priya Sharma and Ravi Patel have shopped both online and in-store. INTERSECT helps us find these overlaps easily.
What if we want to know what’s in one list but not the other?
The MINUS (or EXCEPT) operator is perfect for this. It returns the rows from the first SELECT statement that aren’t found in the second.
Why Use MINUS?
Consider the case of current_students and alumni.
current_students Table:
student_id | student_name |
1 | Suresh Kumar |
2 | Nisha Gupta |
3 | Anjali Nair |
4 | Rajesh Kumar |
alumni Table:
student_id | student_name |
2 | Nisha Gupta |
4 | Rajesh Kumar |
5 | Pooja Sharma |
We want to identify students who are currently enrolled but haven’t graduated.
Output:
student_id | student_name |
1 | Suresh Kumar |
3 | Anjali Nair |
This output lists students who are still studying and haven’t graduated yet. MINUS gives us a quick way to identify these unique records.
When it comes to performance, knowing when to use UNION versus UNION ALL can make a big difference.
Let’s break it down.
UNION:
UNION ALL:
If you’re dealing with massive datasets and don’t need to filter out duplicates, always go for UNION ALL. It saves time and resources, especially with big queries.
Also check: SQL for Data Science
Mismatched Columns:
Incompatible Data Types:
ORDER BY Clause Misuse:
Ignoring Duplicate Data:
Also Read: Mastering SQL
Suppose we have two customer databases, one from our online store and another from our physical store.
We want to merge them into a single list.
We want to find customers who’ve shopped both online and in-store.
Let’s find out who’s only shopped online and not in-store.
Mastering set operators in SQL is essential for anyone looking to effectively manage and analyse data.
These tools—UNION, UNION ALL, INTERSECT, and MINUS—enable precise control over how we combine and compare data across multiple datasets. Each operator serves a unique purpose, from merging lists without duplicates to identifying shared or unique records.
It also gives us the knowledge of when and how to apply the operators to gain a better result and, at the same time, make the queries more efficient.
This way, with these skills, complicated manipulations of data are easier and faster, hence making us able to make better decisions based on the available data.
The DevOps Playbook
Simplify deployment with Docker containers.
Streamline development with modern practices.
Enhance efficiency with automated workflows.
Popular
Data Science
Technology
Finance
Management
Future Tech
Accelerator Program in Business Analytics & Data Science
Integrated Program in Data Science, AI and ML
Certificate Program in Full Stack Development with Specialization for Web and Mobile
Certificate Program in DevOps and Cloud Engineering
Certificate Program in Application Development
Certificate Program in Cybersecurity Essentials & Risk Assessment
Integrated Program in Finance and Financial Technologies
Certificate Program in Financial Analysis, Valuation and Risk Management
© 2024 Hero Vired. All rights reserved