They help us combine results from multiple queries into a single, clean dataset.
Whether you’re merging employee records from different departments or combining customer data from various branches, SQL set operators can save you hours of headache.
But how do we utilise them without having problems such as getting errors or confusing outcomes? That’s what I will try to address in this guide.
Overview of Different Set Operators in SQL and Their Applications
SQL set operators help us make comparisons and join several SELECT statements. These operators include UNION, UNION ALL, INTERSECT, and MINUS (EXCEPT)
Each has its procedures for managing data and an ability to recognise when to apply, which one can determine success.
Here’s a quick look at what each set operators in SQL do:
UNION: Combines results from two or more queries, removing duplicates.
UNION ALL: Similar to UNION but keeps all duplicates.
INTERSECT: Returns only the rows common to both queries.
MINUS (EXCEPT): Gives us the rows in the first query that aren’t in the second.
Suppose you have two lists of customers, one from last year and one from this year. You want to know who stayed with you both years, who’s new, and who left.
That’s where these set operators in SQL come into play. Each operator will give you a different slice of that information.
Get curriculum highlights, career paths, industry insights and accelerate your technology journey.
Download brochure
Key Rules to Follow When Using Set Operators in SQL Queries
Using set operators is simple, but there are some rules we need to follow to avoid errors.
Match the Number of Columns: Both queries must return the same number of columns.
Same Data Types: The data types of the columns must be compatible.
Column Order Matters: The order of columns in the SELECT statements should be identical.
Single ORDER BY Clause: You can’t use ORDER BY in each query; it must come at the end of the combined query.
For example, if we’re combining employee data from two departments, we need to ensure both SELECT statements pull the same columns in the same order. Otherwise, SQL will throw an error, and no one likes errors.
Detailed Explanation of the UNION Operator with Unique Examples
Let’s dive into the UNION operator first. This is your go-to when you need to combine results from two or more queries and eliminate any duplicate rows.
UNION is used in cases such as when we need all the records in a particular database from various sources, and we are not concerned with duplicate records. It is so basic, sleek, extremely functional, and user-friendly.
Example: Let’s say we have two tables, department_a and department_b, both holding employee data.
department_a Table:
employee_id
employee_name
1
Rohan Kumar
2
Meera Singh
3
Vikram Joshi
department_b Table:
employee_id
employee_name
1
Rohan Kumar
4
Anjali Gupta
Here’s how we can use UNION:
SELECT employee_id, employee_name FROM department_a
UNION
SELECT employee_id, employee_name FROM department_b;
Output:
employee_id
employee_name
1
Rohan Kumar
2
Meera Singh
3
Vikram Joshi
4
Anjali Gupta
In this example, if both departments have the same employee (say Rohan Kumar), he will appear only once in the result, thanks to UNION.
Exploring UNION ALL: How It Differs from UNION and When to Use It
What if you need every single record from multiple datasets, even if some are duplicates?
That’s where UNION ALL steps in. Unlike UNION, UNION ALL doesn’t filter out duplicates.
It simply combines all the rows from the SELECT statements, giving you a complete picture, even if that means some data appears more than once.
Why Choose UNION ALL?
Retaining All Data: When we want to ensure that no data is left out, even if it means some duplication.
Performance: UNION ALL can be faster because it doesn’t take time to remove duplicates.
Let’s look at a practical example.
Consider two tables: project_team_1 and project_team_2, each listing employees working on different projects.
project_team_1 Table:
employee_id
project_name
1
AI Research
2
Web Development
3
Data Analysis
project_team_2 Table:
employee_id
project_name
1
Cloud Computing
2
Web Development
4
AI Research
We want to see every employee in both teams, including those working on multiple projects.
SELECT employee_id, project_name FROM project_team_1
UNION ALL
SELECT employee_id, project_name FROM project_team_2;
Output:
employee_id
project_name
1
AI Research
2
Web Development
3
Data Analysis
1
Cloud Computing
4
AI Research
2
Web Development
In this example, if an employee is working on both teams, their record will appear twice. This is exactly what we want when we need to see every instance of an employee’s involvement across projects.
INTERSECT Operator: How to Retrieve Common Records Across Queries
Sometimes, we only care about what’s common between two datasets.
That’s where the INTERSECT operator shines. It fetches only the rows that exist in both SELECT statements.
When to Use INTERSECT?
Identifying Commonalities: When we need to find what’s shared between datasets, like common customers, employees, or products.
Data Validation: Useful for verifying consistency across different databases or sources.
Think about two customer lists: online_customers and store_customers.
online_customers Table:
customer_id
customer_name
1
Amit Desai
2
Priya Sharma
3
Kavita Patel
5
Ravi Patel
store_customers Table:
customer_id
customer_name
2
Priya Sharma
4
Suresh Mehta
5
Ravi Patel
6
Neha Joshi
We want to know which customers have shopped both online and in-store.
SELECT customer_id, customer_name FROM online_customers
INTERSECT
SELECT customer_id, customer_name FROM store_customers;
Output:
customer_id
customer_name
2
Priya Sharma
5
Ravi Patel
The result shows only those customers who appear in both tables.
Priya Sharma and Ravi Patel have shopped both online and in-store. INTERSECT helps us find these overlaps easily.
Understanding the MINUS (EXCEPT) Operator for Identifying Unique Records
What if we want to know what’s in one list but not the other?
The MINUS (or EXCEPT) operator is perfect for this. It returns the rows from the first SELECT statement that aren’t found in the second.
Why Use MINUS?
Exclusion: When we need to filter out data present in another set.
Comparison: Ideal for comparing two datasets and finding differences.
Consider the case of current_students and alumni.
current_students Table:
student_id
student_name
1
Suresh Kumar
2
Nisha Gupta
3
Anjali Nair
4
Rajesh Kumar
alumni Table:
student_id
student_name
2
Nisha Gupta
4
Rajesh Kumar
5
Pooja Sharma
We want to identify students who are currently enrolled but haven’t graduated.
SELECT student_id, student_name FROM current_students
MINUS
SELECT student_id, student_name FROM alumni;
Output:
student_id
student_name
1
Suresh Kumar
3
Anjali Nair
This output lists students who are still studying and haven’t graduated yet. MINUS gives us a quick way to identify these unique records.
Performance Considerations: UNION vs. UNION ALL in SQL Queries
When it comes to performance, knowing when to use UNION versus UNION ALL can make a big difference.
Let’s break it down.
UNION:
Removes Duplicates: This takes extra processing time because SQL has to check and eliminate duplicates.
Use When: You want a clean, distinct list of records without any repeats.
UNION ALL:
Keeps All Records: No need to check for duplicates, so it runs faster.
Use When: You need every record, duplicates included, and performance is key.
If you’re dealing with massive datasets and don’t need to filter out duplicates, always go for UNION ALL. It saves time and resources, especially with big queries.
Common Errors and Pitfalls When Working with Set Operators in SQL
Mismatched Columns:
Both SELECT statements must return the same number of columns.
If one query pulls three columns and the other pulls four, SQL will throw an error.
Incompatible Data Types:
The data types of the columns in each SELECT statement must be compatible.
If one column is a date and the other is text, SQL won’t know what to do.
ORDER BY Clause Misuse:
We can’t place an ORDER BY clause in each SELECT statement.
It must be used at the end of the combined query.
Ignoring Duplicate Data:
When using UNION, duplicates are removed by default.
If we accidentally use UNION instead of UNION ALL, we might lose important duplicate records. Always double-check which operator you need to avoid this mistake.
Practical Use Cases: Applying SQL Set Operators in Real-World Scenarios
Merging Customer Lists:
Suppose we have two customer databases, one from our online store and another from our physical store.
We want to merge them into a single list.
SELECT customer_id, customer_name FROM online_customers
UNION
SELECT customer_id, customer_name FROM store_customers;
Finding Loyal Customers:
We want to find customers who’ve shopped both online and in-store.
SELECT customer_id, customer_name FROM online_customers
INTERSECT
SELECT customer_id, customer_name FROM store_customers;
Identifying Unique Online Shoppers:
Let’s find out who’s only shopped online and not in-store.
SELECT customer_id, customer_name FROM online_customers
MINUS
SELECT customer_id, customer_name FROM store_customers;
Conclusion
Mastering set operators in SQL is essential for anyone looking to effectively manage and analyse data.
These tools—UNION, UNION ALL, INTERSECT, and MINUS—enable precise control over how we combine and compare data across multiple datasets. Each operator serves a unique purpose, from merging lists without duplicates to identifying shared or unique records.
It also gives us the knowledge of when and how to apply the operators to gain a better result and, at the same time, make the queries more efficient.
This way, with these skills, complicated manipulations of data are easier and faster, hence making us able to make better decisions based on the available data.
FAQs
What’s the difference between UNION and UNION ALL in SQL?
UNION removes duplicate rows, while UNION ALL includes all rows, even duplicates.
Use UNION when you need a distinct list and UNION ALL when you need to keep all records.
Can I use different data types in a UNION query?
No, the data types in each SELECT statement must be compatible. If they’re not, SQL will return an error.
Why do we get an error when we use the ORDER BY clause in the UNION query?
The ORDER BY clause can be used only once and it should be placed at the end of the combined query not in each SELECT statement.
When should I use INTERSECT instead of a JOIN?
Use INTERSECT when you only need the common rows from two queries.
JOINs can be more complex and return additional columns, which might not be necessary.
How can I avoid losing important duplicate data?
Use UNION ALL instead of UNION to ensure no data is lost.
Hero Vired is a leading LearnTech company dedicated to offering cutting-edge programs in collaboration with top-tier global institutions. As part of the esteemed Hero Group, we are committed to revolutionizing the skill development landscape in India. Our programs, delivered by industry experts, are designed to empower professionals and students with the skills they need to thrive in today’s competitive job market.