Understanding union query
Mastering UNION and UNION ALL in SQL: Combining Result Sets

Explore the fundamental differences and practical applications of UNION and UNION ALL operators in SQL to efficiently combine data from multiple queries.
In SQL, the UNION
and UNION ALL
operators are powerful tools for combining the result sets of two or more SELECT
statements into a single result set. While both serve the purpose of merging data, they have a crucial distinction related to duplicate rows. Understanding when to use each operator is key to writing efficient and accurate SQL queries.
The Basics of UNION and UNION ALL
Both UNION
and UNION ALL
require that the SELECT
statements have the same number of columns, and the corresponding columns must have compatible data types. The column names in the final result set are determined by the column names from the first SELECT
statement. The primary difference lies in how they handle duplicate rows.
flowchart TD A[Query 1] --> B{Combine Results} C[Query 2] --> B B --> D{Remove Duplicates?} D -- Yes --> E[UNION] D -- No --> F[UNION ALL] E --> G[Final Result Set (Unique)] F --> H[Final Result Set (All Rows)]
Decision flow for choosing between UNION and UNION ALL
UNION: Combining and Deduplicating
The UNION
operator combines the result sets of two or more SELECT
statements and then removes any duplicate rows from the final result. This means if a row exists in both result sets, or multiple times within a single result set, it will appear only once in the UNION
output. This deduplication process can be resource-intensive, especially with large datasets, as the database engine needs to sort and compare all rows to identify and remove duplicates.
SELECT CustomerID, FirstName, LastName
FROM Customers
WHERE Country = 'USA'
UNION
SELECT EmployeeID, FirstName, LastName
FROM Employees
WHERE Department = 'Sales';
Example of UNION combining customer and employee names from USA/Sales, removing duplicates.
UNION ALL: Combining All Rows
In contrast, the UNION ALL
operator combines the result sets of two or more SELECT
statements without removing duplicate rows. Every row from each SELECT
statement is included in the final result set, even if it is an exact duplicate of another row. This makes UNION ALL
generally faster and more efficient than UNION
because it avoids the overhead of sorting and deduplicating the data. You should use UNION ALL
when you know there are no duplicates, or when you explicitly want to retain all duplicate rows.
SELECT ProductID, ProductName, Price
FROM Products_WarehouseA
UNION ALL
SELECT ProductID, ProductName, Price
FROM Products_WarehouseB;
Example of UNION ALL combining product lists from two warehouses, including potential duplicates.
UNION ALL
over UNION
if you don't need to remove duplicates. It offers significant performance benefits by avoiding the costly deduplication step.Key Considerations and Best Practices
When working with UNION
and UNION ALL
, keep the following points in mind to ensure your queries are correct and performant:
1. Column Count and Data Types
Ensure that all SELECT
statements involved in the UNION
or UNION ALL
operation have the same number of columns, and that the data types of corresponding columns are compatible. Implicit conversions might occur, but it's best practice to explicitly cast if types differ significantly.
2. Column Naming
The column names in the final result set will be derived from the first SELECT
statement. If you need specific column names, use aliases in the first SELECT
statement.
3. ORDER BY Clause
An ORDER BY
clause can only be used at the very end of the entire UNION
or UNION ALL
query, not within individual SELECT
statements (unless used within a subquery). It will sort the combined result set.
4. Performance Implications
UNION
involves an implicit DISTINCT
operation, which requires sorting the entire combined dataset. This can be very slow for large tables. UNION ALL
simply appends the results, making it much faster. Use UNION
only when deduplication is strictly necessary.
UNION
when UNION ALL
would suffice can lead to unnecessary performance bottlenecks and increased resource consumption. Always evaluate if duplicate removal is truly required.