Efficiently Delete Duplicate Rows in SQL, Keeping the First Occurrence

Learn various SQL techniques to identify and remove duplicate rows from your tables, ensuring data integrity by preserving the initial record.
Duplicate data can compromise database integrity and lead to inaccurate reports. This article explores several robust methods to delete duplicate rows from a SQL table while ensuring that the first occurrence of each unique record is preserved. We'll cover techniques using ROW_NUMBER(), CTEs, temporary tables, and GROUP BY clauses. The examples use SQL Server (T-SQL) syntax. Other relational databases support the same ideas, but some constructs, such as deleting through a CTE or with a join, need different syntax there.
Understanding Duplicates and the 'First Row' Concept
Before deleting, it's crucial to define what constitutes a 'duplicate' and what 'keeping the first row' means. A duplicate row is one whose values in a chosen set of columns (or in all columns) are identical to those of another row. 'Keeping the first row' usually means retaining, among the duplicates, the row with the lowest value in a designated ordering column, such as an IDENTITY column or a timestamp. Without such a column, 'first' is arbitrary: tables have no guaranteed row order, so a ROW_NUMBER() assignment with no meaningful ordering may keep any one of the copies.
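To see which rows count as duplicates before deleting anything, a grouping query can list each duplicate group together with the row that would be kept. The sketch below uses Python's built-in sqlite3 module and an in-memory SQLite database as a stand-in for SQL Server. The customers table, its columns, and the sample rows are hypothetical, with id assumed to be the ordering column that defines 'first'.

```python
import sqlite3

# Hypothetical sample table: "id" is the ordering column that defines "first";
# (name, email) is the set of columns that defines a duplicate.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    INSERT INTO customers (id, name, email) VALUES
        (1, 'Ana', 'ana@example.com'),
        (2, 'Ben', 'ben@example.com'),
        (3, 'Ana', 'ana@example.com'),
        (4, 'Ana', 'ana@example.com'),
        (5, 'Ben', 'ben@example.org');
""")

# One result row per duplicate group: how many copies exist and which id is "first".
duplicate_groups = conn.execute("""
    SELECT name, email, COUNT(*) AS copies, MIN(id) AS first_id
    FROM customers
    GROUP BY name, email
    HAVING COUNT(*) > 1
    ORDER BY name, email
""").fetchall()

print(duplicate_groups)  # [('Ana', 'ana@example.com', 3, 1)]
```

Running the same SELECT against your own table is a cheap way to confirm the grouping columns are right before any DELETE.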
Figure: Workflow for identifying and deleting duplicate rows. If duplicates exist, define the 'first' row criteria, select the duplicates to delete, execute the DELETE statement, and verify the deletion; otherwise, stop.
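The workflow can be sketched end to end: count the extra copies, delete them inside a transaction, and verify the result before committing. This is a minimal sketch assuming an in-memory SQLite database driven by Python's sqlite3 module. The items table and its sku column are hypothetical, and the delete step uses a compact NOT IN / GROUP BY form rather than any one of the methods below.

```python
import sqlite3

# Hypothetical table; "id" decides which copy is "first". Assumes sku is never NULL
# (COUNT(DISTINCT sku) ignores NULLs, which would skew the duplicate count).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, sku TEXT);
    INSERT INTO items (id, sku) VALUES (1, 'A'), (2, 'B'), (3, 'A'), (4, 'B'), (5, 'C');
""")

DUPLICATE_COUNT_SQL = "SELECT COUNT(*) - COUNT(DISTINCT sku) FROM items"

# Identify duplicates: rows beyond the first copy of each sku.
extra_rows = conn.execute(DUPLICATE_COUNT_SQL).fetchone()[0]
deleted = 0

if extra_rows > 0:
    try:
        # "First" = lowest id; the DELETE opens an implicit transaction in sqlite3.
        cur = conn.execute("""
            DELETE FROM items
            WHERE id NOT IN (SELECT MIN(id) FROM items GROUP BY sku)
        """)
        deleted = cur.rowcount
        # Verify before committing; undo everything if the numbers disagree.
        remaining = conn.execute(DUPLICATE_COUNT_SQL).fetchone()[0]
        if deleted != extra_rows or remaining != 0:
            raise RuntimeError("unexpected delete result")
        conn.commit()
    except Exception:
        conn.rollback()
        raise

print(deleted)  # 2
```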
Method 1: Using ROW_NUMBER() with a Common Table Expression (CTE)
This is one of the most common and flexible methods for deleting duplicates. It assigns a sequential number to each row within partitions defined by the columns that determine uniqueness. The row numbered 1 in each partition is kept, and rows with a ROW_NUMBER() greater than 1 are the duplicates to delete.
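WITH CTE_Duplicates AS (
    SELECT
        Column1, Column2, Column3,  -- Columns that define uniqueness
        ROW_NUMBER() OVER (
            PARTITION BY Column1, Column2, Column3
            ORDER BY (SELECT NULL)  -- No meaningful order: the surviving copy is arbitrary
        ) AS rn
    FROM
        YourTable
)
-- SQL Server deletes the underlying YourTable rows through the CTE
DELETE FROM CTE_Duplicates
WHERE rn > 1;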
SQL query to delete duplicate rows using ROW_NUMBER() and CTE, keeping the first arbitrary row.
To make the kept row deterministic, replace ORDER BY (SELECT NULL) with ORDER BY YourPrimaryKeyColumn ASC or ORDER BY YourTimestampColumn ASC.
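The following sketch runs Method 1 with a deterministic ORDER BY on a timestamp, so the earliest copy in each group survives. It assumes SQLite 3.25 or later (for window functions) accessed through Python's sqlite3 module, and a hypothetical created_at column stored as ISO-8601 text. SQLite cannot delete through a CTE the way SQL Server can, so here the CTE only ranks the rows and the DELETE removes the matching ids.

```python
import sqlite3

# Hypothetical table: "created_at" (ISO-8601 text) defines which copy is "first".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (
        id INTEGER PRIMARY KEY,
        Column1 TEXT, Column2 TEXT, Column3 INTEGER,
        created_at TEXT
    );
    INSERT INTO YourTable VALUES
        (1, 'x', 'a', 10, '2024-03-02 09:00:00'),
        (2, 'x', 'a', 10, '2024-03-01 08:00:00'),
        (3, 'y', 'b', 20, '2024-03-05 12:00:00'),
        (4, 'x', 'a', 10, '2024-03-03 10:00:00'),
        (5, 'y', 'b', 20, '2024-03-04 11:00:00');
""")

# The CTE numbers rows inside each duplicate group; the DELETE targets the
# ids ranked after the first.
conn.execute("""
    WITH CTE_Duplicates AS (
        SELECT id,
               ROW_NUMBER() OVER (
                   PARTITION BY Column1, Column2, Column3
                   ORDER BY created_at ASC, id ASC  -- earliest wins; id breaks ties
               ) AS rn
        FROM YourTable
    )
    DELETE FROM YourTable
    WHERE id IN (SELECT id FROM CTE_Duplicates WHERE rn > 1)
""")
conn.commit()

survivors = conn.execute("SELECT id FROM YourTable ORDER BY id").fetchall()
print(survivors)  # [(2,), (5,)]
```

Note that the survivors are ids 2 and 5, not the lowest ids, because the ORDER BY inside OVER() alone decides which copy counts as first. Adding id as a tie-breaker keeps the result deterministic when two copies share a timestamp.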
Method 2: Using a Temporary Table or Table Variable
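This approach copies the unique rows into a temporary table (or table variable), truncates the original table, and re-inserts the unique data. Because SELECT DISTINCT compares every listed column, it only removes rows that are identical in all of them; if the table has an IDENTITY or primary key column, DISTINCT over all columns finds no duplicates. It can be useful when a large fraction of the table is duplicated, or when you need to perform additional operations on the unique set before re-insertion.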
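SET XACT_ABORT ON;  -- Roll back the whole transaction if any statement fails
BEGIN TRANSACTION;
-- Copy one instance of each distinct row (list every column of the table)
SELECT DISTINCT Column1, Column2, Column3, Column4
INTO #TempUniqueTable
FROM YourTable;
-- Empty the original table (fails if YourTable is referenced by a FOREIGN KEY)
TRUNCATE TABLE YourTable;
-- Put the de-duplicated rows back
INSERT INTO YourTable (Column1, Column2, Column3, Column4)
SELECT Column1, Column2, Column3, Column4
FROM #TempUniqueTable;
COMMIT TRANSACTION;
DROP TABLE #TempUniqueTable;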
SQL query using a temporary table to remove duplicates.
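Here is the same copy-out, empty, copy-back sequence sketched in SQLite through Python's sqlite3 module, wrapped in one explicit transaction so that a failure part-way through leaves the original data untouched. SQLite has no TRUNCATE TABLE, so an unqualified DELETE stands in for it; the table and rows are hypothetical.

```python
import sqlite3

# isolation_level=None disables the module's implicit transactions, so the
# explicit BEGIN/COMMIT below wraps the whole rewrite.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE YourTable (Column1 TEXT, Column2 TEXT, Column3 INTEGER, Column4 TEXT);
    INSERT INTO YourTable VALUES
        ('x', 'a', 10, 'p'),
        ('x', 'a', 10, 'p'),
        ('y', 'b', 20, 'q'),
        ('x', 'a', 10, 'p');
""")

try:
    conn.execute("BEGIN")
    # Copy one instance of each fully identical row into a temporary table.
    conn.execute("""
        CREATE TEMP TABLE TempUniqueTable AS
        SELECT DISTINCT Column1, Column2, Column3, Column4 FROM YourTable
    """)
    # No TRUNCATE TABLE in SQLite: an unqualified DELETE empties the table.
    conn.execute("DELETE FROM YourTable")
    # Put the de-duplicated rows back, then discard the temporary copy.
    conn.execute("""
        INSERT INTO YourTable (Column1, Column2, Column3, Column4)
        SELECT Column1, Column2, Column3, Column4 FROM TempUniqueTable
    """)
    conn.execute("DROP TABLE TempUniqueTable")
    conn.execute("COMMIT")
except Exception:
    if conn.in_transaction:
        conn.execute("ROLLBACK")
    raise

row_count = conn.execute("SELECT COUNT(*) FROM YourTable").fetchone()[0]
print(row_count)  # 2
```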
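Caution: this method can be expensive on large tables, because every unique row is copied out and back in through the SELECT INTO, TRUNCATE, and INSERT operations. It also requires exclusive access to the table during the process.
Method 3: Using MIN/MAX with a Self-Join (Less Common for Deletion)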
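While often used for selecting unique rows, a self-join with MIN() or MAX() can also be adapted for deletion when the table has a primary key or other unique identifier. The query finds the MIN() primary key for each group of duplicates and deletes all other rows in that group; to keep the last row instead, use MAX() and change the final comparison to <. Because the join compares columns with =, rows with NULL in any grouping column never match and are left in place. The ROW_NUMBER() method does not have this limitation, since PARTITION BY treats NULLs as equal.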
-- Keep the row with the lowest PrimaryKeyColumn in each group; delete the rest
DELETE t1
FROM YourTable t1
INNER JOIN (
    SELECT Column1, Column2, Column3, MIN(PrimaryKeyColumn) AS MinPK
    FROM YourTable
    GROUP BY Column1, Column2, Column3
) t2 ON t1.Column1 = t2.Column1
    AND t1.Column2 = t2.Column2
    AND t1.Column3 = t2.Column3
WHERE t1.PrimaryKeyColumn > t2.MinPK;
SQL query to delete duplicates using a self-join and MIN() on a primary key.
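SQLite does not support DELETE with a JOIN, so this sketch rewrites the self-join as an equivalent correlated subquery: a row is deleted when its group contains a smaller primary key. It runs through Python's sqlite3 module on hypothetical data that includes a pair of duplicates with a NULL column, to show that the equality comparison leaves such rows in place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (
        PrimaryKeyColumn INTEGER PRIMARY KEY,
        Column1 TEXT, Column2 TEXT, Column3 INTEGER
    );
    INSERT INTO YourTable VALUES
        (1, 'x', 'a', 10),
        (2, 'x', 'a', 10),
        (3, 'y', NULL, 20),
        (4, 'y', NULL, 20),
        (5, 'x', 'a', 10);
""")

# The join condition becomes a correlated subquery. For the NULL rows the
# subquery finds no match, MIN() returns NULL, and the comparison is not true,
# so those rows are kept.
conn.execute("""
    DELETE FROM YourTable
    WHERE PrimaryKeyColumn > (
        SELECT MIN(t2.PrimaryKeyColumn)
        FROM YourTable AS t2
        WHERE t2.Column1 = YourTable.Column1
          AND t2.Column2 = YourTable.Column2
          AND t2.Column3 = YourTable.Column3
    )
""")
conn.commit()

survivors = conn.execute(
    "SELECT PrimaryKeyColumn FROM YourTable ORDER BY PrimaryKeyColumn"
).fetchall()
print(survivors)  # [(1,), (3,), (4,)]
```

Rows 3 and 4 both survive even though they are duplicates. If NULLs can appear in the grouping columns, the ROW_NUMBER() approach, or a NULL-safe comparison such as SQLite's IS operator, avoids this gap.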