Thursday 3 July 2014

SQL - Find Duplicate Rows based on multiple Columns

If you want to find duplicate rows based on some specific rows, you can do it easily in SQL Server Partition and Row_Number function. Below are the code snippets:

Results with Duplicates eliminated.
;WITH x AS
(
SELECT col1, col2, col3, rn = ROW_NUMBER() OVER
(PARTITION BY col1, col2, col3 ORDER BY id)
FROM dbo.tbl
)
SELECT col1, col2, col3 FROM x WHERE rn = 1;
Duplicate rows
;WITH x AS
(
SELECT col1, col2, col3, rn = ROW_NUMBER() OVER
(PARTITION BY col1, col2, col3 ORDER BY id)
FROM dbo.tbl
)
SELECT col1, col2, col3 FROM x WHERE rn > 1;
If you want to delete the duplicate rows:
;WITH x AS
(
SELECT col1, col2, col3, rn = ROW_NUMBER() OVER
(PARTITION BY col1, col2, col3 ORDER BY id)
FROM dbo.tbl
)
DELETE x WHERE rn > 1;

Note: If you want to find duplicate rows based on a single row the query is easier:
delete from tbl
where id NOT in
(
select  min(id)
from tbl
group by sourceid
)