Remove duplicate rows from a table in SQL Server
Introduction
Most of the time we use a primary key or unique key to prevent duplicate rows from being inserted in SQL Server. But if we don't use these keys, duplicate rows can be entered by the user. Once duplicate rows are in a table, deleting them becomes a major issue. This topic shows how to delete those duplicate rows from a specific table.
Background
The solution uses only basic T-SQL, so the code should be easy to follow.
Problem
First, we will create a table and insert some duplicate rows into it to understand the topic properly. Create a table called ATTENDANCE using the following code.
CREATE TABLE [dbo].[ATTENDANCE](
[EMPLOYEE_ID] [varchar](50) NOT NULL,
[ATTENDANCE_DATE] [date] NOT NULL
) ON [PRIMARY]
Now insert some data into this table.
INSERT INTO dbo.ATTENDANCE (EMPLOYEE_ID,ATTENDANCE_DATE)VALUES('A001',CONVERT(DATETIME,'01-01-11',5))
INSERT INTO dbo.ATTENDANCE (EMPLOYEE_ID,ATTENDANCE_DATE)VALUES('A001',CONVERT(DATETIME,'01-01-11',5))
INSERT INTO dbo.ATTENDANCE (EMPLOYEE_ID,ATTENDANCE_DATE)VALUES('A002',CONVERT(DATETIME,'01-01-11',5))
INSERT INTO dbo.ATTENDANCE (EMPLOYEE_ID,ATTENDANCE_DATE)VALUES('A002',CONVERT(DATETIME,'01-01-11',5))
INSERT INTO dbo.ATTENDANCE (EMPLOYEE_ID,ATTENDANCE_DATE)VALUES('A002',CONVERT(DATETIME,'01-01-11',5))
INSERT INTO dbo.ATTENDANCE (EMPLOYEE_ID,ATTENDANCE_DATE)VALUES('A003',CONVERT(DATETIME,'01-01-11',5))
After inserting the data, check the contents of the table below. If we group by EMPLOYEE_ID and ATTENDANCE_DATE, then A001 and A002 are duplicated.
EMPLOYEE_ID | ATTENDANCE_DATE |
A001 | 2011-01-01 |
A001 | 2011-01-01 |
A002 | 2011-01-01 |
A002 | 2011-01-01 |
A002 | 2011-01-01 |
A003 | 2011-01-01 |
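The grouping described above can be written as a query. This is a minimal sketch against the ATTENDANCE table created earlier; the column alias OCCURRENCES is only an illustrative name.

```sql
-- List every (EMPLOYEE_ID, ATTENDANCE_DATE) pair that appears more than once,
-- together with the number of times it appears.
SELECT EMPLOYEE_ID, ATTENDANCE_DATE, COUNT(*) AS OCCURRENCES
FROM dbo.ATTENDANCE
GROUP BY EMPLOYEE_ID, ATTENDANCE_DATE
HAVING COUNT(*) > 1;
```

With the sample data this should list A001 with 2 occurrences and A002 with 3, while A003 is not listed because it appears only once.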
So how can we delete these duplicate rows?
Solution
First, add an identity column to the table using the following code.
ALTER TABLE dbo.ATTENDANCE ADD AUTOID INT IDENTITY(1,1)
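The table data will now look like the following. The exact AUTOID value assigned to each existing row is not guaranteed, but every row receives a unique one.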
EMPLOYEE_ID | ATTENDANCE_DATE | AUTOID |
A001 | 2011-01-01 | 1 |
A001 | 2011-01-01 | 2 |
A002 | 2011-01-01 | 3 |
A002 | 2011-01-01 | 4 |
A002 | 2011-01-01 | 5 |
A003 | 2011-01-01 | 6 |
Check the AUTOID column. We will now use this column to tell the duplicate rows apart.
Now use the following code to find the duplicate rows in the table. For each group of identical rows it keeps the one with the lowest AUTOID and returns the rest.
SELECT * FROM dbo.ATTENDANCE WHERE AUTOID NOT IN (SELECT MIN(AUTOID) FROM dbo.ATTENDANCE GROUP BY EMPLOYEE_ID,ATTENDANCE_DATE)
The above code will give us the following result.
EMPLOYEE_ID | ATTENDANCE_DATE | AUTOID |
A001 | 2011-01-01 | 2 |
A002 | 2011-01-01 | 4 |
A002 | 2011-01-01 | 5 |
These are the duplicate rows we want to delete. Use the following code to delete them.
DELETE FROM dbo.ATTENDANCE WHERE AUTOID NOT IN (SELECT MIN(AUTOID) FROM dbo.ATTENDANCE GROUP BY EMPLOYEE_ID,ATTENDANCE_DATE)
Now check the data. No duplicate rows remain in the table.
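One way to run that check and keep duplicates from coming back is sketched below. It assumes the AUTOID helper column is no longer needed, and the constraint name UQ_ATTENDANCE_EMPLOYEE_DATE is a hypothetical name chosen for illustration.

```sql
-- Should return no rows once every duplicate has been removed.
SELECT EMPLOYEE_ID, ATTENDANCE_DATE, COUNT(*) AS OCCURRENCES
FROM dbo.ATTENDANCE
GROUP BY EMPLOYEE_ID, ATTENDANCE_DATE
HAVING COUNT(*) > 1;

-- Optional: drop the helper column if it was added only for the cleanup.
ALTER TABLE dbo.ATTENDANCE DROP COLUMN AUTOID;

-- Optional: add a unique constraint so the same employee/date pair
-- cannot be inserted twice again (constraint name is hypothetical).
ALTER TABLE dbo.ATTENDANCE
ADD CONSTRAINT UQ_ATTENDANCE_EMPLOYEE_DATE UNIQUE (EMPLOYEE_ID, ATTENDANCE_DATE);
```

The unique constraint is the kind of key mentioned in the introduction. It can only be created after the duplicates are gone, which is why it comes last.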
Was it too complicated?
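If altering the table is not an option, a different technique from the one described in this article may be worth trying: numbering the rows in each group with ROW_NUMBER() and deleting through a common table expression. This is only a sketch. The names NUMBERED and RN are illustrative, and which copy in each group survives is arbitrary, which is fine here because the copies are identical.

```sql
-- Number the rows inside each (EMPLOYEE_ID, ATTENDANCE_DATE) group.
-- ORDER BY (SELECT NULL) means no particular order is requested.
-- The leading semicolon terminates any preceding statement, as WITH requires.
;WITH NUMBERED AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY EMPLOYEE_ID, ATTENDANCE_DATE
               ORDER BY (SELECT NULL)
           ) AS RN
    FROM dbo.ATTENDANCE
)
-- Deleting from the CTE deletes the underlying rows in dbo.ATTENDANCE;
-- the row numbered 1 in each group is kept.
DELETE FROM NUMBERED
WHERE RN > 1;
```

This variant does not need the AUTOID column, so it leaves the table definition untouched.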