Delete duplicate rows or records in SQL | SQL Interview Question
Deleting duplicate rows or records is a popular SQL interview questions but many find it difficult to solve. Let’s try to break it down in this lesson,
Deleting duplicate records involves the below steps:
Step 1: Check if the table contains duplicate values as per your criteria
Step 2: Write the Select query to find the duplicates records
Step 3: Delete the duplicates records found in Step 2
Now let’s explore these steps in detail,
Step 1: Check if the table contains duplicate values as per your criteria
In our example, we have a course table, that has the below records. As we can see, we have many duplicate records (marked in yellow) in this table.

Now let’s validate the same using a SQL query.
Here GROUP BY is a good strategy to fetches all unique combinations of course_name, price, and instructor_id and counts how many instances (rows) have the same combination.
- The below SQL query is used to identify duplicates in the coursetable based on specific columns (course_name,price, andinstructor_id).
- Then, we addHAVING COUNT(*) > 1which filters to show only those combinations that appear more than once, which indicates the presence of duplicates in the table.
SELECT course_name, price, instructor_id, count(course_id)
FROM course
GROUP BY course_name, price, instructor_id
HAVING COUNT(*) > 1The output confirms that the below courses are repeated 3 times.

Now that we know, that we have duplicate records in the course table, let’s explore how we can remove these duplicate records.
In this article we will discuss 3 commonly used methods,
Method 1: Using GROUP BY and HAVING clause
Method 2: Using ROW_NUMBER() window function
Method 3: Using Self Join
Method 1: Using GROUP BY and HAVING clause
This query retrieves the minimum course_id for each unique combination of course_name, price, and instructor_id where there are duplicates. Here's how it works:
- GROUP BY course_name, price, instructor_id: Groups the records by these columns, creating groups of records that have the same combination of these values.
- HAVING COUNT(*) > 1: Filters the groups to only include those where the count of rows within each group (i.e., the count of duplicates based on the specified columns) is greater than 1.
- SELECT MIN(course_id): Within each of these groups, it selects the minimum- course_id. This gives you the minimum- course_idfor each set of duplicates based on- course_name,- price, and- instructor_id.
SELECT MIN(course_id)
FROM course
GROUP BY course_name, price, instructor_id
HAVING COUNT(*) > 1Output:

We can use the first query as a subquery to delete duplicate records:
- The subquery inside NOT INprovides a list of minimumcourse_ids for each group of duplicates based oncourse_name,price, andinstructor_id.
- The outer DELETE statement removes all rows from the coursetable where thecourse_idis not present in the list of minimumcourse_ids for groups that have duplicates based on the specified columns.
DELETE FROM course
WHERE course_id NOT IN (
SELECT MIN(course_id)
FROM course
GROUP BY course_name, price, instructor_id
HAVING COUNT(*) > 1
);Method 2: Using ROW_NUMBER() window function
- Here we Use a window function ROW_NUMBER()within a subquery to generate row numbers partitioned bycourse_name,price, andinstructor_id.
- The inner subquery assigns a sequential number (row_num) to each row within its respective partition.
- The outer query selects course_ids whererow_numis greater than 1, which indicates the duplicates for the specified combination of columns.
SELECT course_id
FROM (
SELECT course_id,
ROW_NUMBER() OVER(PARTITION BY course_name, price, instructor_id ORDER BY course_id) AS row_num
FROM course
) AS subquery
WHERE row_num > 1Output:

DELETE FROM course WHERE course_id IN ( SELECT course_id FROM ( SELECT course_id,
ROW_NUMBER() OVER(PARTITION BY course_name, price, instructor_id ORDER BY course_id)
AS row_num FROM course ) AS subquery WHERE row_num > 1 );If you want to avoid subquery, we can also use Common Table Expression (CTE) along with window function as shown below,
- First, create a Common Table Expression (CTE) CTEthat usesROW_NUMBER()to assign sequential numbers partitioned bycourse_name,price, andinstructor_id.
- Then, use DELETE statement to delete rows from the CTE where row_numis greater than 1 to remove the duplicates.
WITH CTE AS (
SELECT course_id, course_name, price, instructor_id,
ROW_NUMBER() OVER(PARTITION BY course_name, price, instructor_id ORDER BY course_id) AS row_num
FROM course
)
 
DELETE FROM CTE WHERE row_num > 1;Method 3: Using Self Join
Here we use a self-join on the course table (c1 and c2 are aliases for the same table) to identify duplicate records based on certain criteria (course_name, price, and instructor_id).
Here's how it works:
- JOIN course c2 ON...: Joins the- coursetable to itself (- c2) based on matching- course_name,- price, and- instructor_id, ensuring that- c1refers to a record with a higher- course_idthan- c2.
- AND c1.course_id > c2.course_id: This condition ensures that only the record with the higher- course_idwithin each pair of duplicates is selected.
- SELECT c1.course_id: It retrieves the- course_idfrom- c1, which represents the duplicates with higher- course_idvalues.
SELECT c1.course_id
FROM course c1
JOIN course c2 ON c1.course_name = c2.course_name
AND c1.price = c2.price
AND c1.instructor_id = c2.instructor_id
AND c1.course_id > c2.course_idOutput:

We then pass the previous course_id list as a subquery to delete duplicate records from the course table,
DELETE FROM course
WHERE course_id IN (
SELECT c1.course_id
FROM course c1
JOIN course c2 ON c1.course_name = c2.course_name
AND c1.price = c2.price
AND c1.instructor_id = c2.instructor_id
AND c1.course_id > c2.course_id
);