Welcome to WebmasterWorld Guest from 54.147.189.54

Forum Moderators: open

Message Too Old, No Replies

How to remove duplication across two columns

Need help adapting this MySQL query

     
2:35 pm on May 7, 2014 (gmt 0)

Preferred Member from GB 

10+ Year Member Top Contributors Of The Month

joined:July 25, 2005
posts:387
votes: 7


Hi,

I've been using this query to remove duplicate entries from a database:

DELETE FROM $tableName USING $tableName, $tableName e1 WHERE $tableName.id > e1.id AND $tableName.product_code = e1.product_code


This query removes all duplicated product codes but now I'm transferring to a new data model which means each product code can have various attributes. For example:

id | product_code | attribute |
1 | 1234 | red |
2 | 1234 | white |
3 | 1234 | white |
4 | 2345 | purple |
5 | 3456 | red |

If I run my query, I'd be left with this:
1 | 1234 | red |
4 | 2345 | purple |
5 | 3456 | red |

But I'm actually looking for this result:
1 | 1234 | red |
2 | 1234 | white |
4 | 2345 | purple |
5 | 3456 | red |

Thanks
4:26 pm on May 7, 2014 (gmt 0)

Moderator from US 

WebmasterWorld Administrator lifeinasia is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 10, 2005
posts:5550
votes: 24


First, let's look at the correct query you should be using to identify the data you want to keep (you can use MIN instead of MAX- it works either way):
SELECT product_code, attibute, MAX(id) AS keep_id
FROM $tablename
GROUP BY product_code, attibute

This gives you each unique product_code/attibute pair, plus an associated id (keep_id). To remove dupes, delete everything in the table that doesn't belong to one of those ids:
DELETE FROM $tablename
WHERE id NOT IN (SELECT MAX(id)
FROM $tablename
GROUP BY product_code, attibute)
12:01 am on May 11, 2014 (gmt 0)

New User

joined:Feb 9, 2014
posts: 16
votes: 0


If your running this query because the table shouldn't contain duplicate values then you should prevent this from happening in the first place by:
a) Preventing it by using a unique index/key non the table.
b) Testing for existing data before trying to insert duplicate values.

It also sounds like you should be using 2 tables here, one for product codes and another for product/attributes.
4:11 am on May 11, 2014 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:6142
votes: 280


Are you actually removing the duplicates, or merely filtering results?

Best way to ensure specifics is to manage the actual database with UNIQUE entries.