Welcome to WebmasterWorld Guest from 107.21.183.163

Forum Moderators: open

Message Too Old, No Replies

How to remove duplication across two columns

Need help adapting this MySQL query

     

adder

2:35 pm on May 7, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi,

I've been using this query to remove duplicate entries from a database:

DELETE FROM $tableName USING $tableName, $tableName e1 WHERE $tableName.id > e1.id AND $tableName.product_code = e1.product_code


This query removes all duplicated product codes but now I'm transferring to a new data model which means each product code can have various attributes. For example:

id | product_code | attribute |
1 | 1234 | red |
2 | 1234 | white |
3 | 1234 | white |
4 | 2345 | purple |
5 | 3456 | red |

If I run my query, I'd be left with this:
1 | 1234 | red |
4 | 2345 | purple |
5 | 3456 | red |

But I'm actually looking for this result:
1 | 1234 | red |
2 | 1234 | white |
4 | 2345 | purple |
5 | 3456 | red |

Thanks

LifeinAsia

4:26 pm on May 7, 2014 (gmt 0)

WebmasterWorld Administrator lifeinasia is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



First, let's look at the correct query you should be using to identify the data you want to keep (you can use MIN instead of MAX- it works either way):
SELECT product_code, attibute, MAX(id) AS keep_id
FROM $tablename
GROUP BY product_code, attibute

This gives you each unique product_code/attibute pair, plus an associated id (keep_id). To remove dupes, delete everything in the table that doesn't belong to one of those ids:
DELETE FROM $tablename
WHERE id NOT IN (SELECT MAX(id)
FROM $tablename
GROUP BY product_code, attibute)

DaveWave

12:01 am on May 11, 2014 (gmt 0)



If your running this query because the table shouldn't contain duplicate values then you should prevent this from happening in the first place by:
a) Preventing it by using a unique index/key non the table.
b) Testing for existing data before trying to insert duplicate values.

It also sounds like you should be using 2 tables here, one for product codes and another for product/attributes.

tangor

4:11 am on May 11, 2014 (gmt 0)

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



Are you actually removing the duplicates, or merely filtering results?

Best way to ensure specifics is to manage the actual database with UNIQUE entries.