Forum Moderators: coopster

Message Too Old, No Replies

Removing Duplicate Entries in a MySQL database

         

rjbearcan

7:15 am on Dec 17, 2006 (gmt 0)

10+ Year Member



I know that I could have made a field unique, however, since I am building a directory, I could have multiple fields be unique and I wasn't sure if that was such a good idea.

Now, 1200+ entries into this mess, I want to get rid of any duplicate entries. I know that I could just do a simple SELECT and group by the name, address and phone number (the three main fields that would determine if an entry is duplicate or not) and if there is more than 1 result, then delete one or more of the entries. Sounds simple, sure. However, I'm not really sure how to go about this. I don't have any code, so if you have any ideas, please help me out.

mcibor

5:16 pm on Dec 17, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can do that with php. First get the results, then store them in array, and on further retrieving mark some to deletion.
A small example:

$double = array();
$unique = array();

$sql = "SELECT id, unique_1, unique_2, unique_3 FROM table";
$result = mysql_query($sql) or die(mysql_error());
while($row = mysql_fetch_assoc($result)) //retrieve the results
{
//let's combine these three fields into one.
$temp = $row['unique_1'] . "_" . $row['unique_2'] . "_" . $row['unique_3'];
if(in_array($temp, $unique)) $double[] = $row['id']; //if is in array, then put it to deletion
else $unique[] = $temp;//else add it to unique list
}
//now prepare the records for deletion - I would test this many times before moving further
$ids_to_delete = implode(",", $double);

$sql = "DELETE FROM table WHERE id IN($ids_to_delete)";

Hope this helps you!
Michal

eelixduppy

7:32 pm on Dec 17, 2006 (gmt 0)



Some related threads [google.com].

:)

mcavic

8:15 pm on Dec 17, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can create a new table with unique keys, then do:

INSERT IGNORE INTO NewTable
SELECT * FROM OldTable;

That would also be an easy test to see if the unique keys create a performance problem. But, it won't let you choose which duplicate to keep and which to remove, if that matters.