Forum Moderators: coopster

Message Too Old, No Replies

Remove duplicate database listings with PHPmyAdmin

phpmyadmin

         

webwit

3:24 pm on Apr 28, 2004 (gmt 0)

10+ Year Member



I have every city and state listed in my database, unfortunately many cities are listed twice under the same state because of multiple zip codes.

I am using PHPmyAdmin, and need a quick way to remove duplicate cities from my database, and leave only the first one.

Here is a piece of my database below as an example:

id - zip - city - state - abvstate
55974 80001 Arvada Colorado CO
55975 80002 Arvada Colorado CO
55976 80002 Denver Colorado CO
55977 80002 Wheat Ridge Colorado CO
55978 80003 Arvada Colorado CO
55979 80003 Westminster Colorado CO

I also need to make sure it will leave the following untouched: example, Jonestown in California, and Jonestown in Michigan since they are in different states.

Is there a quick way in PHPmyAdmin to do this?

webwit

8:05 pm on Apr 28, 2004 (gmt 0)

10+ Year Member



It could be a script. It doesn't have to use PHPmyAdmin. I just thought that might be easier.

There are 60,000 cities in the US, so I don't want to do it by hand.

Thanks in advance to anyone who might have an answer to this.

coopster

8:11 pm on Apr 28, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Do you really want to remove duplicates? Or just select records without duplicates?

webwit

9:42 pm on Apr 28, 2004 (gmt 0)

10+ Year Member



I guess it would be fine to do it either way.

I just thought it would be cleaner just to remove the dupicates, since the database will never be written to again after this.

This looks like it is close below, but it doesn't work yet.

<?
dbConnect("$dbname");
$sqlquery = "SELECT * FROM citytable";
$query = mysql_query ("$sqlquery");
$numrows = mysql_num_rows($query);

if ($numrows) {
while ($I < $numrows) {
$id = mysql_result($query,$I,"id");
$city = mysql_result($query,$I,"city");
$state = mysql_result($query,$I,"state");
echo "<BR>$city $state";

//compare against second query
$sqlquery2 = "SELECT * FROM citytable";
$query2 = mysql_query ("$sqlquery2");
$numrows = mysql_num_rows($query2);

if ($numrows2) {
while ($I2 < $numrows2) {
$id2 = mysql_result($query,$I2,"id");
$city2 = mysql_result($query,$I2,"city");
$state2 = mysql_result($query,$I2,"state");

If ($city = $city2 and $state = $state2 and $id!= $id2)
{
DELETE FROM citytable WHERE id = '$id2';
}

$I2++;
}
}

$I++;
}
}
?>

Am I on the right track?

Thanks!

coopster

10:14 pm on Apr 28, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



The reason I asked was that you can still retrieve DISTINCT [dev.mysql.com] city and state names...
SELECT DISTINCT city, state FROM your_table;

webwit

10:38 pm on Apr 28, 2004 (gmt 0)

10+ Year Member



That worked!

SELECT DISTINCT

Thanks, Your Great!

Timotheos

11:01 pm on Apr 28, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Aw, I was waiting for coopster's slick way to find duplicates. So is there a clever way to search on some field and find duplicates?

jatar_k

11:06 pm on Apr 28, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



subselects ;)

Netizen

11:07 pm on Apr 28, 2004 (gmt 0)

10+ Year Member



To find duplicates take a looking at HAVING in the MySQL manual.

To remove duplicates one off I would have created a second table and added unique key/primary key on city and state and selected into that table like

INSERT IGNORE SELECT * FROM firstTable INTO secondTable

which will ignore erros caused by the duplicates.

webadept

3:19 am on Apr 29, 2004 (gmt 0)

10+ Year Member



Netizen's is probably the best and safest way to do that, you could just alter the table so that the city field is a primary key, and then change it back as well.. but don't go around saying "webadept said it would be okay!" cause I didn't, cause it aint, there's no recovery from doing it that way.. but. if you are just playing with test data, its a fast method of removing all duplicates from a datatable on a set key.

But, for a city database, that is kind of foolish, if you have cities for all states, .. example I know there is a Eurika in just about every state. To keep duplicates from happening you could set a primary key on two fields, such as
PRIMARY KEY (city, state)
And that will keep it clean from the start

to find out how many dupicates you have before you start you could do something like

select count(*) - count(DISTINCT city, state) as 'duplicat cities' from myCityList;

or if you wanted to see a count of each of the duplicates you could do something like

select count(*) as repetitions, city, state from myCityList group by city, state having repetitions > 1;

which is nice when you are doing a data merge, just to see what is happening.

if the duplicat is only a duplicate if the whole row is the same, you can do something like this

create table tmpCityState select distinct * from myCityState;
select * from tmpCityState order by city, state;

then drop the original table, and alter the tmp table to the original table's name.

many ways of doing things like this.

webadept -- out