Forum Moderators: open

Message Too Old, No Replies

DELETING from MSSQL table

         

webboy1

1:06 pm on Feb 20, 2003 (gmt 0)

10+ Year Member



Hi,

I have a table full of registered users from our site. The table contains 11,000+ registrations. However, only around 10,500 of them are unique email address.

Is there a quick way to delete the extra duplicate emails?

Currently i am finding what emails exist more than once, then finding each instance of that email, then deleting according to the user_id. IT is working.....its just taking ages, i just wondered if there was a way i could set it up to go faster or automatically?

Regards
Webboy

Dreamquick

1:41 pm on Feb 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How about something like this;

DELETE FROM tblUsers
WHERE user_id IN(

SELECT MAX( user_id )
FROM tblUsers
GROUP BY email
HAVING COUNT( email ) > 1

)

:) tried it on a similar table here and it even seems to work (which is always a plus)

This finds the highest user id associated with each duplicate email address and deletes it, the only downside is that you might have to run it a few times as it only purges the last duplicate.

If we wanted to be a little more slick we could do something like;

DELETE FROM tblUsers
WHERE user_id IN(

SELECT user_id
FROM tblUsers
WHERE email IN(

SELECT email
FROM tblUsers
GROUP BY email
HAVING COUNT( email ) > 1

) AND NOT user_id IN(

SELECT MIN( user_id )
FROM tblUsers
GROUP BY email
HAVING COUNT( email ) > 1

)
)

Which should purge all the users with duplicated email addresses apart from the user which first registered with that email address, the downside of running it as one query is that all the IN() statements make it inefficient.

A less intensive version of that query using temporary tables;

SELECT user_id, email
INTO #emaildup
FROM tblUsers
WHERE email IN(

SELECT email
FROM tblUsers
GROUP BY email
HAVING COUNT( email ) > 1

);

DELETE FROM #emaildup
WHERE user_id in(
SELECT MIN( user_id )
FROM #emaildup
GROUP BY email
);

DELETE FROM tblUsers
WHERE user_id in(
SELECT user_id
FROM #emaildup
);

DROP TABLE #emaildup;

This final version finds all the users with a duplicate email address and puts them into a temporary table. Next it deletes the first user using each duplicate email address (in terms of user_id). Finally it deletes the remaining duplicate users from the main table before cleaning up our temporary table.

- Tony

webboy1

6:05 pm on Feb 20, 2003 (gmt 0)

10+ Year Member



Thanks Tony,

I new there must be a way, and after reading your code, i wasn't actually to far away with one of the codes i written and tried!

I used the first code you offered and it worked a treat. It took 15 seconds to delete all the duplicates...........much better than pain-stakingly deleting 1500+ users one by one! :)

Thanks again

Webboy