Massive duplicate data search


CodilX - 11:52 am on Feb 23, 2011 (gmt 0)


Hi there,

I have a big database of over 1,000,000 entries, and it's growing.

I want to check for duplicate entries, but every method I try either takes way too long or times out.

Is there a fast way to accomplish this?

My db consists of product rows, each with an image id. Sometimes two or more products get assigned the same image id, and I need to see which products share an image id. I don't need to get this info straight from MySQL: once I know the duplicate image ids, I can do a second search with PHP to get the product ids.

But the search for duplicates is just taking way too long. Is there any faster way?

I used this simple query, and it times out:

SELECT id, product_img, COUNT(*) AS total FROM products GROUP BY product_img HAVING total > 1

My timeout is set to something like 2-3 minutes.
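
For reference, here is a minimal sketch of the two-step approach described above: first find the image ids used more than once, then look up the product ids for each of them. It is written in Python against an in-memory SQLite database purely so it can be run as-is; the real setup is MySQL with PHP, so treat it as an illustration rather than a drop-in fix. The table and column names come from the query above. The index name idx_products_product_img and the sample rows are made up. Whether an index on product_img actually makes the GROUP BY fast enough on a million rows is an assumption that would need to be tested on the real table.

```python
import sqlite3


def find_duplicate_images(conn):
    """Return {product_img: [product ids]} for image ids used by more than one product."""
    # Step 1: group on product_img only, so an index on that column can be used.
    dup_imgs = [row[0] for row in conn.execute(
        "SELECT product_img FROM products "
        "GROUP BY product_img HAVING COUNT(*) > 1 "
        "ORDER BY product_img"
    )]

    # Step 2: for each known duplicate image id, fetch the product ids.
    result = {}
    for img in dup_imgs:
        rows = conn.execute(
            "SELECT id FROM products WHERE product_img = ? ORDER BY id", (img,)
        )
        result[img] = [r[0] for r in rows]
    return result


# In-memory stand-in for the real products table (sample data is invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, product_img INTEGER)")
conn.executemany(
    "INSERT INTO products (id, product_img) VALUES (?, ?)",
    [(1, 10), (2, 11), (3, 10), (4, 12), (5, 11), (6, 10)],
)

# Hypothetical index name; the point is only that product_img is indexed.
conn.execute("CREATE INDEX idx_products_product_img ON products (product_img)")

duplicates = find_duplicate_images(conn)
print(duplicates)  # {10: [1, 3, 6], 11: [2, 5]}
```

The first query leaves out the id column on purpose: with GROUP BY product_img, a selected id would be an arbitrary one from each group, and the product ids are fetched in the second step anyway.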


Thread source: http://www.webmasterworld.com/databases_sql_mysql/4270978.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com