Comparing Large Databases?

Forum Moderators: coopster

Message Too Old, No Replies

Comparing Large Databases?

What would be the best way to compare and combine 2 large databases

AhmedF

8:53 am on Mar 6, 2004 (gmt 0)

I have 2 MySQL databases, and each has a user table, one with 7000 rows (TABLE A) and another with 2000 rows (TABLE B)approx.
I'm trying to combine the 2000 with the 7000 in one big user table and remove the duplicates.

First I insert each user table into a multidimensional array and compare each row of TABLE A with all the rows of TABLE B through a FOR statement.

So it loops about 7000 x 2000 = 14000000 times. A bit too much if you ask me and it always times out, so I was wondering if any body had any ideas on any more effecient ways to do it.

An idea I had was a big table join but theyre in different DB's so that wouldnt work. Anyone? :)

ahmed

11:26 am on Mar 6, 2004 (gmt 0)

if this is just a one off, I'd dump both tables to CSV files, then use your favourite DB front end (I like SQLYog) to import the files and remove duplicates.

No point re-inventing the wheel.

AhmedF

1:12 pm on Mar 6, 2004 (gmt 0)

actually its not.
Has anyone tried to reinvent the wheel?
How come something a caveman made millions of years ago is still the same today? :)

diamondgrl

2:11 pm on Mar 6, 2004 (gmt 0)

I usually work in the same table space so I'm not sure if you can join tables in two separate databases. I would think you can somehow.

However, if you can't, why not simply copy the one table into the other database, do your table join procedure and then put the new table back whereever you need to?

seindal

4:03 pm on Mar 6, 2004 (gmt 0)

In MySQL you can join tables across databases as long as they recide on the same server.

Anyway, make a new table C with the layout you want. Create a primary key or a unique index on the fields you what to match.

Now

INSERT IGNORE INTO C SELECT fields_from_A FROM A;
INSERT IGNORE INTO C SELECT fields_from_B FROM B;

This will give precedence to records from A.
Duplicates will be removed due to the unique key and you won't get errors for key-clashes because of the IGNORE keyword.

This should get you what you want.

Ren�

AhmedF

11:26 pm on Mar 6, 2004 (gmt 0)

hmm that is interesting, never knew that, but I dont think I can use uniques and primary to remove duplicates, I'm dealing with forum user data, so I need to compare usernames and even if some usernames are the same I also need to check if their email and password is the same before I remove a duplicate or else I'll rename it..