Forum Moderators: open

Message Too Old, No Replies

Tricky table merging question.

         

Scally_Ally

8:43 am on Jun 15, 2006 (gmt 0)

10+ Year Member



Hi,

I am using MySQL and php to manage a large site. Every month i get sent through a csv file with over 250,000 records, this is meant to update the online database of 370,000 with points gained, changes of names etc.

In order to do this i have bought some 3rd party software, which updates the database on a primary key from both the text file and the online database. Problem is this takes a long time to search the entire database for each of the 250,000 records (process takes over 7 days)... groan..

I am looking for a quicker way to turn this around, what i have been thinking of is uploading the 250,000 record update file into a new table in the same database and then merging the two from a script that i am writing, only i dont know of any efficient sql to do this.

Here is an outline of what i was thinking.

table1 = original file
table2 = update file

SELECT
table1.field1,
table1.field2,
table2.field1
WHERE
table1.field1 = table2.field1
(UPDATE
table1
SET
table1.field2 = (table1.field2 + table2.field2)
WHERE ID = table2.ID
)

I know its a bit difficult to see when just written like this but for every record in table2 the corresponding row form table1 must be found, when this is found add a value of a column from table1 to a value of a column on table2. And all this has to execute in less than an hour..

Hope i have made myself clear, if not let me know and i will post more info..

Thanks

Ally

physics

1:12 am on Jun 23, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have the same problem and solved it by writing a custom perl script (php would work too though) where I go through each thing, check if it exists, if it does then follow certain rules about how to alter it, if it doesn't then add it, etc. IMO you should handle the logic in your php or perl code and just use the mysql queries to execute the updates/additions/deletions. With that amount of entries an hour isn't unreasonable (at least for me it isn't). Make sure you have indexes and primary keys, etc set up:
[databasejournal.com...]

Scally_Ally

9:20 am on Jun 23, 2006 (gmt 0)

10+ Year Member



That is brilliant... exactly what i was looking for. Doing that speeded up the processing time no end.

Will be bookmarking that page for sure.

Thanks for the reply

aspdaddy

4:50 pm on Jun 23, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Set based updates and inserts are much faster and more robust (using transactions) than code loops but there are other problems.

I had a similar scanario that was doing about 30,000 updates in a single line with a set based update and taking 15 seconds , but causing locks on the table during that time so that other users were getting errors.

I changed back to a code loop and it takes a couple of minutes now, but I can error handle easier and users are not locked out at all. The error log shows which rows failed, with db transaction/rollback its all or nothing.

I would usually reccomend doing the SET based updates with transactions and error handling on the database not using code, but in practice its not always do-able.