Forum Moderators: open

Message Too Old, No Replies

Inserting large chunks of data into a normalized MySQL DB

         

erikcw

4:15 pm on Mar 6, 2007 (gmt 0)

10+ Year Member



Hi all,

I tried to respond to my original thread ( [webmasterworld.com...] ), but it looks like it has been closed (it was from last month).

I now have a followup question.

I would like to normalize the data. Basically I'm putting all of the historical data in a 2nd table.

[quote]
id(int;primary;auto-increment), keyword (varchar 255), historical1(int), historical2(int), historical3(int)

WILL BECOME
id(int;primary;auto-increment), keyword (varchar 255)
AND
id(int;primary;auto-increment), pid(int), historical1(int), historical2(int), historical3(int)
[quote]

How do I do a mass insert of 1000 rows to a normalized table? I don't want to do 1000 individual queries, as this is a live web app.

Is normalizing the data going to slow performance to much? (adding 1million rows per month, and lots of SELECTS off of the data.)

Thanks!
Erik

justageek

4:48 pm on Mar 6, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How do I do a mass insert of 1000 rows to a normalized table? I don't want to do 1000 individual queries, as this is a live web app.

Is normalizing the data going to slow performance to much? (adding 1million rows per month, and lots of SELECTS off of the data.)

This really depends on your machine assuming the tables and indexes are set up correctly. Just for a point of reference I can typically get 250k +/- questions per second on a decent server and maybe 60k +/- questions per second on my laptop so I don't think you'll have much of an issue with 1k queries.

Adding a million rows a month is only about 1 insert every 2-3 seconds if evenly spread. Plenty of time to allow a lot of selects :-)

JAG

erikcw

6:42 pm on Mar 6, 2007 (gmt 0)

10+ Year Member



They won't be evenly distributed throughout the month. It will be 1000 inserts at a time, distributed randomly.

I wanted to to a php loop to build a big insert query (to insert all 1000 rows), but I need to be able to do that to a normailzed db.

Can I do something like

INSERT INTO table1, table2 VALUES(table1.id, table1.keyword, table2.pid[last_insert_id()], table2.history1), (table1.id, table1.keyword, table2.pid[last_insert_id()], table2.history1),
(table1.id, table1.keyword, table2.pid[last_insert_id()], table2.history1),(table1.id, table1.keyword, table2.pid[last_insert_id()], table2.history1)...1000 rows;

syber

6:50 pm on Mar 6, 2007 (gmt 0)

10+ Year Member



If you are inserting data that already exists in other tables you can do the INSERT with one statement

INSERT INTO history (col1, col2, col3, col4, ...)
SELECT col1, col2, col4, col3, ...
FROM oldtable

erikcw

7:17 pm on Mar 6, 2007 (gmt 0)

10+ Year Member




If you are inserting data that already exists in other tables you can do the INSERT with one statement

INSERT INTO history (col1, col2, col3, col4, ...)
SELECT col1, col2, col4, col3, ...
FROM oldtable

So just insert all of the data into an un-normalized temporary table, then have a cronjob script move the data into the normalized table?

justageek

8:30 pm on Mar 6, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



So just insert all of the data into an un-normalized temporary table

Nope. Syber is saying just pull the data from you live data table and insert directly into your history table with one sql command. It'll go very fast and easy.

JAG

erikcw

12:56 am on Mar 7, 2007 (gmt 0)

10+ Year Member




Nope. Syber is saying just pull the data from you live data table and insert directly into your history table with one sql command. It'll go very fast and easy.

JAG

I think I have it now.


$sql = "INSERT INTO cache_historical (kid, monthly, total, db) SELECT cache_keywords.id, cache_keywords.monthly, cache_keywords.total, cache_keywords.source FROM cache_keywords WHERE freshdata=TRUE; UPDATE cache_keywords SET freshdata=FALSE WHERE freshdata=TRUE;";

The last problem is the UPDATE statement. I don't think php lets you send more then one query at a time. Is there a was to combine the 2 so that it sets the freshdata field to FALSE after the INSERT...SELECT is finished?

Thanks for all your help!

justageek

1:51 am on Mar 7, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is there a was to combine the 2 so that it sets the freshdata field to FALSE after the INSERT...SELECT is finished?

If you set up a trigger event to do the update on the original table after the insert into the new table then you can send just the one query and be fine.

Be careful how you do the update because the way you have it is possible to miss some records if there are inserts into the original table before the update command is run. You can do the update safely if you use the id numbers to key off of.

JAG

erikcw

2:02 am on Mar 7, 2007 (gmt 0)

10+ Year Member




You can do the update safely if you use the id numbers to key off of.

Thanks for the heads up!
Can you give me a code example of that? (it doesn't have to be tested, just something to send me in the right direction.).

justageek

2:24 am on Mar 7, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It would be something like this:

CREATE TRIGGER livedb AFTER INSERT ON livedb
FOR EACH ROW BEGIN
INSERT INTO archive SET put_your_fields_here;
UPDATE livedb SET flag = false WHERE id = whatever_your_insert_into_livedb_id_was;
END;

You could also do it another way:

UPDATE livedb SET flag = false WHERE id IN (SELECT id FROM archive)

You could also do it with a join instead of a sub select.

All above are of course untested but should give a starting point of reference for you :-)

JAG