Home / Forums Index / Code, Content, and Presentation / Databases

Databases Forum

    
Writing to normalised database
..as opposed to how to normalise (mysql)
wrightee
msg:4382907 - 2:08 am on Nov 3, 2011 (gmt 0)

I need to log about 50K rows of data per day, extracted as a CSV from AdWords. Since most of it is repetitive keyword / campaign data, it's prime for normalisation.

I can find loads of posts about designing for normalisation, but little about the best way to process your data for insertion.

If I have, say:

tbl_keywords: id,keyword - table of unique keywords
tbl_campaigns: id,campaign - table of unique campaign ids
tbl_performance: id,keyword_id,campaign_id,datapoint

.. and I need to process a new 50K rows of data, creating new keywords and campaigns in tbl_keywords, tbl_campaigns where appropriate and inserting the relevant keys into tbl_performance - what's the best way to do it?
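
For reference, the three tables described above might be declared like this (a sketch only; the column sizes, UNIQUE keys, and foreign keys are my assumptions, not part of the original schema):

```sql
-- Lookup tables: one row per unique keyword / campaign.
-- The UNIQUE keys let later INSERT IGNORE statements dedupe cheaply.
CREATE TABLE tbl_keywords (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    keyword VARCHAR(255) NOT NULL,
    UNIQUE KEY uniq_keyword (keyword)
);

CREATE TABLE tbl_campaigns (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    campaign VARCHAR(255) NOT NULL,
    UNIQUE KEY uniq_campaign (campaign)
);

-- Fact table: references the lookup tables by surrogate key.
CREATE TABLE tbl_performance (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    keyword_id INT UNSIGNED NOT NULL,
    campaign_id INT UNSIGNED NOT NULL,
    datapoint VARCHAR(100),
    FOREIGN KEY (keyword_id) REFERENCES tbl_keywords (id),
    FOREIGN KEY (campaign_id) REFERENCES tbl_campaigns (id)
);
```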

Thanks in advance for your advice.

 

Jstanfield
msg:4383032 - 1:36 pm on Nov 3, 2011 (gmt 0)

Take a look at LOAD DATA INFILE:
[dev.mysql.com...]

IDEA 1:
If you're loading 50K rows at once, I'd consider breaking the data into separate CSVs whose values align with each table's columns.


LOAD DATA INFILE 'campaigns.csv'
INTO TABLE tbl_campaigns
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';

LOAD DATA INFILE 'keywords.csv'
INTO TABLE tbl_keywords
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';

LOAD DATA INFILE 'performance.csv'
INTO TABLE tbl_performance
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';



IDEA 2:
If you can't or don't want to break your 50K-row CSV into individual CSVs, as suggested in idea #1, then you can make an import table.


CREATE TABLE adwords_csv_import (campaign_id INT, keyword VARCHAR(100), datapoint VARCHAR(100));
CREATE INDEX by_keyword ON adwords_csv_import(keyword);

Lay out the import table's columns exactly like the CSV, then load your CSV into that table:

LOAD DATA INFILE 'adwords.csv'
INTO TABLE adwords_csv_import
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';


Then insert the new stuff. SLOW but simple, using a NOT IN clause:

INSERT INTO tbl_keywords (keyword)
SELECT DISTINCT adwords_csv_import.keyword
FROM adwords_csv_import
WHERE adwords_csv_import.keyword NOT IN (
    SELECT keyword FROM tbl_keywords
);


FAST, using a LEFT JOIN:

INSERT INTO tbl_keywords (keyword)
SELECT DISTINCT adwords_csv_import.keyword
FROM adwords_csv_import
LEFT JOIN tbl_keywords ON tbl_keywords.keyword = adwords_csv_import.keyword
WHERE tbl_keywords.keyword IS NULL;

and so forth, importing keywords, campaigns, datapoints.
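
For the final step, the tbl_performance insert might be sketched like this (an assumption on my part: tbl_campaigns.campaign stores the raw AdWords campaign id that sits in the import table's campaign_id column):

```sql
-- Populate the fact table by resolving each raw value to its surrogate key.
-- Run this only after the keyword and campaign inserts above.
INSERT INTO tbl_performance (keyword_id, campaign_id, datapoint)
SELECT k.id, c.id, i.datapoint
FROM adwords_csv_import AS i
JOIN tbl_keywords  AS k ON k.keyword  = i.keyword
JOIN tbl_campaigns AS c ON c.campaign = i.campaign_id;
```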

Then truncate the table (quickly delete all rows) when you're done, so it's empty and ready to use next time.

TRUNCATE TABLE adwords_csv_import;

wrightee
msg:4383418 - 9:32 am on Nov 4, 2011 (gmt 0)

Thanks for the input, both interesting.

The bit I'm really looking to understand is how to most efficiently write tbl_keywords.id and tbl_campaigns.id into the tbl_performance.keyword_id / campaign_id columns after updating tbl_keywords / tbl_campaigns.

wrightee
msg:4383440 - 10:27 am on Nov 4, 2011 (gmt 0)

Got it:

1: Load the CSV into a tmp table with extra columns for the normalised keys (e.g. keyword_id, campaign_id)

2: INSERT IGNORE INTO tbl_keywords (keyword) SELECT DISTINCT keyword FROM tmp

3: UPDATE tmp SET keyword_id = (SELECT .. etc)

And finally populate tbl_performance with new data in tmp, using normalised columns instead of raw data and truncate the tmp table.

All seems to work very fast.
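
Putting those steps together, the whole pass might be sketched like this (the column names in tmp are my assumptions, and INSERT IGNORE relies on UNIQUE indexes existing on tbl_keywords.keyword and tbl_campaigns.campaign; the subquery in step 3 is replaced with MySQL's multiple-table UPDATE, which avoids a per-row correlated SELECT):

```sql
-- 1: load the raw CSV; keyword_id / campaign_id start out NULL
LOAD DATA INFILE 'adwords.csv'
INTO TABLE tmp
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(campaign, keyword, datapoint);

-- 2: add any keywords/campaigns not already in the lookup tables
INSERT IGNORE INTO tbl_keywords (keyword) SELECT DISTINCT keyword FROM tmp;
INSERT IGNORE INTO tbl_campaigns (campaign) SELECT DISTINCT campaign FROM tmp;

-- 3: back-fill the surrogate keys with a multiple-table UPDATE
UPDATE tmp
JOIN tbl_keywords k ON k.keyword = tmp.keyword
JOIN tbl_campaigns c ON c.campaign = tmp.campaign
SET tmp.keyword_id = k.id, tmp.campaign_id = c.id;

-- 4: write the normalised rows, then reset for the next day's file
INSERT INTO tbl_performance (keyword_id, campaign_id, datapoint)
SELECT keyword_id, campaign_id, datapoint FROM tmp;
TRUNCATE TABLE tmp;
```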

© Webmaster World 1996-2014 all rights reserved