brotherhood_of_LAN - 3:26 pm on Aug 1, 2013 (gmt 0)
> supplier number and SKU number as PK
Have your permanent product table, a hash table and an insert table
hashtable would contain suppliernumber/SKU as primary key and a HASH index. It would also have your integer primary key from the permanent table, as a reference. Ideally here you would use a MEMORY table with a HASH index as those are very quick, but a MyISAM table (preferably using LOAD INDEX INTO CACHE) would be quick too.
- load data into insert table
- delete from insert table by referencing the hash table for existing products. join to permanent table if you want to do some updates first. this should be very quick depending on how much writing needs to be done, but since you're joining on the primary key of both tables... it'll be quick. You'll be left with only new products.
- SET @newincrement = (SELECT MAX(id) FROM permanenttable) to get your next auto increment value, which can be used to insert into the hash table too.
- insert new products into hash & permanent table
I do a similar process for some rank checking. The permanent table has 2 billion records and incoming data is between 200K and 1M rows. A fixed length reference for discerning new data works out to be quick, and whatever spare memory you can use also helps.