Forum Moderators: coopster

Message Too Old, No Replies

Compare old and new data

when importing 1000's of records at once..

         

ajs83

11:35 pm on Apr 29, 2005 (gmt 0)

10+ Year Member



I have a bunch of datafeeds and I was wondering how I would compare the old one with the new one and then report the difference? For example, one item I would like to see is if changes were made to a products price like in the example below

Old Data

1 Blah Blah item image.jpg description $24.99
2 New Example pictur.jpg summary $99.99

New Data

1 Blah Blah item image.jpg description $14.99
2 New Example pictur.jpg summary $99.99

Since #1's price was changed, after I upload the information I want to be able to run a query and have the page say item #1's price was change by $10 and is now $14.99.

How would I best accomplish this or what technique would I use since there are thousands of records? Also, would I be best using two tables (such as new_data and old_data) or is there a better method using one? Thanks!

Stormfx

4:52 pm on Apr 30, 2005 (gmt 0)

10+ Year Member



You could store the difference in another table after the comparison. Or, a more preferred method might be to store the OLD values in another table and then compare the new values with old ones at run time.

ajs83

10:48 pm on Apr 30, 2005 (gmt 0)

10+ Year Member



How would I shift the old data to a new column when there are in excess of 10,000 records?

Stormfx

4:36 am on May 1, 2005 (gmt 0)

10+ Year Member



Like I mentioned, it'd probably be best to use two tables. One for the new or current data and one for the old data.

Then, everytime an item is changed, copy the old data to the old_data table and then write the new data to the new_data table.

I think that's what you're asking :)

ajs83

3:20 am on May 15, 2005 (gmt 0)

10+ Year Member



How would I set up the query to accomplish this?

ajs83

5:30 am on May 16, 2005 (gmt 0)

10+ Year Member



bump...

jatar_k

4:01 pm on May 16, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



select what it used to be
insert old value in new column or table
update new value in original table

ajs83

6:44 pm on May 16, 2005 (gmt 0)

10+ Year Member



How I was planning on doing it was having two tables (old_products & new_products) and then updating the results into a third one 'drops'

I'm not familiar though with querying multiple databases nor the best (and fastest) way to do it when there are over 10,000 records.

jatar_k

7:18 pm on May 16, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



how often is it updated?

since it sounds like you do an insert to 'new_products' do you leave the data in there? If id's match on the product page you could ahndle the comparison of old and new.

If there is a difference in price show 'this product has been marked down' or something.

I don't really know how your system functions as a whole so it is difficult for me to say where the best place to do what is.

Spudstr

7:26 pm on May 16, 2005 (gmt 0)

10+ Year Member



primary key on product_id?

insert ignore into new_table (newVal1,NewVal2,NewVal3) select oldVal1,OldVal2,OldVal3 from old_table;

will insert new product ids into new table with new field names..

example if you have newPrice in new table and oldPrice in old table

insert ignore into new_table (newPrice) select oldPrice from old_table

to pull the data.. either use 2 diff sql's or 1.. using a join syntax

select a.product_id,a.product_name,a.new_value,b.old_value from as a new_table left join old_table as b on a.product_id = b.product_id

this will return
a.product_id:a.product_name,a.newPrice,b.oldPrice
10:widget:14.95:29.95

you can even do select a.product_id as pID,a.newPrice as nNewPrice and you'll yeild

pId:nNewPrice
10:14.95

ajs83

7:50 pm on May 16, 2005 (gmt 0)

10+ Year Member



How I had planned on doing it was just having 2 days worth of data

Tables
old_data
new_data
drops

The first day I'd load up the new data table with all the records. Then in subsequent days when there is a new feed, I'd copy over the records from 'new_data' to 'old_data', clearing out the 'new_data' table, then load the current day's feed into 'new_data' and run the query.

Also, the data is in no certain order so one day the item that is record 3402 could be record 8989 the next so I was planning on trying to match on title, then compare price.

The two columns would Product_Name and Retail_Price.

john_k

8:06 pm on May 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I would put it all into one table and add an UploadDate or UploadBatchId column to the table. If you want to only keep two days worth of data, then you just delete by the UploadDate. It would also make it easier to roll back an upload. And lastly, you don't need to move the data around.

As for comparing: you need to decide on one or more components from the line of text as your product key. If you use the title, then you might want to run it through some type of filter so as to make it tolerant of upper/lower case changes, extra spaces that get fixed, etc. Once you have decided how to extract the product key information, then comparing the price component should be easy.


SELECT A.TITLE, A.PRICE, B.PRICE
FROM PRODUCTS AS A
INNER JOIN PRODUCTS AS B ON (A.PRODUCTKEY = B.PRODUCTKEY)
WHERE (A.UPLOADBATCHID = 100)
AND (B.UPLOADBATCHID = 101)
ORDER BY A.TITLE

ajs83

8:49 pm on May 16, 2005 (gmt 0)

10+ Year Member



The primary key is Product_ID and after thinking about it probably the best way to compare them because except for a typo, that will remain consistant.

ajs83

9:55 pm on May 16, 2005 (gmt 0)

10+ Year Member



Since the primary key needs to be the Product_ID, I'm assuming that rules out using one table, so what has to be done is the following...

1. Scans the two tables ('old_data' first, then 'new_data')
2. Compares the 'product_id' columns from each table
3. Checks the price ('Retail_Price') from the records that it matches from each table
4. Report the Product_Name, Retail_Price (from old), Retail_Price (from new) that don't match up

Is that the best way to do it or is their an easier/simpler way. Also what would be the best query on that?

john_k

11:14 pm on May 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Since the primary key needs to be the Product_ID, I'm assuming that rules out using one table, so what has to be done is the following...
1. Scans the two tables ('old_data' first, then 'new_data')
2. Compares the 'product_id' columns from each table
3. Checks the price ('Retail_Price') from the records that it matches from each table
4. Report the Product_Name, Retail_Price (from old), Retail_Price (from new) that don't match up

Is that the best way to do it or is their an easier/simpler way. Also what would be the best query on that?

Two tables, but not the way you are looking at it. The first table is for the Product. It holds the Title and any other stable data that you are not tracking from day to day. The second table is for the daily data. This is child data and has the ProductId as a foreign key.

btw - Is the Product Id part of the uploaded data, or are you generating it?

As this thread has unfolded, it sounds like you are generating it. And it is not really a Product Id, because it doesn't represent the product in general. It is more of a DailyProductInfoId, because it is the ID for a row that describes the state of a product on a specific date.

(If the Product Id is part of the uploaded data, then it would have been nice to have included it in your sample rows of data in the initial post.)

ajs83

11:36 pm on May 16, 2005 (gmt 0)

10+ Year Member



Until a few hours ago, I hadn't even noticed that column existed. They actually upgraded since the first post and with the time being limited to edit, i wasn't able to go back and do it.

The product_id is essentially like a sku so that is included in the data. How I have it set up now, I am not generating any extra information when uploading it.

What I have done is set up the following query

$result = mysql_query("SELECT a.Product_Name, a.Retail_Price, b.Retail_Price FROM new_data AS a, old_data AS b WHERE a.Product_ID = b.Product_ID AND a.Retail_Price!= b.Retail_Price LIMIT 10");

The only thing I am stuck on now is how to display the "_.Retail_Price" tags since I haven't worked with multiple queries before.

I tried using the standard
$offer=$r["Product_Name"];
$id=$r["a.Retail_Price"];
$bid=$r["b.Retail_Price"];

tags but it didn't work.

ajs83

1:09 am on May 17, 2005 (gmt 0)

10+ Year Member



That query works! Just had to do some reading on the subject and find out about aliases.

Thanks!