tec4 - 7:04 pm on Jul 18, 2012 (gmt 0)
Well I have been using one feed for about a year and now am acquiring more to use. My current set-up is pretty straight forward and easy to use since my column names and field values are identical to the external feed. But with adding multiple feeds, I am trying to figure out how to best get the data normalized into one Master Table.
I have been updating every 24 hours. I find all my table's column names and package them in the "select" portion of the query to the external source so that I only get the columns I have stored locally. I download the new data by querying the feed for everything that is newer than the last time I ran my update based on a modification time-stamp, then I write all the new and updated rows into a csv. Once I have it stored locally in a CSV I loop through each row, break it into pieces and insert it into my MySQL table (each piece is in the same order as my local table and what was in the Select statement). I use the "on duplicate key update" snippet to handle updating/inserting new/existing data in the table.
Haha, sorry about the "lots of data" part...tables range from 1,000 to 2,000,000 records/rows (and each feed can have between 1-5 tables)...only one of my feeds has the 2,000,000 records...kinda the outlier. On the updates, each feed has between 500 - 20,000 new/updated records, if processed every 24 hours.
Was trying to think of how I could take the OOP approach but can't get my head around how to handle the different naming conventions from feed to feed.