
Forum Moderators: coopster & jatar k


PHP - mapping external data feed standards into local standards



4:38 pm on Jul 12, 2012 (gmt 0)

Hi everyone,

I am trying to figure out the best way to map feed data I am obtaining from external sources, converting specific values into my local standards/MySQL data types before importing them into my MySQL table. I am doing this because different feeds carry different info, and I am trying to get multiple feeds into one table because the products are highly related.

I am currently trying to use an array to help with the mapping of external data to local data but any suggestions are welcome.

What I am currently using:

$field_mapping = array('DisplayYN' => array('Yes'   => 1,
                                            'True'  => 1,
                                            'No'    => 0,
                                            'False' => 0));

$line = "id,DisplayYN,Status";
$keys = explode(",", trim($line));

$line2 = "1,True,Out of Stock";
$pieces = explode(",", trim($line2));

$this_record = array();
for ($j = 0; $j < count($keys); $j++) {

    // isset() instead of a truthiness test, so values that map to 0
    // ('No', 'False') take the mapped branch instead of falling through
    if (isset($field_mapping[$keys[$j]][$pieces[$j]])) {

        $this_record[] = mysql_real_escape_string($field_mapping[$keys[$j]][$pieces[$j]]);

    } else {

        $this_record[] = mysql_real_escape_string($pieces[$j]);
    }
}

Is this an appropriate way to map data from one source to another? Or is there a better way to approach this?

Basically I am just trying to map certain fields to my standards, if needed, and then leaving the rest of the incoming fields alone.
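For what it's worth, the per-field lookup can be pulled out into a small helper so each feed only has to supply its own map. A minimal sketch (function name and map contents are just illustrative, and escaping is left out so it runs standalone):

```php
<?php
// Translate one CSV row into local values, field by field.
// $map is a two-level array: column name => (external value => local value).
// Columns and values with no mapping pass through unchanged.
function normalize_row(array $keys, array $pieces, array $map) {
    $record = array();
    for ($j = 0; $j < count($keys); $j++) {
        if (isset($map[$keys[$j]][$pieces[$j]])) {
            $record[] = $map[$keys[$j]][$pieces[$j]];
        } else {
            $record[] = $pieces[$j];
        }
    }
    return $record;
}

$map = array('DisplayYN' => array('Yes' => 1, 'True' => 1, 'No' => 0, 'False' => 0));
$row = normalize_row(array('id', 'DisplayYN', 'Status'),
                     array('1', 'True', 'Out of Stock'),
                     $map);
// $row is array('1', 1, 'Out of Stock')
```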

Thanks for the help!


2:18 pm on Jul 18, 2012 (gmt 0)

WebmasterWorld Senior Member eelixduppy is a WebmasterWorld Top Contributor of All Time 10+ Year Member

How much data are we talking about here? What type of external sources? Is this strictly a server utility or is there a webpage front-end to this that, for example, allows the user to upload this content you are normalizing?


3:24 pm on Jul 18, 2012 (gmt 0)

Lots of data...each source has between 1-5 tables that I am trying to import into my local "master table" to hold all the data. The sources are product feeds coming from multiple vendors. The products themselves are all similar, but the table column names differ from feed to feed, and sometimes the content within a data field differs as well. This is strictly a server utility pulling normalized data from each vendor. What I am pulling is never changed on my end by users - it just needs to be updated from the vendors' feeds through an update script.

Each table I am pulling from has over 100 columns. I am not pulling all of these, however; instead, I am trying to work out which external column holds the same content as each of my local columns and map just those.

The other solution I thought of yesterday:

I was thinking it may be easier to work with, and to handle the data, if I had the entire external tables stored locally in MySQL (a simple script to keep each local copy an exact match of the external table at certain points in time, keeping their naming conventions and such) and then used a separate script to update my Master Table from these local tables. Do you think that would make the data easier to handle?
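If you did go the staging-table route, the second script could often boil down to one INSERT ... SELECT per feed, with the renaming and value translation done in SQL. A rough sketch - the table and column names here are invented for illustration:

```php
<?php
// Hypothetical: vendor_a_products is the local copy of one feed's table
// (keeping that vendor's column names); master_products uses local names.
// The CASE expression does the value mapping, the column list does the renaming.
$sql = "INSERT INTO master_products (product_id, display, status)
        SELECT VendorSKU,
               CASE DisplayYN WHEN 'Yes' THEN 1 WHEN 'True' THEN 1 ELSE 0 END,
               StockStatus
        FROM vendor_a_products
        ON DUPLICATE KEY UPDATE
            display = VALUES(display),
            status  = VALUES(status)";
// mysql_query($sql);  // run against the local MySQL server
```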


6:16 pm on Jul 18, 2012 (gmt 0)

WebmasterWorld Senior Member eelixduppy is a WebmasterWorld Top Contributor of All Time 10+ Year Member

Well how often will you be expecting the external sources to change? How are you going to be getting the updates that are made?

When you do this you're going to want to avoid loading all of the data into memory all at once. For example, if you are grabbing data from MySQL as one of your sources, you want to use the unbuffered version of mysql_query to read the data: [php.net...]
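A sketch of that streaming pattern, assuming MySQL as the source (function name is made up; note you lose mysql_num_rows() on unbuffered results):

```php
<?php
// Stream rows from a large source table without buffering the whole
// result set in PHP's memory; each row is handed to $callback as it arrives.
function stream_source_rows($sql, $callback) {
    $result = mysql_unbuffered_query($sql);
    while ($row = mysql_fetch_assoc($result)) {
        $callback($row);   // process one row at a time
    }
    mysql_free_result($result);
}
```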

You might also want to consider taking an object oriented approach, since this seems to fit that structure nicely.
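One way that structure could look: a base class owning the common pipeline, with one subclass per feed supplying its quirks. A sketch only - all class, method, and column names here are invented:

```php
<?php
// Base class holds the shared normalize step; each vendor feed
// subclasses it and supplies its own column and value mappings.
abstract class Feed {
    // their column name => master-table column name
    abstract public function columnMap();
    // their column name => (external value => local value)
    abstract public function valueMap();

    public function normalize(array $row) {
        $out  = array();
        $vmap = $this->valueMap();
        foreach ($this->columnMap() as $theirs => $ours) {
            $value = isset($row[$theirs]) ? $row[$theirs] : null;
            if (isset($vmap[$theirs][$value])) {
                $value = $vmap[$theirs][$value];
            }
            $out[$ours] = $value;
        }
        return $out;
    }
}

class VendorAFeed extends Feed {
    public function columnMap() {
        return array('SKU' => 'product_id', 'DisplayYN' => 'display');
    }
    public function valueMap() {
        return array('DisplayYN' => array('Yes' => 1, 'No' => 0));
    }
}

$feed = new VendorAFeed();
$row  = $feed->normalize(array('SKU' => 'A100', 'DisplayYN' => 'Yes'));
// $row is array('product_id' => 'A100', 'display' => 1)
```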

Personally, I don't particularly like the idea of copying all the data locally before changing it, but then again I guess that depends on a bunch of things. For example, what if for whatever reason you cannot "normalize" a row? Do you ignore it, persist it to disk for later, etc..?

Also, "lots of data" may mean something different to you than to me. I'd say anything in the thousands of rows would be considered a small amount of data. I work in finance, though, so I'm used to working with millions of data items. :)


7:04 pm on Jul 18, 2012 (gmt 0)

Well, I have been using one feed for about a year and am now acquiring more to use. My current set-up is pretty straightforward and easy to use since my column names and field values are identical to the external feed's. But with multiple feeds being added, I am trying to figure out how best to get the data normalized into one Master Table.

I have been updating every 24 hours. I find all my table's column names and package them into the SELECT portion of the query to the external source, so that I only get the columns I have stored locally. I download the new data by querying the feed for everything newer than the last time I ran my update, based on a modification time-stamp, then write all the new and updated rows into a CSV. Once I have it stored locally in a CSV, I loop through each row, break it into pieces, and insert it into my MySQL table (each piece is in the same order as my local table and the SELECT statement). I use the "on duplicate key update" clause to handle inserting new and updating existing rows in the table.
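The insert step described above could be sketched as a small query builder for the multi-row upsert (table and column names are assumed, and real values would go through mysql_real_escape_string first):

```php
<?php
// Build one multi-row INSERT ... ON DUPLICATE KEY UPDATE statement:
// new keys are inserted, existing keys have their columns updated.
function build_upsert($table, array $columns, array $rows) {
    $values = array();
    foreach ($rows as $row) {
        $values[] = "('" . implode("','", $row) . "')";
    }
    $updates = array();
    foreach ($columns as $col) {
        if ($col !== 'id') {                 // leave the primary key alone
            $updates[] = "$col = VALUES($col)";
        }
    }
    return "INSERT INTO $table (" . implode(',', $columns) . ") VALUES "
         . implode(',', $values)
         . " ON DUPLICATE KEY UPDATE " . implode(', ', $updates);
}

$sql = build_upsert('master_products',
                    array('id', 'display', 'status'),
                    array(array('1', '1', 'Out of Stock')));
```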

Haha, sorry about the "lots of data" part...tables range from 1,000 to 2,000,000 records/rows (and each feed can have between 1-5 tables)...only one of my feeds has the 2,000,000 records...kinda the outlier. On the updates, each feed has between 500 - 20,000 new/updated records, if processed every 24 hours.

I was trying to think of how I could take the OOP approach, but I can't get my head around how to handle the different naming conventions from feed to feed.
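One common way to deal with the naming differences, with or without classes, is a per-feed column dictionary: each feed gets an array translating its column names into yours, and the import loop only ever speaks your names. The feed and column names below are made up for illustration:

```php
<?php
// One translation table per feed: their column name => master column name.
$column_maps = array(
    'vendor_a' => array('SKU'        => 'product_id',
                        'DisplayYN'  => 'display',
                        'StockState' => 'status'),
    'vendor_b' => array('ItemNumber' => 'product_id',
                        'Visible'    => 'display',
                        'Inventory'  => 'status'),
);

// Rename a row's keys for a given feed; columns not in the map are dropped.
function rename_columns(array $row, array $map) {
    $out = array();
    foreach ($row as $col => $value) {
        if (isset($map[$col])) {
            $out[$map[$col]] = $value;
        }
    }
    return $out;
}

$renamed = rename_columns(array('ItemNumber' => 'B7', 'Inventory' => 'In Stock'),
                          $column_maps['vendor_b']);
// $renamed is array('product_id' => 'B7', 'status' => 'In Stock')
```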


1:16 pm on Jul 19, 2012 (gmt 0)

WebmasterWorld Senior Member eelixduppy is a WebmasterWorld Top Contributor of All Time 10+ Year Member

There probably isn't going to be a "best" solution for something like this. If what you have works, then why second-guess it? Is it producing undesirable performance?

If I were to write something like this, I would choose something other than PHP (probably Java) and make it parse the data concurrently to maximize performance. For 20k update records, any solution should work because it's such a small amount of data. If you were going to do a full update of all the data, though, that could take a very long time single-threaded.


3:47 pm on Jul 19, 2012 (gmt 0)

Ya...I was thinking there probably wasn't, but I really appreciate your input and advice. It's definitely helping me figure out how to go about it.

What I have currently works great for a one-to-one set-up (one feed to one local table) but my trouble now is getting that to work with a one-to-many set-up (Many feeds to one local table).

I'll have to check out other languages, for sure. I'm trying to expand, looking to go from the one feed that is working now to probably around 8 in the next couple of months, so a forward-looking system will be important.
