homepage Welcome to WebmasterWorld Guest from 23.22.173.58
register, free tools, login, search, pro membership, help, library, announcements, recent posts, open posts,
Pubcon Platinum Sponsor 2014
Home / Forums Index / Code, Content, and Presentation / PHP Server Side Scripting
Forum Library, Charter, Moderators: coopster & jatar k

PHP Server Side Scripting Forum

    
Basics of extracting data from CSV files
and reformatting their content
beatsie

10+ Year Member



 
Msg#: 2349 posted 5:27 pm on Dec 18, 2003 (gmt 0)

Hi Everyone

What i have to accomplish is to basically reformat one CSV file into another.

Here is an example of the structure of the CSV data im dealing with

value1,value2,value3a¦value3b¦value3c¦value4c¦,value4

So as you can see the third field has multiple values delimited by the ¦ symbol

What i need to do is to basically replace the ¦ symbol with commas , so that each data item is in its own field

i.e

value1,value2,value3a,value3b,value3c,value4c,value4

Then eventually remove all other data so i am just left with a CSV file of the values in the third field.

i.e

value3a,value3b,value3c,value4c

So i would like to try it in php , but might use perl depending on ease of implementation though i am more comfortable with PHP.

Anyone have any ideas or suggestions for an approach to this problem?

Thanks in advance

Karl

 

jatar_k

WebmasterWorld Administrator jatar_k us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 2349 posted 8:04 pm on Dec 18, 2003 (gmt 0)

Welcome to WebmasterWorld beatsie,

Anyone have any ideas or suggestions for an approach to this problem?

I am sure we can think of something ;)

your order of events seems fairly straight forward

1. open the source file for read
2. open destination file for write
3. get row from source
4. reformat
5. write to destination
6. repeat
7. close files

seems simple enough

1. open the source file for read
$sfp = fopen [ca.php.net]('/path/to/source.csv','r');

2. open destination file for write
$dfp = fopen('/path/to/destination.csv','w');

3. get row from source - we'll put this into a while loop to continuously get rows until we have none left
while ($row = fgetcsv [ca.php.net]($sfp,10000,",","")) {
}

4. reformat - since fgetcsv splits it up for us we are looking for the third element of the $row array

$goodstuff = str_replace [ca.php.net]("¦",",",$row[2]);
$goodstuff .= "\n";

we may need to intialize $goodstuff inside the loop to be safe

Note:Step 4 can be any type of reformatting that needs to be performed on the file contents. In this particular case we are removing fields but this would be the step where any reconstruction can be made.

5. write to destination
fwrite [ca.php.net]($dfp,$goodstuff);

6. repeat- the while loop takes care of this step as well

7. close files
fclose [ca.php.net]($sfp);
fclose($dfp);

so what does that leave us with?

<? 
$sfp = fopen('/path/to/source.csv','r');
$dfp = fopen('/path/to/destination.csv','w');
while ($row = fgetcsv($sfp,10000,",","")) {
$goodstuff = "";
$goodstuff = str_replace("¦",",",$row[2]);
$goodstuff .= "\n";
fwrite($dfp,$goodstuff);
}
fclose($sfp);
fclose($dfp);
?>

seems pretty straight forward, didn't test it obviously ;)

WebmasterWorld breaks pipes so ¦ must be replaced with an unbroken pipe

[edited by: jatar_k at 7:57 pm (utc) on Feb. 6, 2005]

dcrombie

10+ Year Member



 
Msg#: 2349 posted 8:49 pm on Dec 18, 2003 (gmt 0)

If I read it write you want to _lose_ fields 1,2,4 and turn field 3 into a comma-separated list.

How about using the command-line:

awk -F[,] '{print $3}' csvfile.txt ¦ sed 's/¦/,/g' > newfile.txt

(in English: split each row on ','; keep the third field; replace '¦'s with ','s)

Of course if your data has '"'s in it you might need to tweak the command a bit
;)

beatsie

10+ Year Member



 
Msg#: 2349 posted 12:49 pm on Dec 19, 2003 (gmt 0)

Wow, thanks for the prompt replys jatar_k & dcrombie

Will try both suggstions out and let you know how i get on.

Thanks again

beatsie

10+ Year Member



 
Msg#: 2349 posted 3:23 pm on Jan 9, 2004 (gmt 0)

Ok i just tried your suggestion for the moment jatar_k.

Your suggestion works the way i described the problem, so many thanks for that.
But, I had another look at the source data, and the data is not delimited by ¦

It looks like each piece of data i want is actually on a new line.

What it looks like is one field with many lines within that field.It's hard to accurately describe, so i can email a screenshot if required.

I tried using '\n' instead of '¦' but that did not work,.

Any advice appreciated.

jatar_k

WebmasterWorld Administrator jatar_k us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 2349 posted 7:21 pm on Jan 10, 2004 (gmt 0)

You could maybe use simple old fopen and fgets then maybe. The thing about automating tasks is too truly understand the data that you are trying to parse or work with.

Is there a pattern to the data that you can see?

beatsie

10+ Year Member



 
Msg#: 2349 posted 11:48 am on Jan 27, 2004 (gmt 0)

Hey Jatar

The Hard drive failed on my laptop, so I've been out of action for a while.

Yes there is a pattern to the data.

I have been messing about and have the data items in their own row.

i.e.

Value1
Value2
Value3

basically i want every 13 or so rows to be in one row delimited by commas.

Value1,Value2,Value3

There is a repeated value after every record so maybe i could just use fgets to replace newlines \n with commas and then to move on to a new row when a certain value is encountered?

the repeat value crops up like this in the file im working with

Value1
Value2
Value3
REPEAT VAR
Value4
Value5
Value6
REPEAT VAR

so hopefully i could end up with

Value1,Value2,Value3,REPEAT VAR
Value4,Value5,Value6,REPEAT VAR

What do you think?

jatar_k

WebmasterWorld Administrator jatar_k us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 2349 posted 8:27 pm on Jan 27, 2004 (gmt 0)

sounds like exactly what you need.

Is the repeated var the same as the one right before it?

ie. in this sceanario

Value1,Value2,Value3,REPEAT_VAR1
Value4,Value5,Value6,REPEAT_VAR2

would
REPEAT_VAR1 == Value3
and
REPEAT_VAR2 == Value6

dubmeier

10+ Year Member



 
Msg#: 2349 posted 9:21 pm on Jan 27, 2004 (gmt 0)

The reason replacing '¦' with '\n' didn't work is that control characters need to be interpreted. So, you need to use "\n" not '\n'. Try it out...

beatsie

10+ Year Member



 
Msg#: 2349 posted 4:44 pm on Feb 3, 2004 (gmt 0)

Hey again guys

Managed to get the data into the form i wanted using a combination of PHP scripting, An Excel manipulation utility and just good old search and replace.

What a nightmare :)

Thanks for all your help.

Karl

Global Options:
 top home search open messages active posts  
 

Home / Forums Index / Code, Content, and Presentation / PHP Server Side Scripting
rss feed

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
Home ¦ Free Tools ¦ Terms of Service ¦ Privacy Policy ¦ Report Problem ¦ About ¦ Library ¦ Newsletter
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved