Forum Moderators: coopster

Message Too Old, No Replies

Parsing csv files

parsing script that will ignore all commas enclosed in "" quoatation marks?

         

digital19

8:05 pm on Jul 11, 2005 (gmt 0)

10+ Year Member



Hi Everyone

I have been asked to take data that was taken in a .csv file and convert it in to something that will be imported in to another spss file.

I had been parsing this way:

while ($row = fgetcsv($sfp,10000,",")) {

The problem is that there are several fill-in-the-blank questions saved as strings. Sometimes these strings have commas in them, such as:
"My favorite colors are : red, green and blue"

The comma throws off the while statement.

Is there quick way to write a parsing script that will ignore all commas enclosed in "" quoatation marks?

jatar_k

8:26 pm on Jul 11, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Welcome to WebmasterWorld digital19,

[php.net...]
array fgetcsv ( resource handle [, int length [, string delimiter [, string enclosure]]] )

the fgetcsv function allows you to tell it what the enclosures and delimiters for your fields.

so you could try setting them explicitly and see if that helps

while ($row = fgetcsv($sfp,10000,',','"'))

though the enclosure was only added in 4.3.0 so it may not apply. It is a difficult thing to parse csv sometimes, especially if there are free form data fields in there.

hard returns are another one that have messed me up a few times.

Another option is that if the field with commas is the final field you can concat it and all columns after it together but this can get a little tricky.

dmorison

8:33 pm on Jul 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You might want to have a look at Pear's File_CSV library for a more comprehensive alternative to fgetcsv:

[pear.php.net...]

digital19

8:36 pm on Jul 11, 2005 (gmt 0)

10+ Year Member



Thanks. This looks like a really nice forum!

Unfortunately number fields were without quotes and text fields were enclosed with quotes.

So it looks something like this :

50001,3,0,3,"this is the problem, text","text"¦

jatar_k

8:39 pm on Jul 11, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



ah yes, that's sorta what i figured

take a look at dmorison's link then maybe, otheriwse it is really tough

maybe a little regex to fix the file, though I couldn't give you that answer.

Maybe parse through and if there are too many array elements returned (which means there is an extra comma) then write them to an exception file and deal with them seperately.

dmorison

8:41 pm on Jul 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



digital19,

File_CSV handles that situation just fine as long as you override the discover format and set the text delimiter to the character you want ("). If it doesn't find the delimeter as the first character after the field separator it will not look for the closing delimiter.