homepage Welcome to WebmasterWorld Guest from 54.204.249.184
register, free tools, login, search, pro membership, help, library, announcements, recent posts, open posts,
Become a Pro Member

Home / Forums Index / Code, Content, and Presentation / Perl Server Side CGI Scripting
Forum Library, Charter, Moderators: coopster & jatar k & phranque

Perl Server Side CGI Scripting Forum

    
Splitting Mixed Delimiters in a CSV like data?
comma,"separated,fields",partly,qualified
rocknbil




msg:3905248
 10:28 pm on May 1, 2009 (gmt 0)

Task: Export data from offline spreadsheet proggie, upload and import to online database.

I've wrangled with this one on and off. Usually I take the "Google Base" approach and convince the client to quote qualify everything, or nothing, just stay consistent.

But some customers. You know. So what would you do to split up lines, something like this?

id,name,phone,address,city,state.....
1234,"Smith,Steve",123-4567,"124 ""5th"" Street, Apt 24",Anytown,PR.....

I could figure something out, or use a module, just wondering what others do with these.

The quote qualifiers may or may not be present, they won't always be present in any particular field (as in "name" above,) some export programs double-quote field value quotes as shown above . . .

What do you do with these?

 

phranque




msg:3905315
 2:07 am on May 2, 2009 (gmt 0)

it might be worth investigating cpan modules such as Text::CSV [search.cpan.org].

coopster




msg:3905511
 2:28 pm on May 2, 2009 (gmt 0)

I ran into this issue a few years ago too and ended up writing my own as part of my file processing class. The function below accepts each line of a file (csv, tsv, etc.). I cannot take full credit here as you see in the comments -- I peeled some of the logic from the PHP fputcsv function source code, which is written in C. I converted the logic to PHP which you should be able to easily adapt to perl (share the code here if you do please).
// {{{ _outputCsv 
/**
* PHP < 5.1.0 workaround for fputcsv
* PHP Source: /php/ext/standard/file.c (fputcsv)
* Find any enclosures?
* Escape with an additional enclosure
* Next, find any spaces, delimiters or enclosures?
* Wrap the whole works in a set of enclosures
*
* @param string $fields array to process
* @param string $delimiter field delimiter character
* @param string $enclosure field enclosure character
* @param string $newline newline character
* @return void printed output should be buffered
* @access private
*/
private function _outputCsv($fields, $delimiter, $enclosure, $newline)
{
$out = '';
$pattern = '/' . preg_quote($enclosure, '/') . '/';
$enclose = '/[\s' . preg_quote("$delimiter$enclosure", '/') . ']/';
$delimit = '/' . preg_quote($delimiter, '/') . '$/';
foreach ($fields as $field) {
$field = preg_replace($pattern, "$enclosure$enclosure", $field);
if (preg_match($enclose, $field)) {
$field = "$enclosure$field$enclosure";
}
$out .= "$field$delimiter";
}
print preg_replace($delimit, '', $out) . $newline;
}
/**
* End PHP < 5.1.0 workaround for fputcsv
*/

krugs




msg:3905633
 8:37 pm on May 2, 2009 (gmt 0)

perl comes with two modules:

Text::Balanced
Text::ParseWords

both or one could work for this type of parsing.

A quick test with ParseWords:


use Text::ParseWords;
$var = '1234,"Smith,Steve",123-4567,"124 ""5th"" Street, Apt 24",Anytown,PR';
@lists = &nested_quotewords(',', 0, $var);
for (@lists) {
print "$_\n" for @{$_};
}

output:

1234
Smith,Steve
123-4567
124 5th Street, Apt 24
Anytown
PR

See Text::ParseWords [perldoc.perl.org] for details

[edited by: phranque at 12:21 am (utc) on May 3, 2009]
[edit reason] added link [/edit]

wruppert




msg:3906578
 7:19 pm on May 4, 2009 (gmt 0)

Text::xSV is the best .csv module I've found. You can try to write your own program that "almost works", but wait until they put embedded newlines into the data, which is legal in a .csv file.

coopster




msg:3906683
 9:50 pm on May 4, 2009 (gmt 0)

Yes, embedded newlines are "legal", as are many other characters. The hardest part of this task is that there is no standard just guidelines, so to speak. I remember doing a lot of research before I wrote my own code to handle variations and tested thoroughly.

Side note: I just noticed I documented that method incorrectly, the "fields" param is of type "array", not string.

krugs




msg:3906925
 7:16 am on May 5, 2009 (gmt 0)

Text::CSV_XS can handle the embedded characters issue also. I have never tried Text::xSV (never even heard of it until just now) probably because it is a very new module. Looks interesting though.

chorny




msg:3912933
 8:43 pm on May 13, 2009 (gmt 0)
Text::CSV from CPAN is best module for this purpose and will use faster Text::CSV_XS automatically if it is available.
Global Options:
 top home search open messages active posts  
 

Home / Forums Index / Code, Content, and Presentation / Perl Server Side CGI Scripting
rss feed

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
Home ¦ Free Tools ¦ Terms of Service ¦ Privacy Policy ¦ Report Problem ¦ About ¦ Library ¦ Newsletter
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved