Welcome to WebmasterWorld Guest from

Forum Moderators: coopster & jatar k & phranque

Message Too Old, No Replies

Splitting Mixed Delimiters in a CSV like data?




10:28 pm on May 1, 2009 (gmt 0)

WebmasterWorld Senior Member rocknbil is a WebmasterWorld Top Contributor of All Time 10+ Year Member

Task: Export data from offline spreadsheet proggie, upload and import to online database.

I've wrangled with this one on and off. Usually I take the "Google Base" approach and convince the client to quote qualify everything, or nothing, just stay consistent.

But some customers. You know. So what would you do to split up lines, something like this?

1234,"Smith,Steve",123-4567,"124 ""5th"" Street, Apt 24",Anytown,PR.....

I could figure something out, or use a module, just wondering what others do with these.

The quote qualifiers may or may not be present, they won't always be present in any particular field (as in "name" above,) some export programs double-quote field value quotes as shown above . . .

What do you do with these?


2:07 am on May 2, 2009 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

it might be worth investigating cpan modules such as Text::CSV [search.cpan.org].


2:28 pm on May 2, 2009 (gmt 0)

WebmasterWorld Administrator coopster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

I ran into this issue a few years ago too and ended up writing my own as part of my file processing class. The function below accepts each line of a file (csv, tsv, etc.). I cannot take full credit here as you see in the comments -- I peeled some of the logic from the PHP fputcsv function source code, which is written in C. I converted the logic to PHP which you should be able to easily adapt to perl (share the code here if you do please).
// {{{ _outputCsv 
* PHP < 5.1.0 workaround for fputcsv
* PHP Source: /php/ext/standard/file.c (fputcsv)
* Find any enclosures?
* Escape with an additional enclosure
* Next, find any spaces, delimiters or enclosures?
* Wrap the whole works in a set of enclosures
* @param string $fields array to process
* @param string $delimiter field delimiter character
* @param string $enclosure field enclosure character
* @param string $newline newline character
* @return void printed output should be buffered
* @access private
private function _outputCsv($fields, $delimiter, $enclosure, $newline)
$out = '';
$pattern = '/' . preg_quote($enclosure, '/') . '/';
$enclose = '/[\s' . preg_quote("$delimiter$enclosure", '/') . ']/';
$delimit = '/' . preg_quote($delimiter, '/') . '$/';
foreach ($fields as $field) {
$field = preg_replace($pattern, "$enclosure$enclosure", $field);
if (preg_match($enclose, $field)) {
$field = "$enclosure$field$enclosure";
$out .= "$field$delimiter";
print preg_replace($delimit, '', $out) . $newline;
* End PHP < 5.1.0 workaround for fputcsv


8:37 pm on May 2, 2009 (gmt 0)

5+ Year Member

perl comes with two modules:


both or one could work for this type of parsing.

A quick test with ParseWords:

use Text::ParseWords;
$var = '1234,"Smith,Steve",123-4567,"124 ""5th"" Street, Apt 24",Anytown,PR';
@lists = &nested_quotewords(',', 0, $var);
for (@lists) {
print "$_\n" for @{$_};


124 5th Street, Apt 24

See Text::ParseWords [perldoc.perl.org] for details

[edited by: phranque at 12:21 am (utc) on May 3, 2009]
[edit reason] added link [/edit]


7:19 pm on May 4, 2009 (gmt 0)

10+ Year Member

Text::xSV is the best .csv module I've found. You can try to write your own program that "almost works", but wait until they put embedded newlines into the data, which is legal in a .csv file.


9:50 pm on May 4, 2009 (gmt 0)

WebmasterWorld Administrator coopster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

Yes, embedded newlines are "legal", as are many other characters. The hardest part of this task is that there is no standard just guidelines, so to speak. I remember doing a lot of research before I wrote my own code to handle variations and tested thoroughly.

Side note: I just noticed I documented that method incorrectly, the "fields" param is of type "array", not string.


7:16 am on May 5, 2009 (gmt 0)

5+ Year Member

Text::CSV_XS can handle the embedded characters issue also. I have never tried Text::xSV (never even heard of it until just now) probably because it is a very new module. Looks interesting though.


8:43 pm on May 13, 2009 (gmt 0)

5+ Year Member

Text::CSV from CPAN is best module for this purpose and will use faster Text::CSV_XS automatically if it is available.

Featured Threads

Hot Threads This Week

Hot Threads This Month