I've wrangled with this one on and off. Usually I take the "Google Base" approach and convince the client to quote-qualify everything or nothing, as long as they stay consistent.
But some customers, you know. So what would you do to split up lines like this?
id,name,phone,address,city,state.....
1234,"Smith,Steve",123-4567,"124 ""5th"" Street, Apt 24",Anytown,PR.....
I could figure something out or use a module; I'm just wondering what others do with these.
The quote qualifiers may or may not be present, and when they are, they won't necessarily be on every field (only where needed, as in "name" above). Some export programs also double any quotes inside a field value, as shown above.
What do you do with these?
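For comparison, here is a minimal sketch of the "use a module" route in Python (the language is my choice, not something from the thread). It assumes the export follows the convention in the sample line: fields are optionally wrapped in double quotes, and quotes inside a field are doubled. Python's standard csv module handles both by default. The helper name split_line is made up for the example.

```python
import csv

def split_line(line):
    """Split one CSV line into a list of fields (split_line is a hypothetical helper)."""
    # csv.reader accepts any iterable of lines. doublequote=True is the default,
    # so "" inside a quoted field becomes a single literal quote.
    return next(csv.reader([line], delimiter=',', quotechar='"'))

sample = '1234,"Smith,Steve",123-4567,"124 ""5th"" Street, Apt 24",Anytown,PR'
for field in split_line(sample):
    print(field)
```

Fields that are not quoted at all pass through unchanged, so mixed quoting on the same line is not a problem for this approach.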
// {{{ _outputCsv
/**
 * PHP < 5.1.0 workaround for fputcsv
 * PHP Source: /php/ext/standard/file.c (fputcsv)
 * Find any enclosures?
 * Escape with an additional enclosure
 * Next, find any spaces, delimiters or enclosures?
 * Wrap the whole works in a set of enclosures
 *
 * @param string $fields array to process
 * @param string $delimiter field delimiter character
 * @param string $enclosure field enclosure character
 * @param string $newline newline character
 * @return void printed output should be buffered
 * @access private
 */
private function _outputCsv($fields, $delimiter, $enclosure, $newline)
{
    $out = '';
    $pattern = '/' . preg_quote($enclosure, '/') . '/';
    $enclose = '/[\s' . preg_quote("$delimiter$enclosure", '/') . ']/';
    $delimit = '/' . preg_quote($delimiter, '/') . '$/';
    foreach ($fields as $field) {
        $field = preg_replace($pattern, "$enclosure$enclosure", $field);
        if (preg_match($enclose, $field)) {
            $field = "$enclosure$field$enclosure";
        }
        $out .= "$field$delimiter";
    }
    print preg_replace($delimit, '', $out) . $newline;
}
/**
 * End PHP < 5.1.0 workaround for fputcsv
 */
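The quoting rule in that method (double any enclosure inside a field, then wrap the field if it contains whitespace, a delimiter, or an enclosure) can be sketched in Python as well. This is only an illustration of the same rule, not a port anyone in the thread has tested. The function name format_csv_row and its default arguments are my own assumptions, and it returns the row as a string instead of printing it.

```python
import re

def format_csv_row(fields, delimiter=',', enclosure='"', newline='\n'):
    """Build one CSV row as a string (hypothetical helper mirroring the rule above)."""
    # A field needs wrapping if it contains whitespace, the delimiter, or the enclosure.
    needs_enclosing = re.compile('[\\s' + re.escape(delimiter + enclosure) + ']')
    out = []
    for field in fields:
        # Escape each enclosure character by doubling it.
        field = str(field).replace(enclosure, enclosure * 2)
        if needs_enclosing.search(field):
            field = enclosure + field + enclosure
        out.append(field)
    # Joining avoids having to strip a trailing delimiter afterwards.
    return delimiter.join(out) + newline

row = ['1234', 'Smith,Steve', '123-4567', '124 "5th" Street, Apt 24', 'Anytown', 'PR']
print(format_csv_row(row), end='')
```

Fed the parsed fields from the sample record, this should reproduce the sample line as written, quotes and all.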
Text::Balanced and Text::ParseWords: either one could work for this type of parsing.
A quick test with ParseWords:
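use strict;
use warnings;
use Text::ParseWords;

my $var = '1234,"Smith,Steve",123-4567,"124 ""5th"" Street, Apt 24",Anytown,PR';
# keep = 0 strips the quote characters from the returned fields
my @lists = nested_quotewords(',', 0, $var);
for my $list (@lists) {
    print "$_\n" for @{$list};
}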
output:
1234
Smith,Steve
123-4567
124 5th Street, Apt 24
Anytown
PR
See Text::ParseWords [perldoc.perl.org] for details
[edited by: phranque at 12:21 am (utc) on May 3, 2009]
[edit reason] added link [/edit]
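One thing to watch in the ParseWords output above: the address comes back as 124 5th Street, Apt 24, so the doubled quotes around 5th are dropped, not turned into a literal quote. If those quotes matter, a small hand-rolled splitter is one option. The sketch below is in Python (my choice of language). It assumes one record per line with no newlines embedded in fields, and the name split_csv_line is made up for the example.

```python
def split_csv_line(line, delimiter=',', quote='"'):
    """Split one line into fields, treating a doubled quote inside quotes as a literal quote."""
    fields = []
    current = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == quote:
                if i + 1 < len(line) and line[i + 1] == quote:
                    current.append(quote)  # doubled quote -> one literal quote
                    i += 1                 # skip the second quote of the pair
                else:
                    in_quotes = False      # closing quote
            else:
                current.append(ch)
        elif ch == quote:
            in_quotes = True               # opening quote
        elif ch == delimiter:
            fields.append(''.join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append(''.join(current))        # last field has no trailing delimiter
    return fields

sample = '1234,"Smith,Steve",123-4567,"124 ""5th"" Street, Apt 24",Anytown,PR'
print(split_csv_line(sample))
```

Unquoted and quoted fields can be mixed freely on the same line, which covers the "may or may not be present" case from the original question.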
Side note: I just noticed I documented that method incorrectly. The "fields" param is of type "array", not "string".