Forum Moderators: coopster & phranque

Message Too Old, No Replies

Multiple Column sort in a flatfile text (csv) database

         

Allen

11:10 am on Mar 22, 2002 (gmt 0)

10+ Year Member



Hi all,

Here's what I want to do:
I have a flatfile text database, basically in CSV (well, it's using pipes instead comma's - but that shouldn't matter)

It has a total of 7 columns (lets call them 0 to 6 - like Perl would)

They need to be sorted in the following order:
First by column 2
Then by column 5
Then by column 4

Anyone got any ideas on how to accomplish this?

Thanks In Advance,

Allen

sugarkane

3:37 pm on Mar 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Good question Allen - it had me scratching my head for a while ;)

The code below will sort a four-field pipe-delimited file, first by field1, then by field2

[perl]
@ranked= sort multisort @lines;

sub multisort {
# first split $a and $b into fields
($field1a, $field2a, $field3a, $field4a)=split(/\¦/, $a);
($field1b, $field2b, $field3b, $field4b)=split(/\¦/, $b);

# now do the actual sort:
$field1a cmp $field1b
or
$field2a cmp $field2b;
}
[/perl]

The key to it is that you can define your own subroutine for Perl's sort function. The sort function passes two temporary variables to your routine - $a and $b - which are set to be 2 elements from your array. You can then compare them to one another using the standard perl comparison operators, and 'sort' uses the result to sort your array.

So far, so good - you could do a simple text sort like this:
[perl]
@ranked=sort {$a cmp $b} @array;
[/perl]

The trick to sorting by multiple fields is this: if your first comparison doesn't return a positive or negative result (ie the two elements are the same) you can use the 'or' operator to specify a further comparison, effectively allowing you to sort by a secondary criteria.

So, for your specific case, you'd first split the array elements into fields as above, then use this comparison:

[perl]
$field2a cmp $field2b # first sort on field 2
or
$field5a cmp $field5b # then on field 5...
or
$field4a cmp $field4b; # ...finally on field 4
[/perl]

I hope that makes sense!

Allen

8:58 am on Mar 26, 2002 (gmt 0)

10+ Year Member



I sorted it myself in the end. What I did was to use my own routine withing the sort{} which made a single string out of the fields I wanted to sort by, in the order I wanted to sort them by, and then returned the cmp of the strings for the two records.

Like this:
sub ajb_comp {
# Compare parts of each record
$ajba = $_[0];
$ajbb = $_[1];

$one = (split(/\¦/, $ajba))[2] . (split(/\¦/, $ajba))[5] . (split(/\¦/, $ajba))[4];
$two = (split(/\¦/, $ajbb))[2] . (split(/\¦/, $ajbb))[5] . (split(/\¦/, $ajbb))[4];

$one =~ tr/\// /;
$two =~ tr/\// /;

$one =~ tr/\ /a/;
$two =~ tr/\ /a/;

$test = $one cmp $two;

return $test;
}

In case your wondering, I put the value into $test first so I could output it to the screen when debugging.

Allen