Splitting a Flat file into a hash.

         

Brett_Tabke

6:31 am on Jun 22, 2001 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



What's a slick way to do this:

The flat file format is simple: "field a|field b".
I want to split that into a hash so that field b is the key and field a is the value:
$z{field b} = field a

Seems like there should be a simple way of doing it in a one-liner. Something such as this:
$z{$field b}=$field a = split(/\|/,$line);

(that won't work, but it is the idea).

Currently using the two stepper:
($v,$k) = split(/\|/,$line);
$z{$k}=$v;

sugarkane

11:09 am on Jun 22, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This nearly works, but the order of key and value is switched, i.e. a hash is created where field a is the key and field b is the value.

%z=map {split /\|/} @lines;

I'm sure there must be a way to switch the field order before mapping to %z.

Brett_Tabke

8:00 pm on Jun 22, 2001 (gmt 0)




map is voodoo.

Bolotomus

8:38 pm on Aug 3, 2001 (gmt 0)

10+ Year Member



> I'm sure there must be a way to switch the field order before mapping to %z

Of course,

%z=map {reverse split /\|/} @lines;

Here's a very similar way: you pack the whole file into the hash in a single line. Like the solution above, there is ZERO tolerance for out-of-format files!!!

#!/usr/bin/perl -w
use strict;

my $myfile = "test.txt";
my %hash = ();
open(FILE, $myfile) || die "Cannot open $myfile: $!";
{
    local $/ = undef;                 # slurp mode: <FILE> returns the whole file
    %hash = split /\||\n/, <FILE>;    # munch munch
    close FILE;
}
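
As a rough sketch of what the slurp-and-split approach above produces, the following uses an in-memory filehandle in place of test.txt. The sample data and the variable names $data and $fh are assumptions made for illustration. Note that there is no reverse here, so field a ends up as the key.

```perl
use strict;
use warnings;

# Hypothetical sample contents standing in for test.txt.
my $data = "a|1\nb|2\nc|3\n";

# Read from an in-memory filehandle so the sketch needs no file on disk.
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my %hash;
{
    local $/ = undef;                # slurp mode
    %hash = split /\||\n/, <$fh>;    # one split on pipe or newline
}
close $fh;

# No reverse here, so field a is the key and field b is the value.
print "$_ => $hash{$_}\n" for sort keys %hash;
```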

Brett_Tabke

11:45 am on Aug 4, 2001 (gmt 0)




Thanks. The underused and often forgotten "reverse".

%z=map {reverse split /\|/} @lines;

I never would have guessed that it would have worked that way on split. I can use that in so many places.
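
A minimal runnable sketch of the reverse trick, assuming the lines were read straight from a file and still carry their newlines. The sample data is made up. The chomp is an addition not shown in the thread: without it, each key (field b) would end in "\n".

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines of the flat file.
my @lines = ("apple|1\n", "banana|2\n", "cherry|3\n");

# Remove trailing newlines first; otherwise each key would end in "\n".
chomp @lines;

# split returns (field a, field b); reverse flips that to (field b, field a),
# so field b becomes the key and field a the value.
my %z = map { reverse split /\|/ } @lines;

print "$_ => $z{$_}\n" for sort keys %z;
```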

Bolotomus

10:39 pm on Aug 4, 2001 (gmt 0)




Keep in mind WHY it works... map { } builds a new list, each element of which comes from the { } block. If the { } block spits out more than one element, then the lists get shoved together. So in the end, if your file looks like

a¦1
b¦2
c¦3

then it produces a flat list ("a", 1, "b", 2, "c", 3)... and not anything fancy like ( ("a",1), ("b",2), ("c",3) )

And of course, a hash can be built from any flat list by taking its elements in pairs, so %z = ("a", 1, "b", 2, "c", 3) sets up the hash you want perfectly.
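
To make the flattening concrete, here is a small sketch using the three sample lines above (already chomped, which is an assumption). The name @flat is introduced only for illustration.

```perl
use strict;
use warnings;

my @lines = ("a|1", "b|2", "c|3");

# Each pass through the block returns two elements; map concatenates
# them into one flat list rather than a list of pairs.
my @flat = map { split /\|/ } @lines;
print scalar(@flat), " elements: @flat\n";

# Assigning a flat list to a hash consumes it as key, value, key, value, ...
my %z = @flat;
print "$_ => $z{$_}\n" for sort keys %z;
```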

But you can also turn that data into the list just by splitting on /\||\n/ (in other words, a pipe *or* the newline character). That was the idea behind my solution.

BUT... what if one line contains "b|2|3" or something out of format? Now you've blown the whole hash, as you'll have an odd number of elements.
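
A quick sketch of that failure mode, with made-up data. The extra field shifts every later pair by one, so keys and values swap roles from that point on. The "odd number of elements" warning is silenced here only to keep the demonstration quiet.

```perl
use strict;
use warnings;

# The second line has one field too many.
my @lines = ("a|1", "b|2|3", "c|4");
my @flat  = map { split /\|/ } @lines;    # 7 elements: a 1 b 2 3 c 4

my %z;
{
    no warnings 'misc';    # suppress "Odd number of elements in hash assignment"
    %z = @flat;
}

# The pairs are now (a,1) (b,2) (3,c) (4,undef): "c" has become a value
# and "4" a key with no value at all.
for my $k (sort keys %z) {
    my $v = defined $z{$k} ? $z{$k} : 'undef';
    print "$k => $v\n";
}
```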

So while it's fun to come up with solutions like this, in the real world I would go with something much more robust and forget about trying to cram it all into a one-liner. I'd process each line one at a time, strip off #remarks with a regex, ignore empty lines, check the data validity, yada yada yada.
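
One possible shape for that more robust, line-at-a-time version is sketched below. The helper name parse_lines, the rule that both fields must be non-empty, and the choice to collect bad lines rather than die are all assumptions made for illustration, not something specified in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "field a|field b" lines into a hash keyed on
# field b. Returns a hash reference plus a list of rejected lines.
sub parse_lines {
    my @lines = @_;
    my (%z, @bad);
    for my $line (@lines) {
        chomp $line;
        $line =~ s/#.*//;            # strip #remarks
        $line =~ s/^\s+|\s+$//g;     # trim surrounding whitespace
        next if $line eq '';         # ignore empty lines

        # A limit of -1 keeps trailing empty fields so "a|" is caught below.
        my @fields = split /\|/, $line, -1;
        if (@fields != 2 or grep { $_ eq '' } @fields) {
            push @bad, $line;        # out of format: record it and move on
            next;
        }
        my ($v, $k) = @fields;
        $z{$k} = $v;
    }
    return (\%z, \@bad);
}

my ($z, $bad) = parse_lines(
    "a|1\n", "# just a remark\n", "\n", "b|2|3\n", "c|3  # trailing remark\n",
);
print "$_ => $z->{$_}\n" for sort keys %$z;
print "skipped: $_\n"    for @$bad;
```

Note that a "#" inside a field would be treated as the start of a remark here; real data may need a stricter rule.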

Lots of programmers think that map {} is somehow faster than using a loop, but they forget, map {} *is* a loop construct. In a way, split is too.
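
To illustrate the point that map is itself a loop, here is the one-liner next to an explicit loop that does the same work. This is only a sketch with sample data. It makes no claim about which form is faster.

```perl
use strict;
use warnings;

my @lines = ("a|1", "b|2", "c|3");

# The map version: one expression, but it still visits every line.
my %from_map = map { reverse split /\|/ } @lines;

# The same work written as an explicit loop.
my %from_loop;
for my $line (@lines) {
    my ($v, $k) = split /\|/, $line;
    $from_loop{$k} = $v;
}

# Both hashes hold the same pairs.
for my $k (sort keys %from_map) {
    print "$k => $from_map{$k} / $from_loop{$k}\n";
}
```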

Brett_Tabke

5:07 pm on Aug 5, 2001 (gmt 0)




I've become a major fan of map. I kinda came at it backwards, from perlfaq6:


use 5.005;
@popstates = qw(CO ON MI WI MN);
@poppats = map { qr/\b$_\b/i } @popstates;
while (defined($line = <>)) {
    for $patobj (@poppats) {
        print $line if $line =~ /$patobj/;
    }
}

Compared to ordinary regex matching with the pattern strings interpolated on every pass, that is so fast, since qr// compiles each pattern only once. I've started using something similar for regex matching in dozens of places.

(bt note: ugh, do not parse off spaces in this forum inside of [code]). And mull over syntax highlighting