Splitting a Flat file into a hash.

Brett_Tabke

6:31 am on Jun 22, 2001 (gmt 0)

What's a slick way to do this:

Flat file format is simple: "field a|field b".
I want to split that into a hash so that field b is the key and field a is the value:
$z{field b} = field a

Seems like there should be a simple way of doing it in a one-liner. Something like this:
$z{$field b}=$field a = split(/\|/,$line);

(that won't work, but it is the idea).

Currently using the two stepper:
($v,$k) = split(/\|/,$line);
$z{$k}=$v;

sugarkane

11:09 am on Jun 22, 2001 (gmt 0)

This nearly works, but the order of key and value is switched, i.e. a hash is created where field a is the key and field b is the value.

%z=map {split /\|/} @lines;

I'm sure there must be a way to switch the field order before mapping to %z.

Brett_Tabke

8:00 pm on Jun 22, 2001 (gmt 0)

map is voodoo.

Bolotomus

8:38 pm on Aug 3, 2001 (gmt 0)

> I'm sure there must be a way to switch the field order before mapping to %z

Of course,

%z=map {reverse split /\|/} @lines;

Here's a very similar way: you pack the whole file into the hash in a single line. Like the solution above, it has ZERO tolerance for out-of-format files!!!

#!/usr/bin/perl -w
use strict;

my $myfile = "test.txt";
my %hash = ();
open(FILE, $myfile) || die "Cannot open $myfile: $!";
{
    local $/ = undef;                 # slurp mode: read the whole file in one go
    %hash = split /\||\n/, <FILE>;    # munch munch
    close FILE;
}
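
A minimal sketch of the same slurp-and-split idea that runs without a file on disk. The sample data and the in-memory filehandle are assumptions added for illustration. Note that, as in the script above, nothing is reversed here, so field a ends up as the key.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for test.txt.
my $data = "a|1\nb|2\nc|3\n";

# Open a read handle on the string instead of a file on disk.
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my %hash;
{
    local $/ = undef;                 # slurp mode: <$fh> returns everything at once
    %hash = split /\||\n/, <$fh>;     # break on a literal pipe or on a newline
}
close $fh;

print "$_ => $hash{$_}\n" for sort keys %hash;
```

split drops trailing empty fields by default, which is why the final newline does not leave a dangling empty element.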

Brett_Tabke

11:45 am on Aug 4, 2001 (gmt 0)

Thanks. The underused and often forgotten "reverse".

%z=map {reverse split /\|/} @lines;

I never would have guessed that it would work that way on split. I can use that in so many places.

Bolotomus

10:39 pm on Aug 4, 2001 (gmt 0)

Keep in mind WHY it works... map { } builds a new list, each element of which comes from the { } block. If the { } block spits out more than one element, the lists get shoved together. So in the end, if your file looks like

a|1
b|2
c|3

then it produces a list ("a", 1, "b", 2, "c", 3)... and not anything fancy like ( ("a",1), ("b",2), ("c",3) )

And of course, a hash is just an array when you look at it in pairs, so %z = ("a", 1, "b", 2, "c", 3) sets up the hash you want perfectly.
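
To see the flattening for yourself, something like this should do. The @lines array is made-up sample data, assumed to be already chomped.

```perl
use strict;
use warnings;

my @lines = ("a|1", "b|2", "c|3");   # hypothetical sample lines, already chomped

# Each run of the block returns two elements; map strings them into one flat list.
my @flat = map { split /\|/ } @lines;
print scalar(@flat), " elements: @flat\n";

# Reversing each pair makes field b the key and field a the value.
my %z = map { reverse split /\|/ } @lines;
print "$_ => $z{$_}\n" for sort keys %z;
```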

But you can also turn that data into the list just by splitting on /\||\n/ (in other words, a pipe *or* the newline character). That was the idea behind my solution.

BUT... what if one line contains "b|2|3" or something out of format? Now you've blown the whole hash, as you'll have an odd number of elements.
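
A rough illustration of how one bad line shifts everything after it. The input string is invented for the example, and the fields are paired up by hand rather than assigned to a hash, just to make the drift visible.

```perl
use strict;
use warnings;

my $data = "a|1\nb|2|3\nc|4\n";   # hypothetical input; the second line has an extra field

my @fields = split /\||\n/, $data;
print scalar(@fields), " fields\n";   # an odd count cannot pair up cleanly

# Pair the fields up by hand to see where keys and values drift apart.
for (my $i = 0; $i < @fields; $i += 2) {
    my $key   = $fields[$i];
    my $value = defined $fields[$i + 1] ? $fields[$i + 1] : '(missing)';
    print "$key => $value\n";
}
```

From the extra field onward, what should have been a value is treated as a key, and the last key has no value at all.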

So while it's fun to come up with solutions like this, in the real world I would go with something much more robust and forget about trying to cram it all into a one-liner. I'd process each line one at a time, strip off #remarks with a regex, ignore empty lines, check the data validity, yada yada yada.
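
Roughly what I mean by that, as a sketch only. parse_lines is a made-up helper name, and the validity rules (exactly two non-empty fields, "#" never appearing inside real data) are assumptions you would adjust to your own file.

```perl
use strict;
use warnings;

# parse_lines is a hypothetical helper, not a standard function.
sub parse_lines {
    my @lines = @_;
    my (%z, @rejected);
    for my $line (@lines) {
        chomp(my $text = $line);
        $text =~ s/#.*//;              # strip a "# remark" (assumes no '#' in data)
        $text =~ s/^\s+|\s+$//g;       # trim surrounding whitespace
        next if $text eq '';           # ignore empty and comment-only lines

        # The -1 limit keeps trailing empty fields, so "a|" still counts as two.
        my @fields = split /\|/, $text, -1;
        if (@fields != 2 || $fields[0] eq '' || $fields[1] eq '') {
            push @rejected, $line;     # out of format: record it, do not guess
            next;
        }
        my ($value, $key) = @fields;   # field a is the value, field b is the key
        $z{$key} = $value;             # a repeated key silently overwrites
    }
    return (\%z, \@rejected);
}

my ($z, $rejected) = parse_lines("a|1\n", "# comment\n", "\n", "b|2|3\n", "c|3 # note\n");
print "$_ => $z->{$_}\n" for sort keys %$z;
print "rejected: ", scalar(@$rejected), "\n";
```

Returning the rejected lines lets the caller decide whether to warn, die, or carry on.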

Lots of programmers think that map {} is somehow faster than using a loop, but they forget, map {} *is* a loop construct. In a way, split is too.
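
For comparison, here is the map version next to an explicit loop doing the same job. The sample lines and the dump_hash helper are invented for the example. It shows the two produce the same hash; it makes no claim about speed either way.

```perl
use strict;
use warnings;

my @lines = ("a|1", "b|2", "c|3");   # hypothetical sample lines

# The map version from earlier in the thread.
my %from_map = map { reverse split /\|/ } @lines;

# The same work written as an explicit loop.
my %from_loop;
for my $line (@lines) {
    my ($value, $key) = split /\|/, $line;
    $from_loop{$key} = $value;
}

# dump_hash is a hypothetical helper: flatten a hash in sorted key order.
sub dump_hash {
    my %h = @_;
    return join ',', map { "$_=$h{$_}" } sort keys %h;
}

print dump_hash(%from_map) eq dump_hash(%from_loop) ? "same hash\n" : "different\n";
```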

Brett_Tabke

5:07 pm on Aug 5, 2001 (gmt 0)

I've become a major fan of map. I kinda came at it backwards, from perlfaq6:


use 5.005;
@popstates = qw(CO ON MI WI MN);
@poppats = map { qr/\b$_\b/i } @popstates;
while (defined($line = <>)) {
    for $patobj (@poppats) {
        print $line if $line =~ /$patobj/;
    }
}

Compared to ordinary regex matching with patterns interpolated inside the loop, that is so fast. I've started using something similar for regex matching in dozens of places.
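
A variation on the same idea that runs under strict, as a sketch. The sample lines are made up and stand in for input read from <>. Unlike the FAQ snippet, this one stops at the first matching pattern, so a line that mentions two states is kept only once.

```perl
use strict;
use warnings;

my @popstates = qw(CO ON MI WI MN);

# qr// compiles each pattern once, outside the loop.
my @poppats = map { qr/\b$_\b/i } @popstates;

# Hypothetical sample input standing in for lines read from <>.
my @lines = ("Denver, CO\n", "Austin, TX\n", "Madison, WI\n", "From MN to MI\n");

my @matched;
for my $line (@lines) {
    for my $patobj (@poppats) {
        if ($line =~ $patobj) {
            push @matched, $line;
            last;                     # first hit is enough for this line
        }
    }
}
print @matched;
```

Keep in mind that with /i the pattern for ON also matches the ordinary word "on", so real text may need a stricter pattern.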

(bt note: ugh, the forum should not strip spaces inside of [code]. And mull over syntax highlighting.)