String Parsing Pain

attempt to parse variable string to multiple variables

         

engel5

4:37 pm on Sep 14, 2004 (gmt 0)

10+ Year Member



I am trying to parse a string for a modification to a Feed Aggregator I am trying to make. The end goal is for the script to read my favorites.xml file from Blo.gs and use the RSS locations for downloading the feeds.

But I can't get it to load all the variables on the line.

The input lines look like this:


<weblog name="A young Mennonite" url="http://aym.example.net/" rss="http://aym.example.net/index.xml" when="11491637" xfn="me" />

The xfn="..." portion is not always present.

I attempted it with this (but obviously failed ;-)


my($f_nick,$f_url) = ($1||'', $3||'') if ($pml =~ /<weblog name="(.*?)" url="(.*?)" rss="(.*?)" when="(.*?)"(?: xfn="(.*?)") \/>/s);

Any assistance would be greatly appreciated.

David Engel

[edited by: coopster at 4:40 pm (utc) on Sep. 14, 2004]
[edit reason] generalized urls [/edit]

IanKelley

4:17 am on Sep 16, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Assuming you can count on the field order, try this...


my @array;
while ($pml =~ /"(.*?)"/g) {
    push(@array, $1);   # each quoted value, in the order it appears
}
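
For illustration, here is a minimal sketch of how that could be applied to the sample line from the first post. It uses a list-context match in place of the while loop, which collects the same values. The variable names ($name, $url and so on) are made up for this example, and the mapping only holds if the attributes always arrive in this order.

```perl
use strict;
use warnings;

# Sample line from the original question.
my $pml = '<weblog name="A young Mennonite" url="http://aym.example.net/" rss="http://aym.example.net/index.xml" when="11491637" xfn="me" />';

# In list context, a /g match returns every captured value at once,
# which gives the same list as pushing $1 inside a while loop.
my @values = $pml =~ /"(.*?)"/g;

# Positions depend entirely on attribute order: name, url, rss, when, [xfn].
my ($name, $url, $rss, $when, $xfn) = @values;
$xfn = '' unless defined $xfn;   # xfn is not always present

print "$name: $rss\n";
```

If a line ever arrives with the attributes in a different order, the values will land in the wrong variables without any warning.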

jollymcfats

4:19 am on Sep 17, 2004 (gmt 0)

10+ Year Member



The xfn part of the regex just needs a trailing ? to make it optional:

(?: xfn="(.*?)")?
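
As a rough sketch, this is what the original pattern might look like with that change applied, run against one line that has xfn and one that does not. The second input line and the @feeds array are invented for the example; the rest follows the code in the first post, except that the match sits in an if block rather than a statement modifier on my().

```perl
use strict;
use warnings;

my @lines = (
    # Sample line from the original question.
    '<weblog name="A young Mennonite" url="http://aym.example.net/" rss="http://aym.example.net/index.xml" when="11491637" xfn="me" />',
    # Hypothetical line without the optional xfn attribute.
    '<weblog name="No XFN" url="http://other.example.net/" rss="http://other.example.net/index.xml" when="11491640" />',
);

my @feeds;
for my $pml (@lines) {
    # The trailing ? makes the whole xfn group optional.
    if ($pml =~ /<weblog name="(.*?)" url="(.*?)" rss="(.*?)" when="(.*?)"(?: xfn="(.*?)")? \/>/s) {
        my ($f_nick, $f_url) = ($1, $3);
        # $5 is undef when the xfn group did not take part in the match.
        my $xfn = defined $5 ? $5 : '';
        push @feeds, [$f_nick, $f_url, $xfn];
        print "$f_nick: $f_url (xfn='$xfn')\n";
    }
}
```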

You could also snarf all of the tag attributes into a hash. That way you can count on getting the values you need regardless of what attribute order happens to be in the xml.

my %attribs = ($pml =~ m~([a-z]+)="([^"]+)"~g); 
my($f_nick,$f_url) = @attribs{qw(name rss)};
use Data::Dumper; # just for fun
print Dumper(\%attribs);

$VAR1 = { 
'when' => '11491637',
'url' => 'http://aym.example.net/',
'name' => 'A young Mennonite',
'xfn' => 'me',
'rss' => 'http://aym.example.net/index.xml'
};

If it's important for attributes that are not present to be the empty string (you had $foo = ($1||'')), you could use

my($f_nick, $f_url) = map { defined $_ ? $_ : '' } @attribs{qw(name rss)};
instead.
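
Putting the hash idea and the empty-string default together, a self-contained sketch might look like the following. The helper name parse_weblog, the list of expected keys, and the second (reordered, shortened) input line are all assumptions made for the example, not part of Blagg or the Blo.gs format.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref of the attributes found on the
# line, with '' filled in for any expected attribute that is missing.
sub parse_weblog {
    my ($pml) = @_;
    # Each match yields an (attribute, value) pair, so the flat list
    # returned by the /g match can be assigned straight into a hash.
    my %attribs = ($pml =~ m~([a-z]+)="([^"]+)"~g);
    for my $key (qw(name url rss when xfn)) {
        $attribs{$key} = '' unless defined $attribs{$key};
    }
    return \%attribs;
}

# Sample line from the original question.
my $with = parse_weblog('<weblog name="A young Mennonite" url="http://aym.example.net/" rss="http://aym.example.net/index.xml" when="11491637" xfn="me" />');

# Made-up line: different attribute order, several attributes missing.
my $without = parse_weblog('<weblog rss="http://aym.example.net/index.xml" name="A young Mennonite" />');

print "rss: $with->{rss}, xfn: '$with->{xfn}'\n";
print "rss: $without->{rss}, xfn: '$without->{xfn}'\n";
```

Because the lookup is by attribute name, the second line still yields the right rss value even though the order changed.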

moltar

3:48 am on Sep 24, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I suggest you look into XML::Simple. No need to reinvent the wheel.

IanKelley

4:55 am on Sep 24, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There are quite a few reasons to reinvent the wheel. The two biggest...

If you're writing a script to parse multiple feeds, then parsers like XML::Anything in Perl or the built-in parser in PHP will break if a feed sends you something non-standard. This is not uncommon, and it's not only ignorance that causes it: you can make XML take up a lot less processing time and bandwidth by stripping out some of the standard requirements.

If you're writing a script to parse a single feed, custom parsing code will be far more efficient than a parser module, assuming you write it well of course :-)

moltar

5:05 am on Sep 24, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can never predict bad code anyway. You can only go so far in minimizing the risk; your code can still break.

If you process a single feed only, then efficiency is not an issue, since you won't be processing the same feed many times per second. XML::Simple is pretty fast anyways.

IanKelley

6:56 am on Sep 24, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've worked with a lot of sites that process a single feed millions of times a day.

I'm sure that's not the case here, and XML::Simple probably is the best option. I was just making a point about relying on modules.

engel5

6:24 pm on Sep 27, 2004 (gmt 0)

10+ Year Member



Thanks, everyone, for the assistance. I will be avoiding the use of XML::Simple because 1) I am working on a modification to Rael Dornfest's Blagg feed aggregator, and I am keeping it in the spirit of simplicity and self-contained code - i.e., no modules, and 2) I do not expect any differences in the file that it is parsing, so the depth of being able to handle multiple data situations is not really necessary.

David Engel