regex in perl

This may or may not belong in this forum; I didn't think any of the other forums were a better choice.

I have a Perl script that uses regular expressions to parse a webpage and return results from it. There are 20 items that I am parsing for. All was going fine except that one item was not playing nice: it showed the value of the very first item I was searching for. Eventually I realized that the items on the page had changed order, and that was the problem. After fixing the order in the script, all 20 values showed up correctly.

Here is the problem: this page changes the order of the items in my list on a regular basis. I need a way to keep using just one script that returns all of the values without any extra cruft messages. Here is a sample of the script I am using for weather graphing. (Graphing is a side issue; this script just returns values, just like the script I am having trouble with.)

#!/usr/bin/perl
use warnings;
use strict;

use LWP::Simple;

my $httpaddr = "http://www.aws.com/aws_2001/asp/obsForecast.asp?id=WISHT";

my %data;
my %trash;
my $content = LWP::Simple::get($httpaddr) or die "Couldn't get it!";

# regex in html source order
if ($content =~ /(Temperature<\/b>)/g) { $trash{a} = $1; }
if ($content =~ /(-?\d+\.\d+)<\/b>/g) { $data{Temp} = $1; }

if ($content =~ /(Humidity<\/b>)/g) { $trash{a} = $1; }
if ($content =~ /(\d+\.\d+)<\/b>/g) { $data{Humidity} = $1; }

if ($content =~ /(Wind<\/b>)/g) { $trash{a} = $1; }
if ($content =~ /(\d+\.\d+)<\/b>/g) { $data{Wind} = $1; }

if ($content =~ /(Daily Rain<\/b>)/g) { $trash{a} = $1; }
if ($content =~ /(\d+\.\d+)<\/b>/g) { $data{Rain} = $1; }

if ($content =~ /(Pressure<\/b>)/g) { $trash{a} = $1; }
if ($content =~ /(\d+\.\d+)<\/b>/g) { $data{Pressure} = $1; }

if ($content =~ /(HEAT INDEX|WIND CHILL)/g) { $trash{a} = $1; }
if ($content =~ /(\d+\.\d+)/g) { $data{HeatIndex} = $1; }

if ($content =~ /(DEW POINT:)/g) { $trash{a} = $1; }
if ($content =~ /(\d+\.\d+)/g) { $data{DewPoint} = $1; }

for (keys %data) {
    printf "%s:%s ", $_, $data{$_};
}
print "\n";

On this weather page the items always stay in the same order; only the values change as the weather changes.
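A minimal sketch of why the order matters, assuming the behaviour described above comes from the /g flag: in scalar context, each /g match resumes where the previous one ended, and a failed /g match resets the position to the start of the string. The sample string below is made up for illustration and is not the real page.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the fetched page: Wind now comes first.
my $content = 'Wind</b> 5.0</b> Humidity</b> 55.0</b> Temperature</b> 71.3</b>';

my %data;

# The script still looks for Temperature first. After this match,
# the search position sits just past "Temperature</b>".
if ($content =~ /(Temperature<\/b>)/g) { }
if ($content =~ /(-?\d+\.\d+)<\/b>/g) { $data{Temp} = $1; }

# "Humidity" is now behind the current position, so this match fails,
# and a failed /g match resets the position to the start of the string.
my $found_humidity = ($content =~ /(Humidity<\/b>)/g) ? 1 : 0;

# The next number found is therefore the first one on the page (Wind's).
if ($content =~ /(\d+\.\d+)<\/b>/g) { $data{Humidity} = $1; }

printf "%s:%s ", $_, $data{$_} for sort keys %data;
print "\n";
```

With this sample, Humidity ends up holding Wind's value, which matches the "value of the very first item" symptom.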

I have tried adding a

my %data;
my %trash;
my $content = LWP::Simple::get($httpaddr) or die "Couldn't get it!";

section before each item so the search would start over each time. The problem is that it produces a bunch of messages about the variables changing, and that extra cruft prevents my graphing portion from reading the output properly.

Any help is greatly appreciated. Thanks!
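One possible way to "start over" without fetching the page again or redeclaring the variables is to reset the match position with pos() before each label. This is only a sketch: the sample string is invented, and it assumes the messages came from declaring the same "my" variables more than once in one scope.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the downloaded page.
my $content = 'Wind</b> 5.0</b> Humidity</b> 55.0</b> Temperature</b> -3.5</b>';

my %data;

# Rewind to the top of the page, find the label, then take the
# first number that follows it.
pos($content) = 0;
if ($content =~ /Temperature<\/b>/g && $content =~ /(-?\d+\.\d+)<\/b>/g) {
    $data{Temp} = $1;
}

# Rewind again, so the order of the labels on the page no longer matters.
pos($content) = 0;
if ($content =~ /Humidity<\/b>/g && $content =~ /(\d+\.\d+)<\/b>/g) {
    $data{Humidity} = $1;
}

printf "%s:%s ", $_, $data{$_} for sort keys %data;
print "\n";
```

The page is fetched once and %data is declared once, so nothing is redeclared along the way.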

#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
my $httpaddr = "http://www.example.com";
my %data;
my %trash;
my $content = get($httpaddr) or die "Couldn't get it!";
$content =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;
$content =~ s/\s+/ /gs;
$content =~ s/&[a-zA-Z]{3,4};//gs;
if ($content =~ /(Temperature).+?(-?\d+\.\d+)/) {
    $trash{a} = $1;
    $data{Temp} = $2;
}
if ($content =~ /(Humidity).+?(\d+\.\d+)/) {
    $trash{a} = $1;
    $data{Humidity} = $2;
}
if ($content =~ /(Wind).+?(\d+\.\d+)/) {
    $trash{a} = $1;
    $data{Wind} = $2;
}
if ($content =~ /(Daily Rain).+?(\d+\.\d+)/) {
    $trash{a} = $1;
    $data{Rain} = $2;
}
if ($content =~ /(Pressure).+?(\d+\.\d+)/) {
    $trash{a} = $1;
    $data{Pressure} = $2;
}
if ($content =~ /(HEAT INDEX|WIND CHILL).+?(\d+\.\d+)/) {
    $trash{a} = $1;
    $data{HeatIndex} = $2;
}
if ($content =~ /(DEW POINT:).+?(\d+\.\d+)/) {
    $trash{a} = $1;
    $data{DewPoint} = $2;
}
for (keys %data) {
    printf "%s:%s ", $_, $data{$_};
}
print "\n";
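With 20 items, the repeated if blocks could also be driven from a table of labels. The following is a rough sketch of that idea, not a drop-in replacement: the %labels table and the sample text are made up, and it assumes the tags have already been stripped as in the script above.

```perl
use strict;
use warnings;

# Hypothetical text, as it might look once tags and extra whitespace are gone.
my $content = 'Wind 5.0 mph Humidity 55.0 % Temperature -3.5 F DEW POINT: 20.1';

# Output key => label to look for (illustrative subset of the items).
my %labels = (
    Temp     => qr/Temperature/,
    Humidity => qr/Humidity/,
    Wind     => qr/Wind/,
    Rain     => qr/Daily Rain/,
    DewPoint => qr/DEW POINT:/,
);

my %data;
for my $key (keys %labels) {
    # No /g here, so every search starts at the beginning of $content
    # and the order of the items on the page does not matter.
    if ($content =~ /$labels{$key}.+?(-?\d+\.\d+)/s) {
        $data{$key} = $1;
    }
}

# Labels that are not on the page (Daily Rain in this sample) are skipped.
print join(' ', map { "$_:$data{$_}" } sort keys %data), "\n";
```

Adding an item then means adding one line to the table rather than another if block.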

#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
my $httpaddr = "http://www.your-url-here.com";
my %data;
my %trash;
my $content = get($httpaddr) or die "Couldn't get it!";
$content =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs; # removes html tags
$content =~ s/\s+/ /gs; # collapses runs of whitespace to one space
$content =~ s/&#?[a-zA-Z0-9]{3,6};//gs; # removes HTML entities such as &nbsp; or &#176;
print $content;
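To see what the three substitutions leave behind without hitting a live page, they can be run on a small sample. The HTML fragment below is invented for illustration; the real page's markup will differ.

```perl
use strict;
use warnings;

# Hypothetical fragment standing in for part of the downloaded page.
my $content = qq{<td class="lbl"><b>Temperature</b></td>\n<td><b>71.3</b>&nbsp;&deg;F</td>};

$content =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs; # removes html tags
$content =~ s/\s+/ /gs;                       # collapses runs of whitespace
$content =~ s/&#?[a-zA-Z0-9]{3,6};//gs;       # removes HTML entities

print "$content\n"; # prints "Temperature 71.3F"
```

Printing the cleaned text like this shows exactly what the label-and-number regexes will be matching against.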

Diceman

perl_diver

rocknbil

perl_diver

Diceman

perl_diver

Diceman

perl_diver

Diceman

perl_diver

pinterface

Diceman

perl_diver
