Perl Server Side CGI Scripting Forum

    
Perl script processing XML problem
dawlish

10+ Year Member



 
Msg#: 3565 posted 3:33 pm on Apr 2, 2004 (gmt 0)

Hi,

I have a Perl script which processes an XML feed. It creates a cache file of the XML and presents the output on my website, formatted as outlined below. Although I'm no expert, the script appears to rely on there being line breaks (or spaces) within the cache file. Recently the line breaks in the XML feed were removed, so the script no longer processes the feed. I am only a novice when it comes to XML and Perl, and I found this script on the web some time ago.

I would be really grateful if someone could advise what I need to change below in order for the script to process the file, now that there are no line breaks or spaces in the cache file.

use strict;
use CGI qw(:all);
#use Fcntl qw(:flock);
use LWP::Simple qw(get);

my $xml_url = "http://www.website.com/cgi-bin/xmlfeed.exe?&s=books&chan=xf";

my $newscache = "cache.xml";
########################################
# enter number of articles to include
my $howmanyarticles = "80";

########################################
# write the cache file or
# if the cache file is older than 1 hour
# then re-write it.

#get_lock();
if ((not -e $newscache) or (-M $newscache > .5)) { # (not -e): the cache file doesn't exist; -M gives the file's age in days since it was last modified: 1 is 1 day, 0.04 is about 1 hour, so .5 is 12 hours.
my $newsdoc = get($xml_url);# uses the LWP module "get" function to get the XML file.
if (defined $newsdoc) {
open (CACHEFILE, ">$newscache") or die "Writing to Cache : $!";
print CACHEFILE $newsdoc;
close (CACHEFILE);
}
}
#release_lock();

########################################
# now print the contents of the XML file

print header;
print "<table width=\"100%\" align= center border=\"0\" cellpadding=\"4\" cellspacing=\"0\">";

open (CF, "$newscache") or die "Unable to open $newscache : $!";
my ($productname, $productvenue);
my $counter =0;
while (<CF>) {
if (m,<product_desc>(.*)</product_desc>,) {
$productname = $1;
$productname =~ s/&apos;/'/g;
}
if (m,<venue_desc>(.*)</venue_desc>,) {
$productvenue = $1;
$productvenue =~ s/&apos;/'/g;

}

if (m,<crypto_block>(.*)</crypto_block>,) {
print "
<tr>

<td valign=\"top\"><a href=\"\/cgi-bin/go.cgi?$1\"><b>$productname</b></a></td><td>$productvenue</td>
";
$counter++;
last if $counter == $howmanyarticles;
}
}
close(CF);

print "
<tr>
<td width=\"100\%\" colspan=\"4\" height=\"10\"></td>
</tr>
</table>
";
# END

Many thanks for any help / assistance.

 

timster

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3565 posted 5:03 pm on Apr 2, 2004 (gmt 0)

The construction I see several times in the file:

m,<somexmltag>(.*)</somexmltag>,

is faulty. It will grab everything between the first opening tag and the last closing tag on the line, because ".*" is "greedy."

Replace it with:
m,<somexmltag>(.*?)</somexmltag>,

Previously, the line breaks most likely kept the "greedy" pattern from matching too much.
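
To make the difference concrete, here is a minimal sketch. The sample string is made up for illustration (two records run together on one line, as in the new feed), so treat it as an assumption rather than the real feed contents:

```perl
use strict;
use warnings;

# Hypothetical single-line fragment: two records, no line breaks.
my $line = '<product_desc>First</product_desc><product_desc>Second</product_desc>';

# Greedy: (.*) runs on to the LAST closing tag on the line.
my ($greedy) = $line =~ m,<product_desc>(.*)</product_desc>,;

# Non-greedy: (.*?) stops at the FIRST closing tag.
my ($lazy) = $line =~ m,<product_desc>(.*?)</product_desc>,;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

The greedy capture swallows the markup between the two records, while the non-greedy one returns only the first description.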

dawlish

10+ Year Member



 
Msg#: 3565 posted 5:24 pm on Apr 2, 2004 (gmt 0)

timster,
Thanks for the suggestion. I made the change but the script now only outputs and formats one entry. It only processes the first set of tags - it doesn't go on to process the other lines.

VectorJ

10+ Year Member



 
Msg#: 3565 posted 4:14 am on Apr 3, 2004 (gmt 0)

The <> operator works by bringing in a line delimited by a \n (that's the Perl code for line break), so since they've taken out the line breaks you won't be able to process the file that way.

You'll need to find the xml tag that delimits each record, then split the line based on that. So instead of:

while (<CF>)

use something like:

$a = <CF>;
@xml = split (/<\/record_end_tag>/, $a);
for $xml_line(@xml) {
s/<record_start_tag>//;

and then continue on with your normal processing. That's really just a patch though. To deal with XML, use an XML parsing module. Check search.cpan.org and search for an XML module that does what you want (XML::Parser might be a good place to start).
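
As a rough illustration of the split approach, the sketch below cuts a single-line feed into records and pulls the three fields out of each one. The <event> record tag, the sample data, and the parse_records helper are all assumptions made for the example; the real feed may be structured differently, and a proper XML module remains the more robust route.

```perl
use strict;
use warnings;

# Hypothetical feed contents: records run together with no line breaks.
my $data = '<event><product_desc>Show A</product_desc><venue_desc>Hall 1</venue_desc><crypto_block>abc</crypto_block></event>'
         . '<event><product_desc>Show B</product_desc><venue_desc>Hall 2</venue_desc><crypto_block>def</crypto_block></event>';

# Hypothetical helper: returns one hash ref per record found in $text.
sub parse_records {
    my ($text) = @_;
    my @records;
    # Split on the closing record tag; m{...} avoids having to escape the slash.
    for my $chunk (split m{</event>}, $text) {
        my %rec;
        for my $tag (qw(product_desc venue_desc crypto_block)) {
            # Non-greedy capture; the field is undef if the tag is missing.
            ($rec{$tag}) = $chunk =~ m{<$tag>(.*?)</$tag>}s;
        }
        # Keep only chunks that carry a link target.
        push @records, \%rec if defined $rec{crypto_block};
    }
    return @records;
}

for my $rec (parse_records($data)) {
    print "$rec->{product_desc} at $rec->{venue_desc} ($rec->{crypto_block})\n";
}
```

Collecting the fields per record, rather than carrying them over from one loop pass to the next, also avoids printing a stale name or venue when a record happens to lack one of the tags.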

dawlish

10+ Year Member



 
Msg#: 3565 posted 3:29 pm on Apr 3, 2004 (gmt 0)

VectorJ, thanks for the advice. However I'm a real novice when it comes to perl and I have tried introducing the changes you suggest, but all I get now is an internal server error with the following in the error logs:

Global symbol "@xml" requires explicit package name

The modified script is below. Any further advice most welcome.

use strict;
use CGI qw(:all);
#use Fcntl qw(:flock);
use LWP::Simple qw(get);

my $xml_url = "http://www.website.com/cgi-bin/xmlfeed.exe?&s=books&chan=xf";

my $newscache = "cache.xml";
########################################
# enter number of articles to include
my $howmanyarticles = "80";

########################################
# write the cache file or
# if the cache file is older than 1 hour
# then re-write it.

#get_lock();
if ((not -e $newscache) or (-M $newscache > .5)) { # (not -e): the cache file doesn't exist; -M gives the file's age in days since it was last modified: 1 is 1 day, 0.04 is about 1 hour, so .5 is 12 hours.
my $newsdoc = get($xml_url);# uses the LWP module "get" function to get the XML file.
if (defined $newsdoc) {
open (CACHEFILE, ">$newscache") or die "Writing to Cache : $!";
print CACHEFILE $newsdoc;
close (CACHEFILE);
}
}
#release_lock();

########################################
# now print the contents of the XML file

print header;
print "<table width=\"100%\" align= center border=\"0\" cellpadding=\"4\" cellspacing=\"0\">";

open (CF, "$newscache") or die "Unable to open $newscache : $!";
my ($productname, $productvenue);
my $counter =0;
$a = <CF>;
@xml = split (/<\event>/, $a);
for $xml_line(@xml) {
s/<event>//;
{
if (m,<product_desc>(.*)</product_desc>,) {
$productname = $1;
$productname =~ s/&apos;/'/g;
}
if (m,<venue_desc>(.*)</venue_desc>,) {
$productvenue = $1;
$productvenue =~ s/&apos;/'/g;

}

if (m,<crypto_block>(.*)</crypto_block>,) {
print "
<tr>

<td valign=\"top\"><a href=\"\/cgi-bin/go.cgi?$1\"><b>$productname</b></a></td><td>$productvenue</td>
";
$counter++;
last if $counter == $howmanyarticles;
}
}
close(CF);

print "
<tr>
<td width=\"100\%\" colspan=\"4\" height=\"10\"></td>
</tr>
</table>
";
# END

timster

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3565 posted 12:40 pm on Apr 5, 2004 (gmt 0)

When you "use strict" you need to declare your variables, like so:

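my @xml = split (/<\/event>/, $a);

Note that the closing tag is written <\/event>. In a regex, \e on its own means the escape character, so /<\event>/ would never match the tag.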

This is no big deal, but you might also want to replace your "END" comment with an actual end line, like so:

__END__
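
For what it's worth, here is a small sketch of the loop with every variable declared under "use strict". The sample line and the @names array are assumptions for illustration only. Two details are worth noting: the loop variable needs its own "my", and a bare s/// or m// acts on $_ rather than on the named loop variable, so the sketch binds each match explicitly with =~. ($a happens to be exempt from strict because sort uses it, which is why it did not appear in the error, but a differently named lexical is safer.)

```perl
use strict;
use warnings;

# Assumed sample input; $line stands in for what <CF> would return.
my $line = '<event><product_desc>Show A</product_desc></event>'
         . '<event><product_desc>Show B</product_desc></event>';

my @xml = split m{</event>}, $line;    # "my" declares the array
my @names;
for my $xml_line (@xml) {              # "my" declares the loop variable too
    $xml_line =~ s/<event>//;          # bound to $xml_line; a bare s/// would act on $_
    if ($xml_line =~ m,<product_desc>(.*?)</product_desc>,) {
        push @names, $1;
    }
}
print join(", ", @names), "\n";
```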

markanthony

10+ Year Member



 
Msg#: 3565 posted 7:08 pm on May 17, 2004 (gmt 0)

You might also find some help in researching XML::TreeBuilder to parse the XML for you. A lot of the complications of parsing XML and tag-based data have already been thought out for you. Don't duplicate the work that others have done and given to the community.

XML::TreeBuilder (or XML::Simple) for smaller XML files and SAX processing for bigger jobs... at least that seems to have worked for me.
