I have a Perl script which processes an XML feed. It creates a cache file of the XML and presents the output on my website, formatted in the way outlined below. Although I'm no expert, the script appears to work on the basis of there being line breaks (or spaces) within the cache file. Recently the line breaks in the XML feed were removed, so the script no longer processes the feed. I am only a novice when it comes to XML and Perl, and I found this script on the web some time ago.
I would be really grateful if someone could advise what I need to change below in order for the script to process the file, now that there are no line breaks or spaces in the cache file.
use strict;
use CGI qw(:all);
#use Fcntl qw(:flock);
use LWP::Simple qw(get);
my $xml_url = "http://www.website.com/cgi-bin/xmlfeed.exe?&s=books&chan=xf";
my $newscache = "cache.xml";
########################################
# enter number of articles to include
my $howmanyarticles = "80";
########################################
# write the cache file or
# if the cache file is older than 1 hour
# then re-write it.
#get_lock();
if ((not -e $newscache) or (-M $newscache > .5)) { # (not -e) is true if the cache file doesn't exist; -M gives the file's age in days since it was last modified: 1 is 1 day, 0.5 is 12 hours, 0.04 is about 1 hour.
my $newsdoc = get($xml_url);# uses the LWP module "get" function to get the XML file.
if (defined $newsdoc) {
open (CACHEFILE, ">$newscache") || die "Writing to Cache : $!";
print CACHEFILE $newsdoc;
close (CACHEFILE);
}
}
#release_lock();
########################################
# now print the contents of the XML file
print header;
print "<table width=\"100%\" align= center border=\"0\" cellpadding=\"4\" cellspacing=\"0\">";
open (CF, "$newscache") || die "Unable to open $newscache : $!";
my ($productname, $productvenue);
my $counter =0;
while (<CF>) {
if (m,<product_desc>(.*)</product_desc>,) {
$productname = $1;
$productname =~ s/'/'/g;
}
if (m,<venue_desc>(.*)</venue_desc>,) {
$productvenue = $1;
$productvenue =~ s/'/'/g;
}
if (m,<crypto_block>(.*)</crypto_block>,) {
print "
<tr>
<td valign=\"top\"><a href=\"\/cgi-bin/go.cgi?$1\"><b>$productname</b></a></td><td>$productvenue</td>
";
$counter++;
last if $counter == $howmanyarticles;
}
}
close(CF);
print "
<tr>
<td width=\"100\%\" colspan=\"4\" height=\"10\"></td>
</tr>
</table>
";
# END
Many thanks for any help / assistance.
Your pattern
m,<somexmltag>(.*)</somexmltag>,
is faulty. Because .* is "greedy", it will grab everything between the first opening tag and the last closing tag on the line.
Replace it with:
m,<somexmltag>(.*?)</somexmltag>,
Previously, the line breaks most likely kept the "greedy" pattern from matching too much: the while loop read one line at a time, and "." does not match a newline, so each match stayed within a single line.
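To see the difference on a feed with no line breaks, here is a small sketch. The sample string is made up; only the product_desc tag name comes from the script above.

```perl
use strict;
use warnings;

# Hypothetical single-line feed: two records, no line breaks.
my $line = '<product_desc>Book A</product_desc><product_desc>Book B</product_desc>';

# m{...} is the ordinary match operator with braces as delimiters,
# so the "/" in the closing tag needs no escaping.

# Greedy: .* runs on to the LAST closing tag on the line.
my ($greedy) = $line =~ m{<product_desc>(.*)</product_desc>};

# Non-greedy: .*? stops at the FIRST closing tag.
my ($lazy) = $line =~ m{<product_desc>(.*?)</product_desc>};

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

With everything on one line, the greedy capture swallows the tags between the two records, while the non-greedy one yields just the first description.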
You'll need to find the xml tag that delimits each record, then split the line based on that. So instead of:
while (<CF>)
use something like:
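my $data = <CF>;    # with no line breaks, one read returns the whole file
my @xml = split (/<\/record_end_tag>/, $data);
for my $xml_line (@xml) {
$xml_line =~ s/<record_start_tag>//;
# from here on, match against $xml_line rather than $_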
and then continue on with your normal processing. That's really just a patch though. To deal with XML, use an XML parsing module. Check search.cpan.org and search for an XML module that does what you want (XML::Parser might be a good place to start).
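If split feels awkward, a closely related way to walk the records is a global non-greedy match, which is a different technique from the split shown above. This is only a sketch: the records() helper and the "record" and "name" tags are placeholders, and it assumes the tags carry no attributes and are not nested inside themselves.

```perl
use strict;
use warnings;

# Return the body of every <$tag>...</$tag> element found in $data.
# The /s flag lets "." match newlines, so this works whether or not
# the feed contains line breaks.
sub records {
    my ($data, $tag) = @_;
    my @bodies;
    while ($data =~ m{<\Q$tag\E>(.*?)</\Q$tag\E>}gs) {
        push @bodies, $1;
    }
    return @bodies;
}

# Hypothetical sample data with placeholder tag names.
my $data = '<record><name>A</name></record><record><name>B</name></record>';

for my $record (records($data, 'record')) {
    my ($name) = $record =~ m{<name>(.*?)</name>}s;
    print "$name\n";
}
```

Like the split version, this is still a patch over regex matching and not a replacement for a real XML parser.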
Global symbol "@xml" requires explicit package name
The modified script is below. Any further advice most welcome.
use strict;
use CGI qw(:all);
#use Fcntl qw(:flock);
use LWP::Simple qw(get);
my $xml_url = "http://www.website.com/cgi-bin/xmlfeed.exe?&s=books&chan=xf";
my $newscache = "cache.xml";
########################################
# enter number of articles to include
my $howmanyarticles = "80";
########################################
# write the cache file or
# if the cache file is older than 1 hour
# then re-write it.
#get_lock();
if ((not -e $newscache) or (-M $newscache > .5)) { # (not -e) is true if the cache file doesn't exist; -M gives the file's age in days since it was last modified: 1 is 1 day, 0.5 is 12 hours, 0.04 is about 1 hour.
my $newsdoc = get($xml_url);# uses the LWP module "get" function to get the XML file.
if (defined $newsdoc) {
open (CACHEFILE, ">$newscache") || die "Writing to Cache : $!";
print CACHEFILE $newsdoc;
close (CACHEFILE);
}
}
#release_lock();
########################################
# now print the contents of the XML file
print header;
print "<table width=\"100%\" align= center border=\"0\" cellpadding=\"4\" cellspacing=\"0\">";
open (CF, "$newscache") || die "Unable to open $newscache : $!";
my ($productname, $productvenue);
my $counter =0;
$a = <CF>;
@xml = split (/<\event>/, $a);
for $xml_line(@xml) {
s/<event>//;
{
if (m,<product_desc>(.*)</product_desc>,) {
$productname = $1;
$productname =~ s/'/'/g;
}
if (m,<venue_desc>(.*)</venue_desc>,) {
$productvenue = $1;
$productvenue =~ s/'/'/g;
}
if (m,<crypto_block>(.*)</crypto_block>,) {
print "
<tr>
<td valign=\"top\"><a href=\"\/cgi-bin/go.cgi?$1\"><b>$productname</b></a></td><td>$productvenue</td>
";
$counter++;
last if $counter == $howmanyarticles;
}
}
close(CF);
print "
<tr>
<td width=\"100\%\" colspan=\"4\" height=\"10\"></td>
</tr>
</table>
";
# END
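A few things in the modified script look like the cause of the trouble. Under "use strict", @xml and $xml_line must be declared with "my", which is what the "Global symbol" error is complaining about. The pattern /<\event>/ never matches, because \e means the escape character in a Perl regex. The s/// and m,..., tests act on $_ rather than on $xml_line. The extra "{" leaves the for loop unclosed, and a "last" inside that bare block would leave only the block, not the loop. Below is a sketch of the record loop with those points addressed. The extract_rows() helper and the sample data are made up, and it assumes each record ends with </event>, as the script above suggests.

```perl
use strict;
use warnings;

# Collect up to $limit rows of [crypto_block, product_desc, venue_desc]
# from feed text whose records are assumed to end with </event>.
sub extract_rows {
    my ($data, $limit) = @_;
    my @rows;
    for my $record (split m{</event>}, $data) {
        # Each match is bound to $record explicitly with =~.
        my ($name)   = $record =~ m{<product_desc>(.*?)</product_desc>}s;
        my ($venue)  = $record =~ m{<venue_desc>(.*?)</venue_desc>}s;
        my ($crypto) = $record =~ m{<crypto_block>(.*?)</crypto_block>}s;
        next unless defined $crypto;    # skip chunks without a link
        # "//" (defined-or) needs Perl 5.10 or later.
        push @rows, [ $crypto, $name // '', $venue // '' ];
        last if @rows == $limit;
    }
    return @rows;
}

# Hypothetical sample feed on a single line.
my $sample = '<event><product_desc>Book A</product_desc><venue_desc>Hall 1</venue_desc>'
           . '<crypto_block>abc123</crypto_block></event>'
           . '<event><product_desc>Book B</product_desc><venue_desc>Hall 2</venue_desc>'
           . '<crypto_block>def456</crypto_block></event>';

for my $row (extract_rows($sample, 80)) {
    my ($crypto, $name, $venue) = @$row;
    print qq{<tr><td valign="top"><a href="/cgi-bin/go.cgi?$crypto"><b>$name</b></a></td><td>$venue</td></tr>\n};
}
```

In the real script, $sample would be the contents of the cache file and 80 would be $howmanyarticles. Unlike the original loop, this resets the name and venue for every record instead of carrying them over from the previous one.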
XML::TreeBuilder (or XML::Simple) for smaller XML files and SAX processing for bigger jobs... at least that seems to have worked for me.