I'm very new to scripting and can't work out what's happening here, so any advice would be welcome.
#!/usr/bin/perl
# Fetch the BBC News UK front page and print its top headlines
# (usually five) as an HTML <dl> list.
use strict;
use warnings;
use LWP::Simple;

print "Content-type: text/html\n\n";

# Read the BBC News page; get() returns undef if the fetch fails.
my $doc = get("http://news.bbc.co.uk/hi/english/uk/default.stm");
unless (defined $doc) {
    print "<p>Could not fetch the BBC News page.</p>\n";
    exit;
}
my @bbc = split /\n/, $doc;

# $flag: 0 = looking for a headline block, 1 = found the DIV,
#        2 = have the URL line, 3 = have the headline line.
# $next: which summary line comes next (1 to 3).
my ($flag, $next) = (0, 0);
my ($buffer, $story, $news) = ('', '', '');

foreach my $line (@bbc) {    # Look for the headlines (usually five)
    if ($line =~ /<DIV CLASS="bodytext">/) {
        $flag = 1 if $flag == 0;
    } elsif ($line =~ /<A href="/) {      # Get the URL
        if ($flag == 1) {
            $buffer = $line;
            $flag = 2;
        }
    } elsif ($line =~ /<B class="h/) {    # Get the description
        if ($flag == 2) {
            $buffer .= $line;
            $flag = 3;
            $next = 1;
        }
    } elsif ($next == 1) {    # Get the summary (usually the next three lines)
        if ($flag == 3) {
            $story = $line;
            $next = 2;
        }
    } elsif ($next == 2) {
        if ($flag == 3) {
            $story .= $line;
            $next = 3;
        }
    } elsif ($next == 3) {
        if ($flag == 3) {
            $story .= $line;
            $news .= clean_up($buffer, $story);    # Format the data
            $flag = 0;
            $next = 0;
        }
    }
    # Any other line is ignored.
}

print qq~
<dl>$news</dl>
~;
exit;

# Cleans the captured lines so that the HTML is displayed correctly
# (i.e. HTML 4.01). Not called "format": that is a Perl built-in keyword.
sub clean_up {
    my ($buffer, $story) = @_;

    # Remove the tags and whitespace left over from the BBC markup.
    $story =~ s/<br clear=all>//i;
    $story =~ s{</a>}{}i;
    $story =~ s{</b>}{}i;
    $story =~ s/<br[^>]*>//i;
    $story =~ s{</div>}{}i;
    $story =~ s/[\t\r]//g;

    # Reduce '<A href="/path">' to '/path">' so the site address can be prepended.
    $buffer =~ s/^.*?<a href="//i;
    $buffer =~ s/<b class[^>]*>//i;
    # Make the link open in a new window; delete this line if you don't want that.
    $buffer =~ s/>/ target="_blank">/;
    $buffer =~ s/[\t\r]//g;

    return qq{<a href="http://news.bbc.co.uk$buffer</a>\n} . "$story<br><br>";
}
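Since the question is what the $flag/$next juggling actually does: it is a small state machine that waits for the headline DIV, then the URL line, then the headline line, and then collects three summary lines. Below is a minimal sketch of the same idea with named states, run against a made-up snippet of markup instead of the live page. The sample HTML and the sub names `parse_headlines` and `strip_tags` are assumptions for illustration, and it is simplified: unlike the script above, it counts every line after the headline toward the summary, and it pulls the URL out with a capture group rather than a series of substitutions.

```perl
use strict;
use warnings;

# Remove anything that looks like a tag, then tidy the whitespace.
sub strip_tags {
    my ($text) = @_;
    $text =~ s/<[^>]*>//g;     # drop <B>, </A>, <BR clear=all>, ...
    $text =~ s/\s+/ /g;        # tabs, \r and runs of spaces become one space
    $text =~ s/^ | $//g;       # trim a leading or trailing space
    return $text;
}

# Walk the page line by line. $state says what we are waiting for next,
# replacing the numeric $flag/$next pair.
sub parse_headlines {
    my @lines = @_;
    my $state = 'SEEK_DIV';
    my ($url, $title, @summary, @items);

    for my $line (@lines) {
        if ($state eq 'SEEK_DIV') {
            $state = 'SEEK_URL' if $line =~ /<DIV CLASS="bodytext">/;
        } elsif ($state eq 'SEEK_URL') {
            if ($line =~ /<A href="([^"]*)"/) {    # capture the link target
                $url   = $1;
                $state = 'SEEK_TITLE';
            }
        } elsif ($state eq 'SEEK_TITLE') {
            if ($line =~ /<B class="h/) {
                $title   = strip_tags($line);
                @summary = ();
                $state   = 'SUMMARY';
            }
        } elsif ($state eq 'SUMMARY') {
            push @summary, $line;
            if (@summary == 3) {    # "usually the next three lines"
                push @items, {
                    url     => $url,
                    title   => $title,
                    summary => strip_tags(join ' ', @summary),
                };
                $state = 'SEEK_DIV';
            }
        }
    }
    return @items;
}

# Made-up sample in the layout the original script expects.
my @sample = (
    '<DIV CLASS="bodytext">',
    '<A href="/hi/english/uk/newsid_1/1.stm">',
    '<B class="h2">Example headline</B></A><BR>',
    "\tFirst line of the summary",
    'second line',
    'third line<BR clear=all>',
    '</DIV>',
);

for my $item (parse_headlines(@sample)) {
    print "$item->{title} -> http://news.bbc.co.uk$item->{url}\n";
    print "  $item->{summary}\n";
}
```

Named states make it easier to see which line the loop is waiting for, and adding a step (say, a date line) means adding one more state rather than another flag value.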
The generic version of the script would only consist of
[perl]
#!/usr/bin/perl
print "Content-type: text/html\n\n";
use LWP::Simple;
$doc = get "http://somesite.com";
@lines = split(/\n/, $doc);
foreach $line (@lines) {
# parse the page
}
[/perl]
...which is pretty useless on its own: it fetches the page but does nothing with it. The code inside the foreach {} loop has to be custom-written for each news source you want to use...
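One way to organise that per-source code is a dispatch table: a hash mapping each source name to its own parsing sub, so supporting another site means adding one entry. This is a minimal sketch; the source names, the markup patterns and `headlines_for` are hypothetical, and the page text is passed in as a string (in a real script it would come from LWP::Simple's `get`).

```perl
use strict;
use warnings;

# Hypothetical sources and markup: each parser takes the page text and
# returns a list of [url, headline] pairs.
my %parsers = (
    'example-news' => sub {
        my ($html) = @_;
        my @found;
        # Whole-page match: every <a ... class="headline"> link.
        while ($html =~ m{<a href="([^"]+)" class="headline">([^<]+)</a>}g) {
            push @found, [ $1, $2 ];
        }
        return @found;
    },
    'plain-text-feed' => sub {
        my ($html) = @_;
        my @found;
        # Line-by-line match, like the foreach loop in the generic script.
        for my $line (split /\n/, $html) {
            push @found, [ $1, $2 ] if $line =~ /^HEADLINE\t(\S+)\t(.+)$/;
        }
        return @found;
    },
);

# Look up the parser for a source; die if nobody has written one yet.
sub headlines_for {
    my ($source, $page) = @_;
    my $parser = $parsers{$source}
        or die "No parser written for '$source'\n";
    return $parser->($page);
}

my $page = '<p><a href="/story1.html" class="headline">Story one</a></p>';
for my $pair (headlines_for('example-news', $page)) {
    print "$pair->[1]: $pair->[0]\n";
}
```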
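#!C:\perl\bin\perl.exe
# rsspage.pl
### Creates a simple RSS table that could be
### used via an SSI call.
use strict;
use warnings;
use LWP::Simple;     # getstore() and is_success()
use XML::Simple;

my $limit  = 6;              # number of headlines to show
my $rssout = 'rssout.inc';   # file the SSI call includes
my $foo    = 'page2.xml';    # temporary copy of the feed

# Read the feed URLs, one per line, skipping blank lines.
open(my $list_fh, '<', 'rsslist.txt') || die "no rss list\n";
my @list;
while (my $entry = <$list_fh>) {
    chomp $entry;            # the newline must not end up in the URL
    push @list, $entry if $entry =~ /\S/;
}
close $list_fh;
die "rss list is empty\n" unless @list;

# Pick one feed at random from however many are listed.
my $rss = $list[ int rand @list ];

# getstore() returns the HTTP status code, which is always true,
# so check it with is_success().
is_success(getstore($rss, $foo)) || die "Get failed\n";

# Parse the saved file; ForceArray keeps {item} an array ref even for one item.
my $simp    = XML::Simple->new(ForceArray => ['item'])->XMLin($foo);
my $content = $simp->{channel}->{title};
my @items   = @{ $simp->{channel}->{item} || [] };

my $body = '';
for my $i (0 .. $limit - 1) {
    last if $i > $#items;    # the feed may have fewer items than $limit
    my $url  = $items[$i]->{link};
    my $text = $items[$i]->{title};
    $body .= "<tr><td><a href=\"$url\" class=\"blk\">";
    $body .= "$text</a></td></tr>";
}

open(my $out, '>', $rssout) || die "Couldn't open out file\n";
print $out "<h3> $content </h3>";
print $out "<table>";
print $out $body;
print $out "</table>";
close $out;
unlink($foo);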
**** End Script ****
**** Example moreover links in a file (rsslist.txt), one per line ****
[p.moreover.com...]
**** End Sample ****
Now, that might look bad, but most of it is explained in the perldoc page for XML::Simple, and there is an IBM developerWorks page on XML that uses the same basic principles. And of course you can always find examples of RSS and XML on O'Reilly's site.
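To make the XML::Simple part less mysterious: `XMLin` turns the feed into nested hashes and arrays, which is why the script can write `$simp->{channel}->{item}->[$i]->{link}`. The sketch below builds roughly that structure by hand (an assumption about its shape, for an RSS 0.91 feed read with `ForceArray => ['item']`) so it runs without XML::Simple or a network connection, and shows how to stop at `$limit` items without running past the end of the list. One catch worth knowing: without `ForceArray`, a feed with a single item gives a hash ref instead of an array ref, which the hypothetical `first_items` sub also handles.

```perl
use strict;
use warnings;

# Assumed shape of XMLin's result for a two-item RSS 0.91 feed read with
# ForceArray => ['item'] (XML::Simple drops the <rss> root by default).
# Built by hand so the sketch runs without XML::Simple or the network.
my $simp = {
    version => '0.91',
    channel => {
        title => 'Example feed',
        item  => [
            { title => 'First story',  link => 'http://example.com/1' },
            { title => 'Second story', link => 'http://example.com/2' },
        ],
    },
};

# Return at most $limit [link, title] pairs, however many items exist.
sub first_items {
    my ($feed, $limit) = @_;
    my $items = $feed->{channel}{item} || [];
    # Without ForceArray, a feed with one item gives a hash ref here.
    $items = [$items] if ref $items eq 'HASH';
    my $last = $#$items < $limit - 1 ? $#$items : $limit - 1;
    return map { [ $_->{link}, $_->{title} ] } @{$items}[ 0 .. $last ];
}

print "<h3>$simp->{channel}{title}</h3>\n";
for my $pair (first_items($simp, 6)) {
    print "$pair->[1] <$pair->[0]>\n";
}
```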
One last note: all the links are assigned the CSS class "blk" (for block), so you can add a CSS ruleset matching .blk { display: block; } plus whatever other styles the links themselves need. You could assign a ruleset to the table instead, but I just wrap the SSI call in a DIV with its class/styles defined.
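If you would rather build the include file in a sub than by string concatenation in the loop, here is a minimal sketch that produces the same `<h3>`/`<table>` layout with `class="blk"` on each link. It also escapes `&`, `<`, `>` and `"`, since headline text and URLs taken from a feed can contain those characters (a bare `&` in a query string is something HTML validators flag). The sub names `render_rss_table` and `html_escape` are hypothetical.

```perl
use strict;
use warnings;

# Escape the characters that would break the generated HTML.
sub html_escape {
    my ($s) = @_;
    $s = '' unless defined $s;
    $s =~ s/&/&amp;/g;     # must come first, or the other entities get mangled
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Build the include file contents: a heading plus one table row per link,
# each link carrying class="blk" so a rule like .blk { display: block; }
# can style it. The wrapping DIV stays in the page that does the SSI include.
sub render_rss_table {
    my ($title, @links) = @_;     # @links: [url, text] pairs
    my $out = '<h3>' . html_escape($title) . "</h3>\n<table>\n";
    for my $pair (@links) {
        my ($url, $text) = map { html_escape($_) } @$pair;
        $out .= qq{<tr><td><a href="$url" class="blk">$text</a></td></tr>\n};
    }
    return $out . "</table>\n";
}

my $html = render_rss_table('Example feed & more',
    [ 'http://example.com/?a=1&b=2', 'First <story>' ]);
print $html;
```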
HTH
later
<fixed smileys - sugarkane>
(edited by: sugarkane at 10:28 pm (gmt) on Dec. 2, 2001)