I took a look at WWW::Mechanize.
I have never installed a Perl module or run a makefile before. I looked for installation instructions for WWW::Mechanize but could not find any. Could someone point me to them?
What I need to do is fetch a URL and collect the relative links (a href) on the page, with some logic to skip the absolute links. Then, using the URI module, I rebuild the relative links into absolute links. Next I need to visit each of those URLs, extract the title from the page, and save the title together with the URL to a file. Is WWW::Mechanize the best tool for this? I am a newbie at Perl; is this going to require some advanced scripting?
I have started some basic code below.
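#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use URI;
my $base = "url";    # placeholder: the page to fetch
my $content = get($base) or die "Could not fetch $base\n";
# Walk over every href value found in the page
while ( $content =~ m{href="(.*?)"}ig ) {
    my $url = $1;
    next if $url =~ m{^[a-z][a-z0-9+.-]*:}i;       # skip absolute links
    my $wholeLink = URI->new_abs( $url, $base );    # rebuild as an absolute link
    print "$wholeLink\n";
}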
# cpan
cpan> install WWW::Mechanize
It'll follow dependencies if you let it, so it'll install whatever is needed.
I would really look at Mechanize for what you're doing. With LWP you have to build the canonical URL yourself (URI module), not to mention parse the page (HTML::TokeParser is good for this). Mech provides built-in functions to extract links from a page and, being a subclass of LWP::UserAgent, still gives you all of the LWP methods.
I've been using LWP and associated modules for years. I discovered Mech only a matter of months ago, and it is *so* much easier. Check out the home page listed in my profile: I wrote a brief article there comparing LWP and Mech for web scraping.
If you want to get further into it, O'Reilly publishes a couple of great books: Perl & LWP, and Spidering Hacks. The former is all about LWP and HTML parsing; the latter uses a variety of techniques to grab information off the web. Again, a review of Perl & LWP is on my website.
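To give a rough idea of what that looks like for the task described above, here is a minimal sketch, not a finished script. The helper names is_relative and save_titles, the tab-separated output format, and the default file name titles.txt are my own assumptions for illustration. The Mech calls used (find_all_links, url, url_abs, title, success, is_html) are part of WWW::Mechanize and WWW::Mechanize::Link.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: treat an href as relative when it has no scheme
# (http:, mailto:, ...), is not protocol-relative (//host/path), and is
# not just a fragment pointing back into the same page (#top).
sub is_relative {
    my ($href) = @_;
    return 0 if !defined $href || $href eq '';
    return 0 if $href =~ m{^[a-z][a-z0-9+.-]*:}i;
    return 0 if $href =~ m{^//};
    return 0 if $href =~ m{^#};
    return 1;
}

# Hypothetical helper: fetch $start, follow its relative <a href> links,
# and write "title<TAB>url" lines to $outfile.
sub save_titles {
    my ( $start, $outfile ) = @_;
    my $mech = WWW::Mechanize->new( autocheck => 0 );
    $mech->get($start);
    die "Could not fetch $start: ", $mech->status, "\n" unless $mech->success;

    # Collect the absolute URLs first, because each later get()
    # replaces the page that the link list was taken from.
    my %seen;
    my @targets = grep { !$seen{$_}++ }
                  map  { $_->url_abs->as_string }
                  grep { is_relative( $_->url ) }
                  $mech->find_all_links( tag => 'a' );

    open my $out, '>', $outfile or die "Cannot write $outfile: $!\n";
    for my $url (@targets) {
        $mech->get($url);
        next unless $mech->success && $mech->is_html;
        my $title = $mech->title;
        $title = '(no title)' unless defined $title;
        print {$out} "$title\t$url\n";
    }
    close $out;
    return scalar @targets;
}

# Run only when a start URL is given on the command line.
save_titles( $ARGV[0], $ARGV[1] || 'titles.txt' ) if @ARGV;
```

Saved as, say, titles.pl (a made-up name), it would be run as "perl titles.pl http://www.example.com/ out.txt". Mech resolves each relative link against the page it came from, so there is no need to glue "http://" onto anything by hand.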
Sean
Checking if your kit is complete...
Looks good
Writing Makefile for WWW::Mechanize
make: *** No rule to make target `/usr/lib/perl5/5.8.3/i386-linux-thread-multi/CORE/config.h', needed by `Makefile'. Stop.
/usr/bin/make -- NOT OK
Running make test
Can't test without successful make
Running make install
make had returned bad status, install seems impossible