I'd like to analyse, once a week, all the external links to every page of my site, based on the link:www.widgets.com SERPs from Google. I thought of doing that in Excel, but the way I do it now is tedious and manual:
I save the link SERP page as a text file, delete by hand all internal links, comments and page titles so that only the URLs remain, and import the cleaned-up text into Excel. I then add http:// to every cell, which gives me a real link I can click to analyse the linking URL.
This takes a very long time just to keep the URLs and their relative positions. Does anybody have a more efficient method, please? I am not very literate in macros and programming, but I am ready to learn!
TIA!
I have also put it on [seindal.dk...]
------------------
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI::URL;

my $ua = LWP::UserAgent->new;

while (my $url = shift) {
    # Collect the href of every <a> tag in this list
    my @links = ();

    # Make the parser. Unfortunately, we don't know the base yet
    # (it might be different from $url)
    my $p = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        return unless ($tag eq 'a' and defined $attr{href});
        push(@links, $attr{href});
    });

    # Request the document and parse it as it arrives
    my $res = $ua->request(HTTP::Request->new(GET => $url),
                           sub { $p->parse($_[0]) });

    # Expand all link URLs to absolute ones and print them out
    my $base = $res->base;
    print join("\n", map { $_ = url($_, $base)->abs; } @links), "\n";
}
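To use it, save the code above under any name you like and pass one or more URLs on the command line; it prints one absolute link per line, which you can redirect to a file and import straight into Excel. A hypothetical invocation (the file name is made up, and the SERP URL is whatever appears in your browser's address bar for the link: search):
perl getlinks.pl "http://www.google.com/search?q=link:www.widgets.com" > links.txt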
I have changed the program a bit so it works as a cgi-script.
You can use it at
[seindal.dk...]
You can grab the cgi-program at
[seindal.dk...]
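For those who would rather see the idea than download the file, here is a rough sketch of how the same link extraction can be wrapped as a CGI script. The form field name "url" and the wording of the error messages are assumptions, so the real cgi-program linked above may well differ:

#!/usr/bin/perl
use strict;
use warnings;
use CGI qw(param header);
use LWP::UserAgent;
use HTTP::Request;
use HTML::LinkExtor;
use URI::URL;

# Plain-text output so the list of links is easy to copy or save
print header('text/plain');

my $url = param('url') or do { print "No url given\n"; exit };

# Collect the href of every <a> tag as the document is parsed
my @links;
my $p = HTML::LinkExtor->new(sub {
    my ($tag, %attr) = @_;
    push @links, $attr{href} if $tag eq 'a' and defined $attr{href};
});

my $ua  = LWP::UserAgent->new;
my $res = $ua->request(HTTP::Request->new(GET => $url),
                       sub { $p->parse($_[0]) });

if ($res->is_success and @links) {
    # Expand every link to an absolute URL and print one per line
    my $base = $res->base;
    print url($_, $base)->abs, "\n" for @links;
} else {
    print "Couldn't retrieve $url or it contained no links\n";
}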
Have fun!
René.
[google.com...]
I get an error message:
Couldn't retrieve [google.com...] or it contained no links
Thanks for your help!
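No diagnosis was posted in the thread, so the following is only a guess: Google is known to refuse requests that identify themselves with LWP's default libwww-perl agent string, which would make the fetch fail and produce exactly that message. A small sketch of giving the UserAgent a different identity before fetching (the agent string is arbitrary):

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# Guess: Google may reject the default "libwww-perl/x.xx" identity.
$ua->agent('Mozilla/5.0 (compatible; WidgetsLinkChecker/0.1)');

my $res = $ua->get(shift @ARGV);

# The HTTP status line shows the real reason a fetch failed,
# rather than the generic "Couldn't retrieve ..." message.
print $res->is_success ? "Fetched OK\n" : "Fetch failed: " . $res->status_line . "\n";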