


Extracting links from SERPS to spreadsheets

any tool available?

11:12 am on Nov 9, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:May 18, 2002
posts:83
votes: 0


Hi,

I'd like to analyse weekly all the external links to every page of my site, based on the link:www.widgets.com SERPs from Google. I thought of doing that in Excel, but it seems tedious to do it manually the way I do now:

I save the link SERP page as a text file, manually delete all internal links, comments and page titles, so that only the URLs are left. I import that cleaned-up text file into Excel. I then add http:// to all cells, which gives me a real link I can click on to analyse the linking URL.

This takes a very long time just to keep the URLs and their relative position. Does anybody have a more efficient method, please? I am not very literate in macros and programming, but I am ready to learn!
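(For the http:// step I use a formula like ="http://"&A1 in an empty column and fill it down, then paste the results back as values, but even that is fiddly by hand.)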

TIA!

11:52 am on Nov 9, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member heini is a WebmasterWorld Top Contributor of All Time (10+ Year Member)

joined:Jan 31, 2001
posts:4404
votes: 0


Hmm, there should be an easy way to strip everything but the URLs from an HTML page or a text file, I should think.
SEO geeks, where art thou?

12:27 pm on Nov 9, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:Oct 28, 2002
posts:492
votes: 0


Here's a little program in Perl that downloads a page from the web, extracts all the links and prints them.

I have also put it on [seindal.dk...]

------------------
#!/usr/bin/perl

use strict;
use warnings;

use LWP::UserAgent;
use HTML::LinkExtor;
use URI::URL;

my $ua = LWP::UserAgent->new;

while (my $url = shift) {
    # Set up a callback that collects the links (the href of each <a> tag)
    my @links = ();

    # Make the parser. Unfortunately, we don't know the base yet
    # (it might be different from $url)
    my $p = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        return unless ($tag eq 'a' and defined $attr{href});
        push(@links, $attr{href});
    });

    # Request the document and parse it as it arrives
    my $res = $ua->request(HTTP::Request->new(GET => $url),
                           sub { $p->parse($_[0]) });

    # Expand all link URLs to absolute ones and print them out
    my $base = $res->base;
    print join("\n", map { $_ = url($_, $base)->abs; } @links), "\n";
}
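To run it from the command line, save it under any name you like (say links.pl) and pass it one or more URLs as arguments, e.g.:

perl links.pl "http://www.google.com/search?q=link:www.widgets.com"

It needs Perl with the LWP and HTML::LinkExtor modules installed (both are on CPAN).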

12:35 pm on Nov 9, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:May 18, 2002
posts:83
votes: 0


Thanks Seindal, I saved your .pl Perl script.

Now, how do I install, run or use this please?

3:56 pm on Nov 9, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:Oct 28, 2002
posts:492
votes: 0


Hi pvdm,

I have changed the program a bit so it works as a CGI script.

You can use it at
[seindal.dk...]

You can grab the CGI program at
[seindal.dk...]
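The idea is simply that instead of taking the URL from the command line, the script reads it from a form parameter and prints the result as plain text. Roughly like this (a sketch of the idea, not necessarily identical to the script at the link above):

#!/usr/bin/perl

use strict;
use warnings;

use CGI;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI::URL;

my $q   = CGI->new;
my $url = $q->param('url');    # URL submitted from the form

print $q->header('text/plain');

if ($url) {
    my $ua = LWP::UserAgent->new;

    # Collect the href of every <a> tag on the page
    my @links;
    my $p = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        push(@links, $attr{href}) if $tag eq 'a' and defined $attr{href};
    });

    my $res = $ua->request(HTTP::Request->new(GET => $url),
                           sub { $p->parse($_[0]) });

    if ($res->is_success and @links) {
        # Expand relative links against the page's base URL
        my $base = $res->base;
        print join("\n", map { url($_, $base)->abs } @links), "\n";
    }
    else {
        print "Couldn't retrieve $url or it contained no links\n";
    }
}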

Have fun!

René.

5:06 pm on Nov 9, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:May 18, 2002
posts:83
votes: 0


Thanks Seindal. I'm trying your CGI script on your site, but when I ask it for the links given by Google for WebmasterWorld on the page

[google.com...]

I get an error message:

Couldn't retrieve [google.com...] or it contained no links

Thanks for your help!

7:06 pm on Nov 9, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:Oct 28, 2002
posts:492
votes: 0


Sorry, I hadn't checked Google.
Apparently they have some kind of User-Agent block, so they block the Perl modules I use. I now fake an MSIE user agent, and it works for me.
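For the curious, faking the user agent with LWP is just one extra line after creating the agent; the exact string doesn't matter much, as long as it looks like a browser, e.g.:

my $ua = LWP::UserAgent->new;
$ua->agent('Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)');   # pretend to be MSIE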

René.

9:46 am on Nov 11, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:May 18, 2002
posts:83
votes: 0


Hello René,

sorry for answering late, I was on a week-end trip ;-)

Thank you very much for your tool, it's working now and it's perfect for my needs! You're great :-)
