
gURL - how to get at the information?

   
9:58 pm on Nov 4, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



I've just got gURL scanning several web sites. The sites where I'm only interested in local broken page links are no problem, but a couple of them are directories where I need to automatically remove external links that have gone away.

I cannot find any way to export the links as (e.g.) a CSV or text file. Is this possible at all? Am I missing something?

Alternatively, is there another link checker that would do the job? I would prefer a dedicated link checker to a more generalised page "scraper".
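
In case it helps anyone later: if the URL list can be pulled out some other way (for example from the directory's own database), a short script can do the external check and write a CSV. A rough Python sketch; the urls.txt / report.csv names, the HEAD request and the 10-second timeout are placeholder choices of mine, not anything gURLChecker provides:

#!/usr/bin/env python3
"""Check a plain-text list of URLs and write the results to a CSV.

Rough sketch only: assumes one URL per line in urls.txt; the report
columns and the 10-second timeout are arbitrary choices.
"""
import csv
import urllib.error
import urllib.request

def check(url, timeout=10):
    """Return (status, note) for a URL, following redirects."""
    req = urllib.request.Request(url, method="HEAD",
                                 headers={"User-Agent": "link-check/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, "ok"
    except urllib.error.HTTPError as e:
        return e.code, e.reason            # e.g. 404 Not Found, 410 Gone
    except (urllib.error.URLError, OSError) as e:
        return None, str(e)                # DNS failure, timeout, refused...

with open("urls.txt") as src, open("report.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    writer.writerow(["url", "status", "note"])
    for line in src:
        url = line.strip()
        if not url or url.startswith("#"):
            continue
        status, note = check(url)
        writer.writerow([url, status if status is not None else "error", note])

Note that some servers answer HEAD with 403 or 405 even when the page is fine, so the report is better treated as a list of candidates to review than as something to delete from the directory automatically.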
1:43 pm on Nov 5, 2010 (gmt 0)

5+ Year Member



I assume you are talking about gURLChecker [gurlchecker.labs.libre-entreprise.org...]

I believe I was in touch with the programmer (Emmanuel Saracco) once because I also needed some other features.

I will try to contact him. Let's see what we can do. GURLChecker is a great program.
11:36 pm on Nov 5, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



You're correct about the name: I was working from memory. :(

I agree it's a useful program but it could be more so.

The fact that it creates caches of the visited pages has already proved useful: Clam found a number of viruses in the cached pages. I got rid of those very quickly. :)
8:23 am on Nov 6, 2010 (gmt 0)



Hi!

Please use the gurlchecker feature requests to ask for more, or the project's lists to discuss it ;-)

[labs.libre-entreprise.org]

Thanks!

Bye
11:01 pm on Nov 6, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Thanks for joining in, esaracco.

Would there be any point in adding a new feature request along the lines I mentioned? The web site seems to show little response to the feature requests already posted there.

I notice the installation I have is 0.10.3 (Ubuntu Hardy) and there is no obvious way of updating it to 0.13. Clicking the files download on the site merely offers to save or extract the tarball.
11:26 pm on Nov 6, 2010 (gmt 0)



Hi dstiles,

0.10.3... Ouch! You really should use the current 0.13 release, for sure ;-)

There are packages for maverick and natty:

[packages.ubuntu.com...]

But gurlchecker is easy to build if you have the devel packages and the appropriate library versions.

Regarding the gurlchecker config file, it is best to remove the "~/.gurlchecker" directory before running the new version (but you will lose all your projects and settings).
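
If you would rather keep the old projects around just in case, moving the directory aside instead of deleting it works too. A tiny sketch in Python; the dated backup name is my own choice:

#!/usr/bin/env python3
"""Move ~/.gurlchecker aside instead of deleting it before an upgrade.

Sketch only: the dated backup name is an arbitrary choice.
"""
import datetime
import pathlib
import shutil

cfg = pathlib.Path.home() / ".gurlchecker"
if cfg.is_dir():
    backup = cfg.with_name(".gurlchecker.bak-" + datetime.date.today().isoformat())
    shutil.move(str(cfg), str(backup))
    print("Moved", cfg, "to", backup)
else:
    print(cfg, "not found, nothing to do")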

Good luck.

Bye
10:18 pm on Nov 7, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Hi, esaracco.

Thanks for the pointer. A link from your site to that page would be useful. :)

Not that the link really helped. It still supplies 0.10.3 for hardy and I am not sufficiently confident to overwrite it with a different version.

What effect would that have? Both maverick and natty are, as far as I'm aware, betas: the OS upgrade currently being offered to me is karmic, so neither would seem to be relevant.

As regards compiling: I used to be a software developer. The operative phrase here is "used to be". :)
5:11 am on Nov 15, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



There are several alternatives:

[linkchecker.sourceforge.net...]

I am not sure if any do what you want.
11:04 pm on Nov 15, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Thanks, Graeme.

I've tried some of them, and some only work in a terminal (I wanted a GUI version but may have to change that).

Marked below as I recall them, but my memory may be wrong on the "not gui" ones (I tried them but no GUI was offered). Those without comments are new to me. Those marked "tried" do not do what I want. :(

* Checklinks written in Perl
* Dead link check written in Perl
* gURLChecker written in C (tried)
* KLinkStatus written in C++ (tried)
* link-checker written in C (not gui?)
* linklint written in Perl (not gui?)
* W3C Link Checker HTML interface only (not local)
* webcheck written in Python (not gui?)
* webgrep written in Perl
5:21 am on Nov 17, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



The ones without a GUI mostly output HTML, so once you have run them you examine the data in your browser.

It also means you can run them on your server, which is faster.

As a last resort you might be able to parse the HTML to get the data you want.
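
As a rough illustration of that last point, something like this can pull the anchors out of a report page. It assumes the report is a local report.html and that the links of interest appear as ordinary <a href> anchors; every checker formats its output differently, so adjust to taste:

#!/usr/bin/env python3
"""Pull the href targets out of a link checker's HTML report.

Sketch only: assumes a local report.html whose interesting links are
ordinary <a href="..."> anchors; real reports vary per tool.
"""
import csv
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Collect (href, link text) pairs from the document."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

collector = AnchorCollector()
with open("report.html", encoding="utf-8") as f:
    collector.feed(f.read())

with open("links.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["href", "text"])
    writer.writerows(collector.links)

print(len(collector.links), "links written to links.csv")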
7:05 pm on Nov 17, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



I think all of that applies to both GUI and non-GUI versions, Graeme. I'm looking for ways to avoid extra work. :)

My wife volunteered yesterday to go through the gURLChecker results, which I've managed to lose... Ah, well.