WebmasterWorld

Home / Forums Index / Hardware and OS Related Technologies / Linux, Unix, and *nix like Operating Systems
Forum Library, Charter, Moderators: bakedjake

Linux, Unix, and *nix like Operating Systems Forum

gURL - how to get at the information?

 9:58 pm on Nov 4, 2010 (gmt 0)

I've just got gURL scanning several web sites. Those where I'm only interested in local broken page links are no problem, but a couple of sites are directories where I need to automatically remove external links that have gone away.

I cannot find any way to export the links as (e.g.) a CSV or text file. Is this possible at all? Am I missing something?

Alternatively, is there another link checker that would do the job? I would prefer a dedicated link checker rather than a more generalised page "scraper".
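Absent a built-in export, one workaround is to check the external links yourself and write the dead ones to CSV. A minimal sketch using only the standard library (the URL list, output filename, and User-Agent string are placeholders, not anything gURLChecker produces):

```python
import csv
import urllib.request
import urllib.error

def check_url(url, timeout=10):
    """Return the HTTP status for url, or None if the host is unreachable."""
    req = urllib.request.Request(url, method="HEAD",
                                 headers={"User-Agent": "link-audit/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code          # server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None              # DNS failure, refused connection, timeout

def write_dead_links(urls, outfile):
    """Write URLs that are gone (no response or 4xx/5xx) to a CSV file."""
    with open(outfile, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["url", "status"])
        for url in urls:
            status = check_url(url)
            if status is None or status >= 400:
                writer.writerow([url, status if status is not None else "unreachable"])
```

Some sites reject HEAD requests, so a fallback GET before declaring a link dead would make this more robust.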



 1:43 pm on Nov 5, 2010 (gmt 0)

I assume you are talking about gURLChecker [gurlchecker.labs.libre-entreprise.org...]

I believe I was in touch with the programmer (Emmanuel Saracco) once because I also needed some other features.

I will try to contact him. Let's see what we can do. GURLChecker is a great program.


 11:36 pm on Nov 5, 2010 (gmt 0)

You're correct about the name: I was working from memory. :(

I agree it's a useful program but it could be more so.

The fact that it creates caches of the visited pages has already proved useful: Clam found a number of viruses in the cached pages. I got rid of those very quickly. :)


 8:23 am on Nov 6, 2010 (gmt 0)


Please use the gurlchecker feature requests to ask for more, or the project's mailing lists to discuss it ;-)

[labs.libre-entreprise.org ]




 11:01 pm on Nov 6, 2010 (gmt 0)

Thanks for joining in, esaracco.

Would there be any point in adding a new feature request along the lines I mentioned? The web site seems short on responses to the feature requests filed so far.

I notice the installation I have is 0.10.3 (Ubuntu Hardy) and there is no obvious way of updating it to 0.13. Clicking the files download on the site merely offers to save or extract the tarball.


 11:26 pm on Nov 6, 2010 (gmt 0)

Hi dstiles,

0.10.3... Ouch! Then you really should use the current 0.13 release, for sure ;-)

There are packages for maverick and natty:


But gurlchecker is easy to build if you have the devel packages and appropriate library versions.

Regarding the gurlchecker config file, it is best to remove the "~/.gurlchecker" directory before running the new version (but you will lose all projects and settings).

Good luck.
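The build-from-source route above would follow the usual autotools sequence; a sketch, where the tarball name and the backup step (rather than outright removal of the config directory) are my assumptions, so check the project's README for the actual dependencies:

```shell
# Typical autotools build from a source tarball (names are illustrative).
tar xzf gurlchecker-0.13.tar.gz
cd gurlchecker-0.13
./configure --prefix=/usr/local
make
sudo make install

# Back up the old per-user config before the first run of the new version;
# deleting it loses all projects and settings, so keep a copy instead.
mv ~/.gurlchecker ~/.gurlchecker.bak
```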



 10:18 pm on Nov 7, 2010 (gmt 0)

Hi, esaracco.

Thanks for the pointer. A link on your site to that page would be useful? :)

Not that the link really helped. It still supplies 0.10.3 for Hardy and I am not sufficiently confident to overwrite it with a different version.

What effect would that have? Both Maverick and Natty are, as far as I'm aware, betas: the OS upgrade currently being offered to me is Karmic, so neither would seem relevant.

As regards compiling: I used to be a software developer. The operative phrase here is "used to be". :)


 5:11 am on Nov 15, 2010 (gmt 0)

There are several alternatives:


I am not sure if any do what you want.


 11:04 pm on Nov 15, 2010 (gmt 0)

Thanks, Graeme.

I've tried some of them, and some only work in a terminal (I wanted a GUI version but may have to change that).

Marked below as I recall them, though my memory may be wrong about the "not gui" ones (I tried them but no GUI was offered). Those not commented on are new to me. Those marked "tried" do not do what I want. :(

* Checklinks written in Perl
* Dead link check written in Perl
* gURLChecker written in C (tried)
* KLinkStatus written in C++ (tried)
* link-checker written in C (not gui?)
* linklint written in Perl (not gui?)
* W3C Link Checker HTML interface only (not local)
* webcheck written in Python (not gui?)
* webgrep written in Perl


 5:21 am on Nov 17, 2010 (gmt 0)

The ones without a GUI mostly output HTML, so once you have run one you can examine the results in your browser.

It also means you can run them on your server, which is faster.

As a last resort you might be able to parse the HTML to get the data you want.
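Parsing such a report needs only the standard library. A minimal sketch, assuming the broken links appear as ordinary `<a href>` anchors in the report (the actual report format varies by tool, so the filtering would need adjusting):

```python
import csv
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect every href value found in an HTML report."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def report_links_to_csv(html_text, outfile):
    """Extract hrefs from an HTML report and write them to a one-column CSV."""
    parser = LinkExtractor()
    parser.feed(html_text)
    with open(outfile, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["url"])
        for link in parser.links:
            writer.writerow([link])
```

A real report would likely mark broken links with a CSS class or a status column, which `handle_starttag` could filter on.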


 7:05 pm on Nov 17, 2010 (gmt 0)

I think all of that applies to both gui and non-gui, Graeme. I'm looking for ways to avoid extra work. :)

My wife volunteered yesterday to go through the gURL results. Which I've managed to lose... Ah, well.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved