gURL - how to get at the information?
dstiles
msg:4226821
9:58 pm on Nov 4, 2010 (gmt 0)

I've just got gURL scanning several web sites. The sites where I'm only interested in broken local page links are no problem, but a couple of the sites are directories where I need to automatically remove external links once they have gone away.

I cannot find any way to export the links as (e.g.) a CSV or text file. Is this at all possible? Am I missing something?

Failing that, is there another link checker that would do the job? I would prefer a dedicated link checker rather than a more generalised page "scraper". A sketch of the sort of export I mean follows below.
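
To illustrate, here is a rough sketch (standard-library Python, not gurlchecker itself): fetch one page, collect its external links, and write the ones that no longer respond to a CSV file. The page URL and the output filename are placeholders, and a real directory site would need this run over every page, not just one.

import csv
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen

PAGE = "http://example.com/links.html"  # placeholder directory page

class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs += [v for k, v in attrs if k == "href" and v]

html = urlopen(PAGE, timeout=15).read().decode("utf-8", "replace")
collector = LinkCollector()
collector.feed(html)

our_host = urlparse(PAGE).netloc
dead = []
for href in collector.hrefs:
    url = urljoin(PAGE, href)
    if urlparse(url).scheme not in ("http", "https"):
        continue  # skip mailto:, javascript:, etc.
    if urlparse(url).netloc == our_host:
        continue  # only external links matter here
    try:
        # Some servers reject HEAD; a fallback GET would be more thorough.
        urlopen(Request(url, method="HEAD"), timeout=15)
    except OSError as err:  # URLError/HTTPError are subclasses of OSError
        dead.append((url, str(err)))

with open("dead_links.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(("url", "error"))
    writer.writerows(dead)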

 

jabz
msg:4227081
1:43 pm on Nov 5, 2010 (gmt 0)

I assume you are talking about gURLChecker [gurlchecker.labs.libre-entreprise.org...]

I believe I was in touch with the programmer (Emmanuel Saracco) once because I also needed some other features.

I will try to contact him. Let's see what we can do. GURLChecker is a great program.

dstiles
msg:4227355
11:36 pm on Nov 5, 2010 (gmt 0)

You're correct about the name: I was working from memory. :(

I agree it's a useful program, but it could be even more useful.

The fact that it caches the visited pages has already proved useful in itself: ClamAV found a number of viruses in the cached copies. I got rid of those very quickly. :)

esaracco
msg:4227458
8:23 am on Nov 6, 2010 (gmt 0)

Hi!

Please use the gurlchecker feature request tracker to ask for more, or the project's mailing lists to discuss it ;-)

[labs.libre-entreprise.org ]

Thanks!

Bye

dstiles
msg:4227619
11:01 pm on Nov 6, 2010 (gmt 0)

Thanks for joining in, esaracco.

Would there be any point in adding a new feature request along the lines I mentioned? The web site seems to show little follow-up on the feature requests already filed there.

I notice the installation I have is 0.10.3 (Ubuntu Hardy) and there is no obvious way of updating it to 0.13. Clicking the file downloads on the site merely offers to save or extract the tarball.

esaracco
msg:4227622
11:26 pm on Nov 6, 2010 (gmt 0)

Hi dstiles,

0.10.3... Ouch! You really should move to the current 0.13 release ;-)

There are packages for maverick and natty:

[packages.ubuntu.com...]

But gurlchecker is easy to build from source if you have the devel packages and the appropriate library versions installed.

Regarding the gurlchecker config file, the best approach is to remove the "~/.gurlchecker" directory before running the new version (but you will lose all projects and settings).
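
If you want to keep a copy of your old projects first, something like this would do it (a sketch of my suggestion, not a gurlchecker command; the backup name is just an example):

import shutil
from pathlib import Path

conf = Path.home() / ".gurlchecker"
backup = Path.home() / ".gurlchecker.bak"  # example name, pick your own
if conf.is_dir():
    shutil.copytree(conf, backup)  # fails if the backup already exists
    shutil.rmtree(conf)            # new version will recreate the directory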

Good luck.

Bye

dstiles
msg:4227853
10:18 pm on Nov 7, 2010 (gmt 0)

Hi, esaracco.

Thanks for the pointer. A link from your site to that page would be useful. :)

Not that the link really helped: it still supplies 0.10.3 for hardy, and I am not sufficiently confident to overwrite it with a different version.

What effect would that have? Both maverick and natty are, as far as I'm aware, betas: the OS upgrade currently being offered to me is karmic, so neither would seem to be relevant.

As regards compiling: I used to be a software developer. The operative phrase here is "used to be". :)

graeme_p
msg:4230372
5:11 am on Nov 15, 2010 (gmt 0)

There are several alternatives:

[linkchecker.sourceforge.net...]

I am not sure if any do what you want.

dstiles
msg:4230774
11:04 pm on Nov 15, 2010 (gmt 0)

Thanks, Graeme.

I've tried some of them, and some only work in a terminal (I wanted a GUI version, but may have to change that).

They're marked below as I recall them, though my memory may be wrong on the "not gui" ones (I tried them but was offered no GUI). Those without a comment are new to me. Those marked "tried" do not do what I want. :(

* Checklinks written in Perl
* Dead link check written in Perl
* gURLChecker written in C (tried)
* KLinkStatus written in C++ (tried)
* link-checker written in C (not gui?)
* linklint written in Perl (not gui?)
* W3C Link Checker HTML interface only (not local)
* webcheck written in Python (not gui?)
* webgrep written in Perl

graeme_p
msg:4231295
5:21 am on Nov 17, 2010 (gmt 0)

The ones without a GUI mostly output HTML, so once you have run one you examine the results in your browser.

It also means you can run them on your server, which is faster.

As a last resort you might be able to parse the HTML to get the data you want.
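
For example, something along these lines (assuming the report is a plain HTML page and that each broken link appears as an <a href> in it; "report.html" is a placeholder filename):

from html.parser import HTMLParser

class ReportLinks(HTMLParser):
    """Pull the href targets out of a checker's HTML report."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs += [v for k, v in attrs if k == "href" and v]

with open("report.html", encoding="utf-8") as fh:  # placeholder filename
    parser = ReportLinks()
    parser.feed(fh.read())

for url in parser.hrefs:
    print(url)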

dstiles
msg:4231571
7:05 pm on Nov 17, 2010 (gmt 0)

I think all of that applies to both the GUI and non-GUI versions, Graeme. I'm looking for ways to avoid extra work. :)

My wife volunteered yesterday to go through the gURL results. Which I've managed to lose... Ah, well.
