Moderators: travelin cat

The Macintosh Webmaster Forum

    
Is there something like Xenu for Mac?
httpwebwitch

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4398398 posted 9:27 pm on Dec 15, 2011 (gmt 0)

I'm looking for a decent link checker. Something that will crawl an entire site, and report back with URLs found, HTTP status, and whatnot.

Something just like Xenu would be perfect. I need only a few options, like the ability to not follow external links and to exclude files with a regex.

Suggestions?

 

httpwebwitch

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4398398 posted 9:28 pm on Dec 15, 2011 (gmt 0)

Oh, there's one called Integrity.

The last time I tried that one it crashed the MacBook. I'm not kidding.

travelin cat

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4398398 posted 9:41 pm on Dec 15, 2011 (gmt 0)

I have used Integrity to check a 10K+ page website with no problems; you may have run into a memory issue.

I've also used BLT, and there is another called DeepTrawl.

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors Of The Month



 
Msg#: 4398398 posted 11:24 pm on Dec 15, 2011 (gmt 0)

A few years ago I was forced to install the W3C link checker locally because the online version was getting snarky about checking fragments in multiple files. I don't know how well you get along with Terminal and command-line input. It was horribly traumatic for me, but you only have to do it once.* The drawback is that by default the local version thinks your DTD is a link. (Someone told me how to override it, but my brain tends to shut down when I'm given command-line information.) The CGI part is separate.


* Except when they go and update it, leaving me no choice but to, uhm, ignore the update.

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors Of The Month



 
Msg#: 4398398 posted 11:38 am on Dec 19, 2011 (gmt 0)

Follow-up: OK, I gritted my teeth and read the manual. Well, the Help screen. I never knew it existed until I tried to use a command someone else told me about. Oops.

W3C link checker, installed locally, command-line interface. For a whole site:

checklink -X http://www.w3.org/TR/html4/loose.dtd -l http://www.example.com/ http://www.example.com/

-X = exclude. For html4/loose etc, substitute whatever your own DTD says. You can use a regex. Pile on further -X as needed. You have to include the DTD line because the local version doesn't ignore it the way the online version does, which means that every single page on your site will wait an extra 15 seconds before spitting out a 500 error.
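
If piling on -X options by hand gets tedious, the command line can be assembled in a small script. This is only a minimal Python sketch, assuming checklink is installed and on your PATH. The helper name build_checklink_args is made up for illustration, and it uses nothing beyond the -X and -l options described in this post.

```python
import shlex

def build_checklink_args(start_url, location, excludes):
    """Build the argument list for a local checklink run (hypothetical helper)."""
    args = ["checklink"]
    for pattern in excludes:
        # One -X per regex you want excluded; the DTD goes here too.
        args += ["-X", pattern]
    # -l constrains the recursive check; the page to start from goes last.
    args += ["-l", location, start_url]
    return args

args = build_checklink_args(
    "http://www.example.com/",
    "http://www.example.com/",
    ["http://www.w3.org/TR/html4/loose.dtd"],
)
# Print the command so you can eyeball it before running anything.
print(shlex.join(args))
```

To actually run it, hand the list to subprocess.run(args) from the standard library. Keeping the arguments as a list means regex characters in a pattern don't need shell quoting.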

-l (that's ell, not Eye) = constrain recursive searches to this location (here a whole domain)

The repetition of www.example.com is not a typo. The first one goes with -l. The second one is the actual page you're checking. Specifying a location switches on recursion, so the checker keeps following links inside that location until it runs out of pages.
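
One way to picture how the location and the excludes work together is the rough Python model below. It is a simplified illustration, not the link checker's actual code: the function name should_follow is invented, and treating the location as a plain URL prefix is my assumption about how the scope is matched.

```python
import re

def should_follow(url, location, exclude_patterns):
    """Rough model: would a crawl limited by -l and -X recurse into this URL?"""
    # Location rule (-l): stay inside the given prefix (assumed prefix match).
    if not url.startswith(location):
        return False
    # Exclude rule (-X): skip anything that matches one of the regexes.
    return not any(re.search(p, url) for p in exclude_patterns)

site = "http://www.example.com/"
excludes = [r"\.pdf$"]  # hypothetical pattern, purely for illustration

print(should_follow("http://www.example.com/about.html", site, excludes))
print(should_follow("http://www.example.com/manual.pdf", site, excludes))
print(should_follow("http://elsewhere.example.org/", site, excludes))
```

Only the first URL passes both rules; the PDF is caught by the exclude pattern and the last one is outside the location.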

Once it has started, go out to dinner. Or leave on vacation, depending on how big your site is. The Link Checker is a Good Robot, so its default minimum time between requests is 1 second. You can make it longer but not shorter. My site took about 45 minutes. Oh, and don't forget to say

User-Agent: W3C-checklink
Disallow:

in your robots.txt. I did say it's a Good Robot ;)
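
If you want to confirm that your robots.txt really lets the checker in before starting a long run, Python's standard library can parse the rules for you. A minimal sketch: the first two lines of the sample are the ones quoted above, while the catch-all block, the /private/ path, and the name SomeOtherBot are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt: the W3C-checklink block is from this post,
# the "User-agent: *" block is a hypothetical addition.
ROBOTS_TXT = """\
User-Agent: W3C-checklink
Disallow:

User-agent: *
Disallow: /private/
"""

def allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

page = "http://www.example.com/private/page.html"
print(allowed(ROBOTS_TXT, "W3C-checklink", page))  # True: empty Disallow permits everything
print(allowed(ROBOTS_TXT, "SomeOtherBot", page))   # False: caught by the catch-all block
```

An empty Disallow line means "nothing is off limits" for that user agent, which is why the link checker gets through even where other robots are blocked.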
