
The Macintosh Webmaster Forum

Is there something like Xenu, for Mac?
httpwebwitch
9:27 pm on Dec 15, 2011 (gmt 0)

I'm looking for a decent link checker. Something that will crawl an entire site, and report back with URLs found, HTTP status, and whatnot.

Something just like Xenu would be perfect. I only need a few options, like the ability to not follow external links and to exclude files with a regex.

Suggestions?

 

httpwebwitch
9:28 pm on Dec 15, 2011 (gmt 0)

Oh, there's one called Integrity.

The last time I tried that one, it crashed the MacBook. I'm not kidding.

travelin cat
9:41 pm on Dec 15, 2011 (gmt 0)

I have used Integrity to check a 10K+ page website with no problems; you may have run into a memory issue.

I've also used BLT, and there is another called DeepTrawl.

lucy24
11:24 pm on Dec 15, 2011 (gmt 0)

A few years ago I was forced to install the W3C link checker locally because the online version was getting snarky about checking fragments in multiple files. I don't know how well you get along with Terminal and command-line input. It was horribly traumatic for me, but you only have to do it once.* The drawback is that by default the local version thinks your DTD is a link. (Someone told me how to override it, but my brain tends to shut down when I'm given command-line information.) The CGI part is separate.


* Except when they go and update it, leaving me no choice but to, uhm, ignore the update.
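
(For anyone going down the same road: the checker is a Perl script, and as far as I know the usual route is the W3C::LinkChecker module from CPAN, roughly like this, assuming a working Perl/CPAN setup.)

# installs the checklink script plus its Perl dependencies
sudo cpan W3C::LinkChecker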

lucy24
11:38 am on Dec 19, 2011 (gmt 0)

Follow-up: OK, I gritted my teeth and read the manual. Well, the Help screen. I never knew it existed until I tried to use a command someone else told me about. Oops.

W3C link checker, installed locally, command-line interface. For a whole site:

checklink -X http://www.w3.org/TR/html4/loose.dtd -l http://www.example.com/ http://www.example.com/

-X = exclude. For html4/loose etc., substitute whatever your own DTD says. You can use a regex. Pile on further -X as needed. You have to exclude the DTD because the local version doesn't ignore it the way the online version does; without the exclusion, every single page on your site waits an extra 15 seconds before spitting out a 500 error on the DTD link.
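
For example, to skip the DTD and also skip PDFs (the PDF pattern is just an illustration; substitute whatever you actually need excluded):

checklink -X http://www.w3.org/TR/html4/loose.dtd -X '\.pdf$' -l http://www.example.com/ http://www.example.com/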

-l (that's ell, not eye) = constrain the recursive crawl to this location (here, a whole domain).

The repetition of www.example.com is not a typo. The first one goes with -l. The second one is the actual page you're checking. By specifying a location you've made it infinitely recursive.
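
The same trick works on part of a site: point -l at the directory and start from a page inside it (these paths are made up):

checklink -X http://www.w3.org/TR/html4/loose.dtd -l http://www.example.com/manual/ http://www.example.com/manual/index.html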

Once it has started, go out to dinner. Or leave on vacation, depending on how big your site is. The Link Checker is a Good Robot, so its default minimum time between requests is 1 second. You can make it longer but not shorter. My site took about 45 minutes. Oh, and don't forget to say

User-Agent: W3C-checklink
Disallow:

in your robots.txt. I did say it's a Good Robot ;)
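
If your robots.txt already has rules for other robots, the checklink record just sits alongside them, something like this (the /cgi-bin/ line is only a stand-in for whatever you already block):

User-Agent: W3C-checklink
Disallow:

User-Agent: *
Disallow: /cgi-bin/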
