I'm looking for some spider software to crawl our site and tell me all our outbound links. I've been trying to find one, but can only find ones that search for broken links ... which was helpful, but I want to check our outbounds too.
Xenu the link checker will give you a report of all links.
The list can be found in the report and sorted. So it's easy to chop out your domain and then be left with your external link list. The Link report is exportable as TAB separated too (no need to copy/paste from the HTML report.)
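For anyone who would rather script the "chop out your domain" step than do it in a spreadsheet, here is a rough Python sketch. It assumes the TAB-separated export has a header row and that the URL column is called "Address"; that column name is an assumption, so check your own export and pass a different name if needed. The function name and the sample domains are made up for illustration.

```python
import csv
import io
from urllib.parse import urlsplit


def external_links(tsv_text, own_domain, url_column="Address"):
    """Return sorted, de-duplicated URLs whose host is outside own_domain.

    url_column is an assumed header name; adjust it to match the export.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    found = set()
    for row in reader:
        url = (row.get(url_column) or "").strip()
        host = (urlsplit(url).hostname or "").lower()
        if not host:
            continue  # skip mailto:, relative or empty entries
        if host == own_domain or host.endswith("." + own_domain):
            continue  # our own domain or one of its subdomains
        found.add(url)
    return sorted(found)
```

To use it, read the exported file into a string and call something like external_links(text, "example.com").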
If you had complete up-to-the-minute backups the way people are always telling you to do-- see assorted completely unrelated threads about hacking, server crashes and so on-- this would be a non-problem because all you'd have to do is feed the backup into one of those smart text editors that can spit out all occurrences of the string 'href = "http://([^"]+)'
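If you don't have one of those editors handy, the same idea can be sketched in a few lines of Python. This is only an illustration: the pattern is loosened from the one quoted above to allow optional spaces around the "=" and to accept https as well as http, and the function names, the "*.html" file filter and the backup directory are all assumptions.

```python
import re
from pathlib import Path
from urllib.parse import urlsplit

# Looser variant of the pattern in the post: optional whitespace, http or https.
HREF_RE = re.compile(r'href\s*=\s*"(https?://[^"]+)"', re.IGNORECASE)


def external_hrefs(html, own_domain):
    """Return the set of absolute link targets in html that leave own_domain."""
    links = set()
    for url in HREF_RE.findall(html):
        host = (urlsplit(url).hostname or "").lower()
        if host and host != own_domain and not host.endswith("." + own_domain):
            links.add(url)
    return links


def scan_backup(backup_dir, own_domain):
    """Collect outbound links from every .html file under backup_dir."""
    links = set()
    for path in Path(backup_dir).rglob("*.html"):
        text = path.read_text(encoding="utf-8", errors="replace")
        links |= external_hrefs(text, own_domain)
    return sorted(links)
```

A regex like this misses single-quoted or unquoted href values, so treat the result as a quick inventory, not a complete parse.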
Or, heck, just run the w3c link checker and save a copy of the results. Sounds as if you're not even asking whether the links are currently valid, just what they are in the first place.
Do they tell you about broken outgoing links? I thought they only listed broken incoming links. The OP said "outbound links" and I think everyone is assuming that means links to other sites, not necessarily ones that belong to you.
I use a redirect script just like WebmasterWorld does, which is why Google WMT does check OBLs for me.
Sometimes I forget that stuff I do by default gives me advantages others don't have :)
In case you didn't know, when Googlebot handles a redirect script it attributes the status of the destination page to the redirect page, so "example.com/redirect.html?url=example2.com" gets assigned the actual status of example2.com, regardless of whether redirect.html itself returned a 200 OK.
It's the basics of how 302 hijacking worked, and artifacts of that bug are still in the system.
I just take advantage of the bug and Google's WMT tells me which links are bad or not without ever having to scan my OBLs myself.
Of course, the sites that return 200 OK are still suspect, because they can turn into domain parks or all sorts of other stuff; 200 OK is not always OK. Neither Link Sleuth nor most other link checkers can detect any of that, but Google's WMT does a pretty good job and will tell you when it's a soft 404, among other things.
All you need to get all that cool link checking for free is a URL redirect script in PHP. The other upside is that the redirect script tracks your outbound traffic.
Once you switch to using a redirect script it's hard to turn back. I had mine always blocked in robots.txt to stop crawlers from crawling through it to avoid 302 hijacking back in the day. However, if you unblock it in robots.txt or better yet remove the redirect script, suddenly your site will set off red flags for unnatural linking and you'll get penalized.
Once you go down this path you're kind of stuck there as the best you can ever do if you want to get rid of the redirect script is set all your links that ran through it to "rel=nofollow". Possibly you can slowly convert those links back to normal, but I know for a fact that any major transition will trigger an instant penalty.
I watched a competitor switch to raw links once and Google stomped him into oblivion for about 2 years. I had an accident once that triggered a similar penalty, but knowing what caused it, I was able to get out of it in 30 days.
Just thought I'd warn of the dangers because any time you switch linking schemes you could end up paying a price that may not be worth it just for using Google as a link checker.
It might depend on how your redirect script is written as to whether your outbound links get reported in WMT.
My script works in a similar way to the one here at WebmasterWorld in that it returns a 200 OK and has a META REFRESH to re-direct to the external page.
I know I have broken outbound links but have never seen one appear in WMT. However that could be because the script directory is blocked in robots.txt, so Google shouldn't be requesting the script anyway.
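A quick way to confirm whether the script really is off limits to Googlebot is to test the robots.txt rules directly. The sketch below uses Python's standard urllib.robotparser; the "/redirect/" directory and the example URLs are hypothetical stand-ins for wherever your own script lives.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text disallows url for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical rules: the redirect script lives under /redirect/.
ROBOTS = "User-agent: *\nDisallow: /redirect/\n"

script_blocked = is_blocked(ROBOTS, "http://example.com/redirect/go?url=example2.com")
home_blocked = is_blocked(ROBOTS, "http://example.com/index.html")
```

This only tells you what a well-behaved crawler should do with those rules; it says nothing about what actually ends up in WMT.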
A previous incarnation of the script would return a 302 to the external page and I could see how any broken links in that case may appear in WMT. In fact the reason I re-wrote the script was because WMT flagged my site up as having malware due to the content of one of my outbound links.
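To make the difference between the two variants concrete, here is a minimal sketch of what each one sends back. It is written in Python for illustration only (the scripts discussed here are PHP), it is not the code of any script mentioned in this thread, and the function name and the response layout are assumptions.

```python
from html import escape
from urllib.parse import urlsplit


def redirect_response(target, use_meta_refresh=True):
    """Build (status, headers, body) for an outbound redirect to target."""
    parts = urlsplit(target)
    bad_chars = "\r" in target or "\n" in target  # guard against header injection
    if bad_chars or parts.scheme not in ("http", "https") or not parts.hostname:
        return "400 Bad Request", [("Content-Type", "text/plain")], "Bad target URL"
    if use_meta_refresh:
        # Variant 1: the script itself answers 200 OK and the page refreshes.
        safe = escape(target, quote=True)
        body = (
            '<html><head><meta http-equiv="refresh" content="0;url=%s"></head>'
            '<body><a href="%s">Continue</a></body></html>' % (safe, safe)
        )
        return "200 OK", [("Content-Type", "text/html; charset=utf-8")], body
    # Variant 2: a plain HTTP 302 pointing at the external page.
    return "302 Found", [("Location", target)], ""
```

With the meta refresh variant a crawler sees an ordinary 200 page at your own URL, while the 302 variant hands it straight to the external page.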
There still aren't really any other tools like these if you don't like these two (Xenu and Screaming Frog). I'm using Xenu and Screaming Frog for outbound links and for broken links as well. With GWT you can find your inbound links, not your outbound ones.