
Home / Forums Index / WebmasterWorld / Webmaster General
Forum Library, Charter, Moderators: phranque

Webmaster General Forum

    
How do you find out who you're linking to?
internetheaven




msg:4539563
 9:23 pm on Jan 26, 2013 (gmt 0)

I'm looking for some spider software to crawl our site and tell me all our outbound links. I've been trying to find one, but can only find ones that search for broken links ... which was helpful, but I want to check our outbounds too.

Any tips?

Thanks
Mike

 

Hoople




msg:4539594
 7:06 am on Jan 27, 2013 (gmt 0)

Xenu's Link Sleuth (the link checker) will give you a report of all links.

The list can be found in the report and sorted, so it's easy to chop out your own domain and be left with your external link list. The link report can also be exported as tab-separated text (no need to copy/paste from the HTML report).
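As a rough illustration of the "chop out your domain" step, here is a minimal Python sketch that filters a tab-separated export down to external links. The function name `external_links`, the `url_column` parameter, and the assumption that the URL sits in the first column are all hypothetical; check the actual column layout of whatever your link checker exports.

```python
import csv
import io
from urllib.parse import urlsplit


def _bare(host):
    """Lower-case a host name and drop a leading 'www.'."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def external_links(tsv_text, own_host, url_column=0):
    """Return the sorted, de-duplicated URLs whose host is not own_host.

    tsv_text   -- contents of a tab-separated link export
    own_host   -- your own domain, e.g. "example.com"
    url_column -- index of the column holding the URL (an assumption)
    """
    own = _bare(own_host)
    found = set()
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) <= url_column:
            continue
        url = row[url_column].strip()
        parts = urlsplit(url)
        # Skip header rows, mailto: links and anything else without a web host.
        if parts.scheme not in ("http", "https") or not parts.hostname:
            continue
        host = _bare(parts.hostname)
        # Treat subdomains of your own domain as internal too.
        if host != own and not host.endswith("." + own):
            found.add(url)
    return sorted(found)
```

Returning a sorted set also removes the duplicates you get when the same external page is linked from many internal pages.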

internetheaven




msg:4539639
 1:52 pm on Jan 27, 2013 (gmt 0)

Sorry, I'm on a Mac. No Xenu. I tried "Wine" to get it open but the setup for using Windows stuff on a Mac is horrendous.

phranque




msg:4539641
 1:57 pm on Jan 27, 2013 (gmt 0)

Try Screaming Frog SEO Spider.

internetheaven




msg:4539677
 8:43 pm on Jan 27, 2013 (gmt 0)

100 for screaming frog.

Surely there's something useful out there for less than that or even free?

phranque




msg:4539685
 10:51 pm on Jan 27, 2013 (gmt 0)

screaming frog is free for the 1st 500 links.

internetheaven




msg:4539690
 12:11 am on Jan 28, 2013 (gmt 0)

I've got 50 sites, and only half of them have fewer than 500 links. Most are 5 figures.

Thanks anyway
Mike

lucy24




msg:4539703
 2:09 am on Jan 28, 2013 (gmt 0)

Unhelpful answer #6:

If you had complete up-to-the-minute backups the way people are always telling you to do-- see assorted completely unrelated threads about hacking, server crashes and so on-- this would be a non-problem because all you'd have to do is feed the backup into one of those smart text editors that can spit out all occurrences of the string 'href = "http://([^"]+)'

Or, heck, just run the w3c link checker and save a copy of the results. Sounds as if you're not even asking whether the links are currently valid, just what they are in the first place.
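For anyone without a "smart text editor" to hand, the backup-scanning idea above could be sketched in Python roughly like this. It uses a slightly looser version of the pattern quoted in the post (it also accepts https and single quotes); the function names, the `suffixes` default, and the directory layout are assumptions for illustration, not anything from a particular tool.

```python
import re
from pathlib import Path
from urllib.parse import urlsplit

# Looser variant of the pattern in the post: any spacing around "=",
# either quote style, http or https.
HREF_RE = re.compile(r"""href\s*=\s*["'](https?://[^"']+)""", re.IGNORECASE)


def links_in_html(html):
    """Return every absolute http(s) href found in a chunk of HTML text."""
    return HREF_RE.findall(html)


def outbound_links_in_backup(backup_dir, own_host,
                             suffixes=(".html", ".htm", ".php")):
    """Map each external URL to the set of backup files that link to it."""
    own_host = own_host.lower()
    results = {}
    for path in Path(backup_dir).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for url in links_in_html(text):
            host = (urlsplit(url).hostname or "").lower()
            # Anything on your own domain or its subdomains is internal.
            if host and host != own_host and not host.endswith("." + own_host):
                results.setdefault(url, set()).add(str(path))
    return results
```

A regex scan like this only sees links that are written out in the files. Links assembled at runtime by a CMS or script will not show up, which is where a crawler still has the edge.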

buckworks




msg:4539715
 3:09 am on Jan 28, 2013 (gmt 0)

For use on a Mac, try "Integrity" or its paid brother "Scrutiny".

internetheaven




msg:4539722
 3:52 am on Jan 28, 2013 (gmt 0)

Thanks, but I've tried Integrity. It never exports the files right: the export is always packed with duplicates and URLs get cut off.

Do you use it with no problems?

incrediBILL




msg:4539723
 3:52 am on Jan 28, 2013 (gmt 0)

Maybe I'm missing something, but Google already crawls your sites and Google WMT tells you when you have broken links.

buckworks




msg:4539727
 4:21 am on Jan 28, 2013 (gmt 0)

I have used Integrity with few problems. That's not quite the same as "no" problems but I find it very, very useful. I upgraded to Scrutiny just a few days ago.

buckworks




msg:4539728
 4:23 am on Jan 28, 2013 (gmt 0)

Bill, where does GWT tell you about your links to other sites? What have I missed here?

lucy24




msg:4539730
 4:26 am on Jan 28, 2013 (gmt 0)

Google WMT tells you when you have broken links

Do they tell you about broken outgoing links? I thought they only listed broken incoming links. The OP said "outbound links" and I think everyone is assuming that means links to other sites, not necessarily ones that belong to you.

incrediBILL




msg:4539731
 4:27 am on Jan 28, 2013 (gmt 0)

What have I missed here?


I use a redirect script, just like WebmasterWorld does, which is why Google WMT does check OBLs (outbound links) for me.

Sometimes I forget that stuff I do by default gives me advantages others don't have :)

In case you didn't know: when Googlebot follows a redirect script, it attributes the status of the destination page to the redirect URL, so "example.com/redirect.html?url=example2.com" gets assigned the actual status of example2.com, not whether or not redirect.html itself returned a 200 OK.

It's the basics of how 302 hijacking worked, and artifacts of that bug are still in the system.

I just take advantage of the bug and Google's WMT tells me which links are bad or not without ever having to scan my OBLs myself.

Of course, the sites that return 200 OK are still suspect, because they can change to domain parks or all sorts of other stuff; 200 OK is not always OK. Neither Link Sleuth nor most other link checkers can determine any of that, but Google's WMT does a pretty good job and will tell you when it's a soft 404, among other things.

All you need to get all that cool link checking for free is a URL redirect script in PHP; the other upside is that the redirect script tracks your outbound traffic.

Enjoy.
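The post above mentions PHP, but no script is shown in the thread. As a rough idea of what such a redirect script does, here is a minimal sketch written in Python as a WSGI app instead. The name `redirect_app`, the `url` query parameter, and the "outbound" logger are assumptions for illustration, and this sketch requires a full http(s) URL rather than a bare host.

```python
import logging
from urllib.parse import parse_qs, urlsplit

# Hypothetical logger name; this is where outbound clicks get recorded.
log = logging.getLogger("outbound")


def redirect_app(environ, start_response):
    """WSGI app: a request like /redirect?url=<target> answers with a 302."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    target = query.get("url", [""])[0]
    parts = urlsplit(target)

    # Only redirect to plain http(s) URLs, and refuse anything containing
    # line breaks so the Location header cannot be split.
    valid = (
        parts.scheme in ("http", "https")
        and bool(parts.netloc)
        and "\r" not in target
        and "\n" not in target
    )
    if not valid:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"missing or invalid url parameter\n"]

    # This log line is the "tracks your outbound traffic" part.
    log.info("outbound click: %s", target)
    start_response("302 Found", [("Location", target)])
    return [b""]
```

For a quick local test it can be served with `wsgiref.simple_server.make_server("", 8000, redirect_app)`. A script that redirects to any URL it is given is an open redirect, so a real deployment would normally check the target against a list of links the site actually publishes.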

incrediBILL




msg:4540802
 2:00 am on Jan 31, 2013 (gmt 0)

BTW, I forgot to mention a very important fact.

Once you switch to using a redirect script it's hard to turn back. I had mine always blocked in robots.txt to stop crawlers from crawling through it to avoid 302 hijacking back in the day. However, if you unblock it in robots.txt or better yet remove the redirect script, suddenly your site will set off red flags for unnatural linking and you'll get penalized.

Once you go down this path you're kind of stuck there as the best you can ever do if you want to get rid of the redirect script is set all your links that ran through it to "rel=nofollow". Possibly you can slowly convert those links back to normal, but I know for a fact that any major transition will trigger an instant penalty.

I watched a competitor switch to raw links once and Google stomped him into oblivion for about 2 years. I had an accident once that triggered a similar penalty, but knowing what caused it, I was able to get out of it in 30 days.

Just thought I'd warn of the dangers because any time you switch linking schemes you could end up paying a price that may not be worth it just for using Google as a link checker.

mark_roach




msg:4541482
 4:36 pm on Feb 1, 2013 (gmt 0)

It might depend on how your redirect script is written as to whether your outbound links get reported in WMT.

My script works in a similar way to the one here at WebmasterWorld in that it returns a 200 OK and has a META REFRESH to re-direct to the external page.

I know I have broken outbound links but have never seen one appear in WMT. However that could be because the script directory is blocked in robots.txt, so Google shouldn't be requesting the script anyway.

A previous incarnation of the script would return a 302 to the external page and I could see how any broken links in that case may appear in WMT. In fact the reason I re-wrote the script was because WMT flagged my site up as having malware due to the content of one of my outbound links.
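To make the two points in the post above concrete, here is a small Python sketch of a 200 OK page that redirects via META REFRESH, plus a check of whether a robots.txt rule would keep a crawler away from the script. The function names and the `/out/` script directory are made up for illustration. The robots check uses Python's standard `urllib.robotparser`, which is not guaranteed to match Googlebot's own rule matching in every case.

```python
from html import escape
from urllib.robotparser import RobotFileParser


def meta_refresh_page(target, delay=0):
    """Build an HTML page (to be served with 200 OK) that forwards to target."""
    safe = escape(target, quote=True)  # keep &, quotes and < > out of the markup
    return (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        f'<meta http-equiv="refresh" content="{delay};url={safe}">\n'
        "<title>Redirecting</title>\n"
        f'</head><body><a href="{safe}">Continue</a></body></html>\n'
    )


def crawler_may_fetch(robots_txt, url, user_agent="Googlebot"):
    """Return True if robots_txt (the file's text) allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

If `crawler_may_fetch` returns False for the redirect URLs, a well-behaved crawler never requests the script at all, which would fit with broken outbound links never appearing in WMT.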

Aaron99




msg:4543292
 6:41 am on Feb 7, 2013 (gmt 0)

There still aren't any other such tools if you don't like these two (Xenu and Screaming Frog). I'm using Xenu and Screaming Frog for both outbound and broken links. With GWT you can find inbound links, but not outbound ones.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved