
Software that determines whether Links return 404s

     
9:34 am on Nov 28, 2014 (gmt 0)

New User from GB 

5+ Year Member

joined:Oct 9, 2014
posts: 31
votes: 0


Is there software that will check all the links into a site, and determine whether these return a 404?

thanks.
4:20 pm on Nov 28, 2014 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Apr 30, 2008
posts:2630
votes: 191


Yes, there are numerous tools like this. We have had threads on tools, and you can find references to them and descriptions of what they do in these threads.

2013 Favourite SEO Tools [webmasterworld.com] - the first tool listed in this thread (Screaming Frog) will do what you want, but the free version crawls only the first 500 URLs.

Favourite SEO Tools [webmasterworld.com] - in this thread, the last tool that tedster mentions (AuditMyPc) will also do what you want, and the last time I looked (a few years back) it was free.

There could be other tools listed in these two threads that will do what you want, but the two mentioned are ones I have used myself in the past.
4:32 pm on Nov 28, 2014 (gmt 0)

New User from GB 

5+ Year Member

joined:Oct 9, 2014
posts: 31
votes: 0


Hi, I think what I meant to ask is: what software will get all the inbound links, extract all the target URLs, and then check whether those target URLs are 404s or not?

I'm not sure Screaming Frog does this.

I'm guessing I could get all the target URLs from Majestic and then run them through list mode in Screaming Frog to check whether they are 404s, though I was looking for something that does all the steps in one process.

thanks
4:45 pm on Nov 28, 2014 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 30, 2005
posts:13012
votes: 222


Are you saying you're trying to find which of your incoming links are 404ing? You can probably parse that out of your log file. But yes, you can also feed Screaming Frog a file full of URLs and it will check for 404s.
4:50 pm on Nov 28, 2014 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Apr 30, 2008
posts:2630
votes: 191


Ah, understood now. As netmeg says, your server logs would be the best bet.

I presume you have already taken the list of URLs from the WMT errors section? (Although these will be in your server logs too.)
8:00 pm on Nov 28, 2014 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 30, 2005
posts:13012
votes: 222


I'm presuming he wants to find any incoming link that 404s, and not just the ones that people actually try to come in on.
8:43 pm on Nov 28, 2014 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15936
votes: 889


not just the ones that people actually try to come in on

But that goes beyond a tool you can run yourself, and into the realm of (paid) services. Which in turn gets into "a service is only as good as its robot". If GreatSite links to you, but they've blocked UsefulToolBot for whatever reason, then UsefulTool will never be able to tell you if GreatSite's links are valid.

In practice, though, the links people are actually using are the ones you really need to know about. That's where you open up your raw logs in a text editor and search using the appropriate Regular Expression such as (for Apache)
(GET|HEAD) \S+ HTTP/1\.[01]" 4(04|10) \d+ "http

This will also bring up any 404s whose referer happens to be your own site-- but that's just as well, since you would certainly want to know about those! I'd include both GET and HEAD, because a HEAD request may be someone else's link checker at work, and those are the ones most likely to respond to a "can you please fix this?" request.

As long as you're in there, you might throw 301 into the mix to pick up the valid-but-imperfect links.
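
If the logs run to many files, the same search is easy to script. Here is a minimal sketch in Python, assuming Apache combined log format and plain-text logs sitting in a logs/ directory (both of those are assumptions, so adjust the glob and the pattern to your own setup):

#!/usr/bin/env python3
# Scan raw Apache access logs for 404/410 hits that carry a referer.
import glob
import re

# GET or HEAD, any path, HTTP/1.0 or 1.1, status 404 or 410,
# any byte count, then a referer beginning with "http"
PATTERN = re.compile(r'"(GET|HEAD) (\S+) HTTP/1\.[01]" (404|410) \d+ "(http[^"]*)"')

for path in sorted(glob.glob("logs/*.log")):  # hypothetical log location
    with open(path, errors="replace") as log:
        for line in log:
            match = PATTERN.search(line)
            if match:
                method, url, status, referer = match.groups()
                print(status, method, url, "<- linked from", referer)

Swap (404|410) for (301|404|410) to pick up the valid-but-imperfect links as well.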


10:17 pm on Nov 28, 2014 (gmt 0)

New User from GB 

5+ Year Member

joined:Oct 9, 2014
posts: 31
votes: 0


Hmm, looking through the log files seems time-consuming.

I was thinking of digging around for the odd decent link that gets minimal traffic and might be hard to find in log files.

Majestic and ScreamingFrog may be the best option.

thank you.
10:33 pm on Nov 28, 2014 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15936
votes: 889


looking through the log files seems time-consuming

Not unless you've got so many sites, and they are all so heavily trafficked, that the mere act of doing a multi-file search will tie up your computer for hours. It took me about 30 seconds in TextWrangler (a few hundred, maybe 1-2000, small files, including zipped archives). I'm pretty sure the number of separate files is a much bigger factor than the size of individual files, so if you constrain it to just the past month or so, the time investment pretty well disappears.

That's assuming you keep the raw logs somewhere. You should, just as a matter of habit, even if you don't normally do anything with them.
10:36 pm on Nov 28, 2014 (gmt 0)

New User from GB 

5+ Year Member

joined:Oct 9, 2014
posts: 31
votes: 0


OK Lucy24, great, thanks for this advice. I'll look at TextWrangler as well.
11:23 pm on Nov 28, 2014 (gmt 0)

Preferred Member from AU 

10+ Year Member Top Contributors Of The Month

joined:May 27, 2005
posts:481
votes: 22


I have been using XENU for eons.
11:52 pm on Nov 28, 2014 (gmt 0)

Junior Member

5+ Year Member

joined:May 16, 2014
posts:141
votes: 0


I also use XENU, but it will only check links that are part of your site or link out from your site. To catch the odd inbound link to a page that does not exist, I'll usually look at the logs.

Added: I also have a page set to noindex that has only one in-content outbound link, to my home page. I'll often redirect malformed inbound links to pages that do not exist to that noindex page.
4:13 am on Nov 29, 2014 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15936
votes: 889


I'll often redirect malformed inbound links to pages that do not exist to that noindex page.

You mean, just as a simple way of keeping track (because a 301 preserves the original referer)? Makes sense. But is there any way to distinguish between honest erroneous links, and spurious robotic requests that slap on a referer in hopes of getting past some barriers? I mean, other than excluding requests for /wp-admin/ and similar.
6:13 pm on Nov 29, 2014 (gmt 0)

Junior Member

5+ Year Member

joined:May 16, 2014
posts:141
votes: 0


The page was originally set up for bots. I don't have much of an issue with www vs non-www on this site, but a lot of bot traffic was coming in to example.com pages where the canonical was www.example.com.

I removed the .htaccess section that set the canonical and instead redirected to that page, which has the one link to the canonical home page for any stray humans. I've found it helpful.
6:14 am on Nov 30, 2014 (gmt 0)

New User from GB 

5+ Year Member

joined:Oct 9, 2014
posts: 31
votes: 0


OK, using Majestic to find the target URLs, Open Office to filter out the duplicates, and then running the text file of results through Screaming Frog works grand.
I do have to cross-reference the 404s with the filtered results and the 'Source_URL' column in Open Office.

It would be nice to have software that did it all in one step.
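
For what it's worth, that one-step tool is only a short script away. Here is a rough sketch in Python, assuming the export is a CSV with Source_URL and Target_URL columns (hypothetical column names; match them to whatever your export actually uses):

#!/usr/bin/env python3
# Read a backlink export, dedupe the targets, flag 404/410s,
# and report which source pages link to each dead target.
import csv
from collections import defaultdict
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

# Map each unique target URL to the set of pages linking to it
sources_by_target = defaultdict(set)
with open("backlinks.csv", newline="") as f:  # hypothetical filename
    for row in csv.DictReader(f):
        sources_by_target[row["Target_URL"]].add(row["Source_URL"])

for target, sources in sorted(sources_by_target.items()):
    req = Request(target, method="HEAD",
                  headers={"User-Agent": "link-check-sketch"})
    try:
        urlopen(req, timeout=10).close()
    except HTTPError as e:
        if e.code in (404, 410):
            print(e.code, target)
            for src in sorted(sources):
                print("   linked from", src)
    except URLError as e:
        print("ERROR", target, e.reason)

Some servers refuse HEAD requests, so a sturdier version would retry with GET on a 405, but this folds the dedupe and the Source_URL cross-referencing into one pass.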