
Forum Moderators: Webwork & skibum

Outbound "Link Rot" - A Persistent Webmaster PITA in Need of Solutions

How do you minimize the time required to validate, update or delete outbound links?

     
1:49 pm on Apr 8, 2018 (gmt 0)

Moderator This Forum

WebmasterWorld Administrator webwork is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 2, 2003
posts:7984
votes: 64


Correct me if I'm wrong, but is confronting / dealing with link rot - - links to sites that no longer exist or exist only as parked domains, links to sites that are entirely redirected to a new site, links to pages that have changed or been deleted, etc. - - a matter of business due diligence that often goes like this: "If I don't look at it . . . . If I don't look too closely . . If I ~squint my eyes . . . . (maybe I won't see it or can ignore it - the rot - a little bit longer)"?

Building a decent directory has an element of fun: discovery, doing someone else some good (by linking to their site or article), etc. Maintaining a directory is a different matter, right? No? You love it all? Surely.

I've been going through the clean-up process and . . I can't say it's entirely fun, but it's a must-do.

Fixing link rot seems like a good candidate for outsourcing. But, outsource what? Editorial judgment? Hmm . .

One nice thing about WordPress is there's a plugin that, to my observation / experience, does a decent job of alerting me to issues of link rot . . but I still have to hunt down . . the missing links (so to speak).

So, my smallish contribution to the issue of fixing link rot is my endorsement of Broken Link Checker [wordpress.org]. I suspect you can toss links from any non-WP site into a WP site and run BLC against the links -> fix 'em -> remove 'em and put 'em back where they came from.

The downside of BLC is it's said to be a bit of a memory hog if you let it keep running.

Any other suggestions or solutions? Tools? (Links to tools invited.)
2:08 am on Apr 5, 2018 (gmt 0)

Moderator

WebmasterWorld Administrator buckworks is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 9, 2001
posts:5777
votes: 105


The following 7 messages were spliced on to this thread from: https://www.webmasterworld.com/directories/4894572.htm [webmasterworld.com] by webwork

Then figure out ...

... how to deal with the curse of link rot.

[edited by: Webwork at 11:51 am (utc) on Apr 9, 2018]

3:44 am on Apr 5, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14710
votes: 614


the curse of link rot
Maintenance is everything. You can't just compile a directory and leave it for the ages, or even keep adding new content but disregard the old stuff.* When I find a bad link on {my directory of choice} and report it, I know it will be acted on within 24 hours. Human users of the directory may not formally know that reporting bad links is a thing--they may well think the whole thing was created invisibly by the Directory Fairies--but they know that if they find something listed, it probably exists.


* Hm. Come to think of it, that applies to most websites, doesn't it. Not just directories.
4:03 am on Apr 5, 2018 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:11485
votes: 692


Even on non-directory pages, a "report broken links" utility is a good idea. I use it everywhere.
1:20 pm on Apr 5, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member ken_b is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 5, 2001
posts:5844
votes: 97


link rot.

Aha! Another project to look forward to!
5:16 pm on Apr 5, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member ken_b is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 5, 2001
posts:5844
votes: 97


So, hunting for a little link rot, I noticed that many of the linked sites had switched to https.

I'm updating all those, but wonder if G will see that as a new url/link?

And if so will that make a difference?
8:34 am on Apr 7, 2018 (gmt 0)

Moderator from GB 

WebmasterWorld Administrator mack is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 15, 2001
posts:7729
votes: 44


When I was involved with directories I wrote a spider to hit every page on Monday and save the server response in a DB. If a page returned a 404, its link was hidden. I would give it a week to come back before deleting the link. I think many "off the shelf" scripts had similar features.
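
A rough sketch of that kind of weekly check, for anyone rolling their own: the table and column names (outbound_links with id, url, status, first_failed) are placeholders rather than anything from the original script, and "404" is broadened here to any non-200 response.

```php
<?php
// Weekly outbound-link check along the lines described above.
// Schema is a placeholder: outbound_links(id, url, status, first_failed).

$db = new PDO('mysql:host=localhost;dbname=directory', 'dbuser', 'dbpass');

function head_status(string $url): int {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request only
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);
    curl_setopt($ch, CURLOPT_USERAGENT, 'LinkCheckBot/1.0');
    curl_exec($ch);
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $code;                                    // 0 means no response at all
}

foreach ($db->query('SELECT id, url, first_failed FROM outbound_links')->fetchAll() as $row) {
    $code = head_status($row['url']);

    if ($code === 200) {
        // Link is fine (or came back): unhide it and clear the failure date.
        $db->prepare('UPDATE outbound_links SET status = "ok", first_failed = NULL WHERE id = ?')
           ->execute([$row['id']]);
    } elseif ($row['first_failed'] !== null
              && strtotime($row['first_failed']) < strtotime('-7 days')) {
        // Failing for more than a week: delete the link, as described above.
        $db->prepare('DELETE FROM outbound_links WHERE id = ?')->execute([$row['id']]);
    } else {
        // New failure: hide the link and start (or keep) the one-week clock.
        $db->prepare('UPDATE outbound_links SET status = "hidden", first_failed = COALESCE(first_failed, NOW()) WHERE id = ?')
           ->execute([$row['id']]);
    }
}
```

Run it from cron once a week; some servers reject HEAD requests, so a fallback to GET is worth adding in practice.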

Mack.
6:15 pm on Apr 7, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14710
votes: 614


If they returned a 404 it was hidden. I would give it a week to return before deleting the link.
Sure, it would be nice if every page 301-redirected to the current version of the content. And sure, it would be nice if everyone listed on a directory took the trouble to keep the directory informed of changes. But hunting down bad links and updating them manually is also part of the directory maintainer's job. At least if you think of the directory as a service for human users, not a service for websites.

:: belatedly wandering off to one particularly ancient page* with a flurry of external links that I really do need to check more often ::


* Back when I used the <address> element, it said “created in 1998”.
12:25 am on Apr 9, 2018 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Nov 25, 2003
posts:1130
votes: 280


I remember your

Link rot . . yes . . I hear a communal sigh . .

from back in January.

I'm not a fan of third-party tools because (1) they stop being maintained, (2) they disappear, (3) they go from free to reasonable to ridiculous pricing, (4) they become proprietary. And, unlike many webdevs, I'm able to write my own requirement-specific software. That said, BLC gets one to first base.

What such tools do not do well, or at all, is tell you when a link still resolves but the landing page has changed.

* It is not unknown for a domain to be dropped and picked up by someone else with an eye on the existing backlinks but a different business model than the site had under the prior owner. This is a common modus operandi of adult sites, malware distributors, etc. Having your family/business-friendly site linking out to such a nastily changed page is not good for one's brand.

* It may (I haven't trialled it and didn't see it mentioned) not highlight a soft 404 (or even a hard 404). Linking out to a page that says 'not found' is, at the least, not an optimal UX to provide visitors. A rough soft-404 check is sketched just below.
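
No tool I know of nails this either, but a soft 404 can often be caught with crude heuristics: fetch the page, and if the status is 200, scan the visible text for tell-tale error wording or a suspiciously empty body. A minimal sketch; the phrase list and length threshold are my own guesses and will need tuning:

```php
<?php
// Crude soft-404 heuristic: the server answers 200 but the body reads like
// an error or parked page. Phrase list and threshold are guesses; tune them.

function looks_like_soft_404(string $url): bool {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);
    curl_setopt($ch, CURLOPT_USERAGENT, 'LinkCheckBot/1.0');
    $body = (string) curl_exec($ch);
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($code !== 200) {
        return false;   // a hard 404/410/5xx is a different (and easier) problem
    }

    $text    = strtolower(strip_tags($body));
    $phrases = ['page not found', '404 error', 'no longer available',
                'this domain is for sale', 'domain has expired'];
    foreach ($phrases as $phrase) {
        if (strpos($text, $phrase) !== false) {
            return true;    // error-ish wording despite the 200
        }
    }
    // Nearly empty pages are also worth flagging for a manual look.
    return strlen(trim($text)) < 200;
}
```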

Back in August 2016 goodroi started a thread Link Maintenance Matters [webmasterworld.com] that I participated in.

Out links:
Are my recommendations. I am saying 'if you like my stuff you'll enjoy this as well'. My reputation is riding on each and every one, so link rot is a serious matter. If the resource is gone, the link must be removed asap and a replacement found if appropriate (with the surrounding and anchor text adjusted as necessary). If the resource has changed (1) for the worse, the link must likewise be immediately suspended and re-pointed or replaced; (2) for the same or better, the anchor and surrounding text may need adjustment.

The easy, if irritatingly time-consuming, side is removal/replacement. However, adjustment may be almost as important, so that leaving my site and arriving on the other is a smooth transition. A bumpy landing isn't professional. I want return visitors, and that means being the best in the niche from arrival to after leaving.

Unfortunately, I know of no third-party tool that crawls, retrieves, stores, and compares such information. And even with significant automation in the process, the manual time and effort for link management (as you say, Editorial judgment? Hmm . .) is non-trivial. Especially as link numbers rise: thousands, tens of thousands, hundreds of thousands, millions...
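
For the "link still resolves but the page underneath has changed" problem, one home-grown approach is to store a fingerprint of each landing page and flag links whose fingerprint shifts between crawls, leaving only the flagged ones for editorial eyeballs. A bare-bones sketch; the link_snapshots table is an illustrative schema of my own, not an existing tool, and whole-page hashing will throw false positives on pages with dynamic content:

```php
<?php
// Store a fingerprint of each landing page; report links whose content has
// changed since the last crawl. Schema placeholder: link_snapshots(url, body_hash, title).

$db = new PDO('mysql:host=localhost;dbname=directory', 'dbuser', 'dbpass');

function fetch_fingerprint(string $url): array {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);
    $body = (string) curl_exec($ch);
    curl_close($ch);

    // Hash the visible text, not the raw HTML, so markup-only churn
    // (rotating ad tags, timestamps in comments) causes fewer false alarms.
    $text  = preg_replace('/\s+/', ' ', strip_tags($body));
    $title = preg_match('/<title>(.*?)<\/title>/is', $body, $m) ? trim($m[1]) : '';
    return ['hash' => hash('sha256', $text), 'title' => $title];
}

foreach ($db->query('SELECT url, body_hash, title FROM link_snapshots')->fetchAll() as $row) {
    $now = fetch_fingerprint($row['url']);
    if ($now['hash'] !== $row['body_hash']) {
        // Changed since the last crawl: queue it for a human, editorial look.
        echo "CHANGED: {$row['url']} (title was '{$row['title']}', now '{$now['title']}')\n";
        $db->prepare('UPDATE link_snapshots SET body_hash = ?, title = ? WHERE url = ?')
           ->execute([$now['hash'], $now['title'], $row['url']]);
    }
}
```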

Data reference:
Jason Hennessey, Steven Xijin Ge (2013). A cross disciplinary study of link decay and the effectiveness of mitigation techniques. BMC Bioinformatics, Volume 14, Number 14, Page 1.

* the median lifespan of webpages published between 1996 and 2010 in the Thomson Reuters' Web of Science citation index is 9.3 years.

Data reference:
J. Zittrain, K. Albert, L. Lessig (2014). Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations. Legal Information Management, 14(2), 88-99.

* 50% of the URLs within United States Supreme Court opinions do not link to the originally cited information.
* 70% of the URLs within the Harvard Law Review and other law journals published between 1999 and 2011 do not link to the originally cited information.

Given that such webpages and hyperlinks might be considered important, even critical, it is probable that more mundane pages and links change and disappear at an even greater rate than that law journal average annual rate of roughly 70% / 12 years ≈ 5.8% per year. For guidance:
* 1,000 links * 5.8% = 58 rotten links per year.
* 10,000 links * 5.8% = 580 rotten links per year. About 3 every 2 days.
* 100,000 links * 5.8% = 5,800 rotten links per year. About 16 per day.
* 1,000,000 links * 5.8% = 58,000 rotten links per year. About 159 per day.

Dealing with link rot is a serious, even critical, site maintenance activity that frequently goes unnoticed or ignored. On large sites, or across a large group of smaller sites, it can be a full-time job, along with all the other full-time-job hats that a webdev might be juggling.
Note: I do know that I often feel like Dr. Seuss' Bartholomew Cubbins...

Unfortunately, as you may have noticed, I have no easy simple solution, no magic tools to suggest. Sorry.
1:31 am on Apr 9, 2018 (gmt 0)

Moderator

WebmasterWorld Administrator buckworks is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 9, 2001
posts:5777
votes: 105


My link directory software has a built-in link checker which I run at least once a month. It catches some problems but not others, so the job takes more than one tool.

I set up reminders in Remember the Milk (a tool I like) to do periodic "link patrols", i.e. run a link checking tool. Fix the links in the report then do another run (ideally with a different tool) and see what turns up. Fix some more...

I figure that once I get the proportion of 200 OK links to over 99% I can leave the task alone until next month.

As a Mac user I like the tools "Integrity" or "Scrutiny" by Shiela Dixon / Peacock Media to check for broken links. I export the report to an Excel file, then as time permits I go through it line by line and update, fix or remove links as needed. That's a good task to do while watching a movie!

I don't always delete dud links right away, because sometimes good sites have temporary problems. I'll move them to a hidden directory to revisit in a couple of weeks.

It's more important for UX and probably also for SEO to fix links that are actually broken, but I always seem to start with updating the http --> https or www/non-www redirects first, just because they're easy. There's a never-ending supply of http --> https redirects these days.
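
For what it's worth, that easy http --> https pass is also the easiest to script: request each http:// link with redirects followed and see where it lands. A quick, read-only sketch; the input filename is a placeholder, and it only prints suggested replacements rather than editing anything:

```php
<?php
// For each outbound http:// URL, follow redirects and report where it ends up.
// Read-only: prints suggested http -> https updates for a human to apply.

$urls = file('outbound-http-links.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

foreach ($urls as $url) {
    if (strpos($url, 'http://') !== 0) {
        continue;                                   // only plain-http links are of interest
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);
    curl_exec($ch);
    $code  = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $final = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);

    if ($code === 200 && strpos($final, 'https://') === 0) {
        echo "UPDATE: $url -> $final\n";            // works, just moved to https
    } elseif ($code !== 200) {
        echo "CHECK : $url (final status $code)\n"; // actually broken, needs a human
    }
}
```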
1:41 am on Apr 9, 2018 (gmt 0)

Moderator

WebmasterWorld Administrator buckworks is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 9, 2001
posts:5777
votes: 105


Added: my biggest directory is a niche site with roughly 50,000 links. I don't keep records but the percentages that Iamlost quotes for link rot feel very plausible to me.
12:29 pm on Apr 9, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 27, 2001
posts:1161
votes: 6


Xenu is what I use occasionally against my sites (and back in the day when 'broken links' was an unexploited link-building tool).

Old tool, but it still works, although I'm unsure how it handles massive numbers of links, tbh. I think the coder claims it can check 1 million+ links.
2:59 pm on Apr 9, 2018 (gmt 0)

Moderator from US 

WebmasterWorld Administrator martinibuster is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 13, 2002
posts:14727
votes: 428


I agree with stever about Xenu. I think you can tick the box to check for redirects, and outlinks to pages that redirect should show up there. I'm pretty sure that'll catch 404s and 301s in your outlinks.
4:48 pm on Apr 9, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14710
votes: 614


* 50% of the URLs within United States Supreme Court opinions do not link to the originally cited information.
* 70% of the URLs within the Harvard Law Review and other law journals published between 1999 and 2011 do not link to the originally cited information.
That is indeed scary--and it again illustrates why entities like the Wayback Machine need to exist. Once it's there, it will remain there, and nobody can come back and decide “that’s not what I meant to say” or--less sinisterly but equally destructively--“nobody cares about this any more so let’s do some housecleaning”.

Besides, Wayback Machine has preserved an absolutely priceless duo of articles by {name suppressed}, who is now a Staff Linguist at Google. (I absolutely love this detail.) By the time I found the archived version, the original had already disappeared. That would have been a sad loss.
1:19 am on Apr 16, 2018 (gmt 0)

Moderator from GB 

WebmasterWorld Administrator mack is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 15, 2001
posts:7729
votes: 44


I think of "link rot" as a natural process that will occur over time. Keeping on top of dead links is an effective way of ensuring your pages stay fresh. Little says "stale content" better than a page full of broken links. For small-scale sites with few outbound links, it can be a very simple process of just manually looking over the links from time to time. For sites with hundreds or thousands of links, it is advisable to automate this process in some way.

Earlier in this thread, I touched upon how I achieved this by crawling my outbound links once a week. If a page was returning a 404, it would be hidden from view. I could then work out if the site was indeed gone or just having issues. If I was able to confirm the site would not be coming back, I would simply remove the link. Some directory scripts will certainly have this built in, but the issue goes way beyond directories. It could be a big problem on a large-scale content site. Any solution, in this case, would depend largely on how the site was built: static or dynamic? What language? What (if any) CMS?

Link rot can also pose other issues. Expired domains can often be purchased with the intention of building a new site while retaining the existing linkage. It's not just about removing dead links; it's about making sure every link still points to the original resource.

What can work (to an extent) is using a link tracking script. This means you link to a file on your own site with an id string, e.g. clickout.php?id=1234. When clicked, the script queries a database, finds the true URL for link id 1234, and redirects the user to the true location. This also means you can check the status of links over time.
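
A bare-bones version of that kind of clickout script might look like the sketch below. The links table, its columns, and the DSN are placeholders; a real version would add caching, logging and some abuse protection.

```php
<?php
// Minimal clickout-style redirect: clickout.php?id=1234 looks up the true
// URL and redirects. Schema placeholder: links(id, url, clicks, last_clicked).

$id = isset($_GET['id']) ? (int) $_GET['id'] : 0;
if ($id <= 0) {
    http_response_code(404);
    exit('Unknown link');
}

$db   = new PDO('mysql:host=localhost;dbname=directory', 'dbuser', 'dbpass');
$stmt = $db->prepare('SELECT url FROM links WHERE id = ?');
$stmt->execute([$id]);
$url = $stmt->fetchColumn();

if ($url === false || !filter_var($url, FILTER_VALIDATE_URL)) {
    http_response_code(404);
    exit('Unknown link');
}

// Optional: record the click, so over time you can see which outbound links
// actually get used (and which dead ones nobody would miss).
$db->prepare('UPDATE links SET clicks = clicks + 1, last_clicked = NOW() WHERE id = ?')
   ->execute([$id]);

header('Location: ' . $url, true, 302);
exit;
```

The same id-to-URL table is then a natural place to hang the weekly status checks described earlier in the thread.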

There is no magic one-size-fits-all solution, and every situation will have its unique differences. The key is to only link out to sites that you feel are not only worthwhile and of true benefit to your users, but also likely to have longevity.

Mack.