During a recent audit of one of my websites, I found that some of the backlinks pointing to my pages were incorrect or broken, so anyone following them, search engine spiders included, landed on a 404 error page. Analyzing the traffic to those broken backlinks, I found that approximately 8% of my unique visitors were being turned away by 404 error pages!
OK, this does sound terrible, and you should be asking why 8% of my traffic is coming in from broken links. The breaks fall into the following categories:
Links with markup tags or garbage in them:
www.widget.com/link.html<br> => www.widget.com/link.html
Mistyped links:
www.widget.com/limk.html => www.widget.com/link.html
Links to documents that have been moved or renamed:
www.widget.com/new-links.html => www.widget.com/links.html
Some of these errors are caused by my own bad web authoring practice, such as moving documents or entire folders. I should have put some redirects in place straight away, but for whatever reason I did not.
The other two categories are something I cannot easily control. If somebody hand-types a URL from my site into their article or forum post and gets it wrong, then it's broken to begin with. Some of my mistyped inbound links are on high-traffic websites that send a lot of visitors my way.
I found the broken backlinks by scanning my webserver logs for 404 responses. For each one, I collated the referring source and the target URL to establish which website was linking to a non-existent page.
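As a rough illustration of that log scan, here is a minimal Python sketch. It assumes the logs are in the common "combined" format (with the referrer as the quoted field after the byte count); the function name, the regular expression and the sample log lines are all made up for this example and are not the script I actually used.

```python
import re
from collections import Counter

# Assumed layout (combined log format):
# host ident user [time] "METHOD path PROTOCOL" status bytes "referrer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '
    r'"\S+ (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)"'
)


def collect_broken_backlinks(lines):
    """Count (referrer, requested path) pairs for every 404 response."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None or match.group("status") != "404":
            continue  # unparseable line, or not a 404
        referer = match.group("referer")
        if referer in ("", "-"):
            continue  # no referrer recorded, so there is no backlink to chase
        counts[(referer, match.group("path"))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical log lines, for demonstration only.
    sample = [
        '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /limk.html HTTP/1.1" '
        '404 512 "http://forum.example.com/thread/42" "Mozilla/5.0"',
        '203.0.113.9 - - [10/Oct/2023:14:02:11 +0000] "GET /link.html HTTP/1.1" '
        '200 2048 "http://forum.example.com/thread/42" "Mozilla/5.0"',
    ]
    for (referer, path), hits in collect_broken_backlinks(sample).most_common():
        print(hits, referer, "->", path)
```

Sorting by hit count puts the most damaging broken backlinks at the top, which is a reasonable order in which to fix them.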
Once I had seen that there was a sizeable chunk of traffic to be rescued, I set out to reclaim those backlinks. The first approach I used was to email website owners or forum posters and ask them to correct their link. As you might expect, this worked sometimes, but mostly resulted in no response at all.
After reclaiming one or two links by getting pages corrected, my next step was to add a few redirects using mod_rewrite or similar. This worked best for entire directories, where I could just change the directory part of the incoming request to the correct directory, so that any documents in that directory or its subdirectories would just work.
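The logic of a directory-level redirect is simple enough to sketch outside the webserver. The Python below is only an illustration of the prefix swap, not my actual mod_rewrite configuration, and the directory names in `DIRECTORY_MOVES` are hypothetical.

```python
# Hypothetical mapping of old directory prefixes to their new locations.
DIRECTORY_MOVES = {
    "/old-docs/": "/docs/",
}


def rewrite_moved_directory(path, moves=DIRECTORY_MOVES):
    """Return the corrected path if it falls under a moved directory, else None."""
    for old_prefix, new_prefix in moves.items():
        if path.startswith(old_prefix):
            # Swap only the directory prefix; everything after it is kept,
            # so documents in subdirectories are carried along as well.
            return new_prefix + path[len(old_prefix):]
    return None
```

With the mapping above, a request for /old-docs/guide/intro.html would be sent to /docs/guide/intro.html, while a request outside the moved directory returns None and is left alone.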
The last step in my three-way approach was to tackle the 100 or so mistyped links that each generate 1-2 unique visitors per day. Stuffing 100 specific rewrite rules into my mod_rewrite config did not seem like the best-performing solution. I mean, should every request, including the perfectly good ones, be tested against all the broken links before the correct page is served? That is a sort of "assume it's broken" approach. My answer to this problem was to keep the custom 404 page directive in my webserver pointing at my usual custom 404 page, but to alter that page so that it checks the requested URL against a database of known broken backlinks. If the URL is found in the database, the user is redirected to the correct working page.
A week later I checked the logs and found no broken backlinks, only the usual 404s you see from scanners looking for exploits.
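To make the idea concrete, here is a minimal sketch of that lookup in Python using SQLite. It is not the code behind my own 404 page: the table name, column names and function names are invented for this example, and answering with a 301 (permanent) redirect is my assumption about the most suitable status code.

```python
import sqlite3


def open_redirect_db(path=":memory:"):
    """Open (or create) the table of known broken backlinks."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS broken_backlinks ("
        "bad_path TEXT PRIMARY KEY, good_path TEXT NOT NULL)"
    )
    return conn


def handle_not_found(conn, requested_path):
    """Decide the response for a request that has already failed with a 404.

    Returns (status, headers): a 301 with a Location header when the path is
    a known broken backlink, otherwise a plain 404.
    """
    row = conn.execute(
        "SELECT good_path FROM broken_backlinks WHERE bad_path = ?",
        (requested_path,),
    ).fetchone()
    if row is None:
        return 404, {}  # unknown URL: show the usual custom 404 page
    return 301, {"Location": row[0]}


if __name__ == "__main__":
    conn = open_redirect_db()
    conn.executemany(
        "INSERT INTO broken_backlinks VALUES (?, ?)",
        [("/limk.html", "/link.html"), ("/link.html<br>", "/link.html")],
    )
    print(handle_not_found(conn, "/limk.html"))
    print(handle_not_found(conn, "/wp-login.php"))
```

Because the lookup only runs once the webserver has already decided the page does not exist, requests for working pages never pay for it.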
When I say "rescue traffic", I really do mean that if an inbound link was intended to go somewhere in particular, then it should be fixed so that it goes to the proper place. A general 404 page that lets you search the website or offers a bunch of links does not rescue traffic in the same way.
Eight percent broken traffic is huge, but that's what you get when you are careless about how you manage your content. How much of your traffic goes straight to a 404?