Forum Moderators: martinibuster
Since crawlers, well, crawl, wouldn't it be nice if they could downgrade sites based on their 404 rate? After all, a site where 60% of the links are 404s (or even 301s, but let's stick with 404s for now) has probably been abandoned for some time and is not a useful resource for surfers.
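To make the idea concrete, here's a minimal sketch of the bookkeeping a crawler could do, assuming it records the HTTP status of each outbound link it tries. All names here are hypothetical; whether 301s should also count as "dead" is the open question above, and in this sketch only 404s count.

```python
# Hypothetical sketch: compute a page's dead-link rate from the HTTP
# statuses a crawler recorded for its outbound links. Only 404s are
# treated as dead here; 301s are deliberately ignored.

def dead_link_rate(statuses: list[int]) -> float:
    """Fraction of a page's outbound links that returned 404."""
    if not statuses:
        return 0.0
    dead = sum(1 for code in statuses if code == 404)
    return dead / len(statuses)

# A page where 6 of its 10 outbound links are gone:
rate = dead_link_rate([200, 404, 404, 301, 404, 404, 200, 404, 404, 200])
# rate is 0.6 -- right at the 60% mark suggested above.
```

The engine wouldn't need to store every dead URL, just this one running figure per page.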
Now, a page that has dead links but also has useful information (like a tutorial written in 1997, where the internal links all still work and it's just that the outside resources it links to have vanished) should still be relevant to a surfer.
The way to distinguish would be anchor text.
Let's say I'm looking for "Harriet's House of Hobby". Should the search engine return as the first result a three-year-old page that links to the old URL, which 404s?
I'm not saying the SE has to keep a database of every 404. It can just keep a figure of how many 404s it gets per page. So if Page A only contains the words "Harriet's House of Hobby" in anchor text, and Page A has a 404 rate of 70%, the SE should DISCOUNT that match accordingly. Meanwhile, if Page B has the same string only in anchor text, but its 404 rate is 5%, it should look better. Of course, if a page has the same string in regular text (especially the title or a header), it's probably either the right page or has information about it, so it should rank higher still.
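The discounting could be as simple as scaling an anchor-text-only match by the page's dead-link rate. This is just a sketch of that weighting idea; the function name, the linear decay, and the scores are all illustrative assumptions, not any real engine's ranking code.

```python
# Hypothetical sketch: discount an anchor-text-only match by the linking
# page's dead-link (404) rate, while leaving body-text matches alone.

def match_score(base_score: float, dead_link_rate: float,
                in_body_text: bool = False) -> float:
    """Scale a page's relevance score for a query match.

    base_score     -- raw relevance of the match
    dead_link_rate -- fraction of the page's outbound links that 404 (0..1)
    in_body_text   -- True if the phrase also appears in regular text/title
    """
    if in_body_text:
        # Pages mentioning the phrase outside anchor text are likely the
        # target itself, or about it -- no discount.
        return base_score
    # Anchor-text-only matches decay linearly with the 404 rate.
    return base_score * (1.0 - dead_link_rate)

# Page A: anchor-text-only match, 70% dead links -- roughly 0.30.
score_a = match_score(1.0, 0.70)
# Page B: anchor-text-only match, 5% dead links -- roughly 0.95.
score_b = match_score(1.0, 0.05)
```

So Page B outranks Page A, and a page carrying the phrase in its own title would beat both.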
I've seen a lot of cases where websites moved from host to host, and the current website had much lower PageRank than the dead ones. Google usually doesn't return a dead site, though, so instead you get lots of pages LINKING to the dead ones. Thus, the poor surfer ends up concluding that "Harriet" has folded up shop and moved on, when in fact the problem is the SE giving more weight to an established site with lots of inbound links that hasn't been updated in three years.
Anyway, does anyone see any problem with the little system I've outlined? There are probably logical flaws that are evading me.
Are you sure they don't already? After all, if the crawler tries that outbound link, it will read the 404 error.
It would be an interesting experiment to create a page of several outbound links and submit it to the SEs. Wait a few months, check SERPs, and then break the links and see what happens.