[webmasterworld.com...]
ODP seems to have a link rot percentage of about 10%. Can anyone verify this number? What measures can be taken?
If you do find a bad url (or one that's been hijacked by an adult or spam site, or one that's moved, etc.) please use the "update URL" link on the top right of the category page at dmoz.org to alert an editor.
On the other hand, it has been a long time since the last run
Huh? I have a Robozilla hit in my server logs from Saturday. Looks like the last hit of a Robozilla run that began in late July.
Previous Robozilla attacks were in mid-May and late March. Without digging any further into my logs, I'd say they're running Robozilla every other month or so. That's probably better than most of the competition.
(I almost never run into a 404 at dmoz.org. What I do run into is domains with changed missions. If dmoz.org has a problem with linkrot, the problem isn't with 404 errors; the problem is with expired domains that get snapped up so quickly the link-checker doesn't get a chance to catch them. Yahoo's got the same problem. Blame the registrars: if they would pull domains from the zone file during the expiry grace period, link-checkers would have a chance to notice lapsed domains.)
It would be cool if Robozilla ran monthly ...
The way Robozilla operates is that it makes two runs two weeks apart. The first run flags any "problem" domains. The second run re-checks the problem domains to see if any of them are back to normal. Only then are the sites flagged for regular editors.
When you figure this two week schedule in, plus the amount of time it takes for editors to clear all of the errors from the first run (around a month), it means that Robozilla can be run at most every 6 weeks.
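For anyone who wants to see the two-run logic spelled out, here is a rough sketch in Python. It is not Robozilla's actual code; the function name `two_pass_check` and the `is_ok` and `wait` callbacks are made up for illustration, and the week counts are simply the figures quoted above.

```python
from typing import Callable, Iterable, List


def two_pass_check(
    urls: Iterable[str],
    is_ok: Callable[[str], bool],
    wait: Callable[[], None] = lambda: None,
) -> List[str]:
    """Return the URLs that fail both runs (hypothetical helper)."""
    # First run: flag every URL that looks like a problem.
    flagged = [u for u in urls if not is_ok(u)]
    # Gap between the runs (two weeks in the description above).
    wait()
    # Second run: re-check only the flagged URLs and drop any that recovered.
    return [u for u in flagged if not is_ok(u)]


# Schedule arithmetic from the post, in weeks.
GAP_BETWEEN_RUNS = 2   # the two runs are two weeks apart
EDITOR_CLEANUP = 4     # "around a month" for editors to clear the errors
MIN_CYCLE_WEEKS = GAP_BETWEEN_RUNS + EDITOR_CLEANUP
```

Only the URLs that are still failing on the second run would be passed on to editors, and the minimum cycle comes out to the 6 weeks mentioned above.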
A redirect status is not the same as a rotten link. If the redirect is due to some browser checking scripts, or used to add a session ID to the URL, I wouldn't label the original link "rotten".
Sometimes web servers redirect URLs like [example.com...] to [example.com...], and though the latter URL is the correct one, I'd suspect that the former is more stable, as it usually will stay the same even if the webmaster changes from index.html to, for example, default.asp or index.php. URLs that don't change are supposed to be a good thing rather than rotten. ;)
If real life was perfect and every webmaster used the correct status code, I'd probably have a slightly different opinion about redirects, but real life is far from perfect.
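To make the redirect cases concrete, here is a small Python sketch of how a checker might tell a harmless redirect from a suspicious one. Everything in it is an assumption for illustration: the function name `is_benign_redirect`, the list of default documents, and the session parameter names are mine, not anything a real link checker is known to use.

```python
from urllib.parse import parse_qsl, urlsplit

# Assumed default-document names (the ones mentioned above).
DEFAULT_DOCS = {"index.html", "index.php", "default.asp"}
# Assumed session-ID parameter names; real sites use many others.
SESSION_KEYS = {"sid", "sessionid", "phpsessid"}


def is_benign_redirect(original: str, target: str) -> bool:
    """Heuristic: True if the redirect does not suggest link rot."""
    a, b = urlsplit(original), urlsplit(target)
    if a.hostname != b.hostname:
        return False  # moved to another host: let a human look at it
    path_a, path_b = a.path or "/", b.path or "/"

    # Case 1: a directory URL redirected to its default document.
    if path_a.endswith("/") and path_b.startswith(path_a):
        tail = path_b[len(path_a):].lower()
        if tail in DEFAULT_DOCS and a.query == b.query:
            return True

    # Case 2: same path, and the target only adds session-ID parameters.
    if path_a == path_b:
        before = set(parse_qsl(a.query))
        after = set(parse_qsl(b.query))
        added = after - before
        return (
            before <= after
            and bool(added)
            and all(key.lower() in SESSION_KEYS for key, _ in added)
        )
    return False
```

With this, a redirect from `http://example.com/` to `http://example.com/index.html` counts as benign, and the shorter original URL can stay in the directory.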
Rotten links may return a server status (usually 404, 410 or 500, and some may include 403 as well), a connection time-out or a DNS problem. When ODP looks for rotten links, it will retry any URL that appears to be rotten. That way temporary server/network problems will not be labeled as link rot.
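As a sketch of that rule in Python (not ODP's actual code): the names `looks_rotten` and `is_rotten`, the `fetch` callback, the error tags, and the default of three attempts are all assumptions for illustration. The status codes are the ones listed above.

```python
from typing import Callable, Optional, Tuple

# One fetch attempt: (HTTP status or None, error tag or None).
Result = Tuple[Optional[int], Optional[str]]

ROTTEN_STATUSES = {404, 410, 500}
NETWORK_ERRORS = {"timeout", "dns"}  # assumed tags for time-outs and DNS failures


def looks_rotten(result: Result, count_403: bool = False) -> bool:
    """True if a single attempt looks like link rot."""
    status, error = result
    if error in NETWORK_ERRORS:
        return True
    if status in ROTTEN_STATUSES:
        return True
    # 403 is only counted if the caller opts in.
    return count_403 and status == 403


def is_rotten(
    url: str,
    fetch: Callable[[str], Result],
    attempts: int = 3,
    count_403: bool = False,
) -> bool:
    """Label a URL rotten only if every attempt fails."""
    for _ in range(attempts):
        if not looks_rotten(fetch(url), count_403):
            return False  # one good answer clears the URL
    return True
```

Because a single good response clears the URL, a temporary server or network problem does not get labeled as rot, and a redirect status (301, 302) is not treated as rotten at all.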
When I double-checked:
** Some of them were good....Robozilla hadn't been able to find them, but the site had popped back up by the time I checked it. Servers do die from time to time.
** A couple were in the unreviewed queue as "updated URL".
** A few had changed URLs without bothering to tell anyone....These I tracked down via other sources.
** The rest seemed to be offline for good....If the site was still untraceable a few weeks after the Robozilla report, I chucked it out.
Sorry, but I don't have precise %s for these four possibilities.