ODP Link Rot about 10%

Does anyone have the real numbers?

Go2

8:46 pm on Aug 7, 2002 (gmt 0)

10+ Year Member



According to thread:

[webmasterworld.com...]

ODP seems to have a link rot percentage of about 10%. Can anyone verify this number? What measures can be taken?

choster

9:46 pm on Aug 7, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Fascinating, because ODP has an internal link checker which runs every couple of months or so. On the other hand, it has been a long time since the last run, and sometimes a large number of errors can be thrown up due to a single extensively deeplinked resource which is down or reorganized (for instance, the US Department of the Interior, or CNN.com).

If you do find a bad URL (or one that's been hijacked by an adult or spam site, or one that's moved, etc.), please use the "update URL" link on the top right of the category page at dmoz.org to alert an editor.

rafalk

12:29 am on Aug 8, 2002 (gmt 0)

10+ Year Member



That number is wildly inflated. The number of "dead" sites usually hovers slightly below 1% at any given period.

Naturally the number is the lowest right after our internal link-checking robot makes its rounds once every couple of months.

mbauser2

1:07 am on Aug 8, 2002 (gmt 0)

10+ Year Member



On the other hand, it has been a long time since the last run

Huh? I have a Robozilla hit in my server logs from Saturday. Looks like the last hit of a Robozilla run that began in late July.

Previous Robozilla attacks were in mid-May and late March. Without digging any further into my logs, I'd say they're running Robozilla every other month or so. That's probably better than most of the competition.

(I almost never run into a 404 at dmoz.org. What I do run into is domains with changed missions. If dmoz.org has a problem with linkrot, the problem isn't with 404 errors; the problem is with expired domains that get snapped up so quickly the link-checker doesn't get a chance to catch them. Yahoo's got the same problem. Blame the registrars: if they would pull domains from the zone file during the expiry grace period, link-checkers would have a chance to notice lapsed domains.)

skibum

2:40 am on Aug 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It would be cool if Robozilla ran monthly, or if registrars would hold domains for a long enough period of time (say a month) with nothing there. That way the link rot checkers would be sure to pick them up w/o having to run so often that they would flag too many sites that are only momentarily down.

rafalk

1:34 pm on Aug 8, 2002 (gmt 0)

10+ Year Member



It would be cool if Robozilla ran monthly ...

The way Robozilla operates is that it makes two runs two weeks apart. The first run flags any "problem" domains. The second run re-checks the problem domains to see if any of them are back to normal. Only then are the sites flagged for regular editors.

When you figure this two-week schedule in, plus the amount of time it takes for editors to clear all of the errors from the first run (around a month), it means that Robozilla can be run at most once every 6 weeks.
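The two-run idea described above can be sketched roughly like this. It is only an illustration, not Robozilla's actual code: `two_pass_check`, `is_reachable` and `wait` are hypothetical names, and the reachability test is passed in as a function so that nothing here depends on how the real robot fetches pages.

```python
from typing import Callable, Iterable, Set


def two_pass_check(
    urls: Iterable[str],
    is_reachable: Callable[[str], bool],
    wait: Callable[[], None] = lambda: None,
) -> Set[str]:
    """Return the URLs that fail on both the first and the second pass."""
    # First pass: collect every URL that looks like a problem.
    suspects = {u for u in urls if not is_reachable(u)}
    # Gap between the passes (two weeks in the post above); a no-op by default.
    wait()
    # Second pass: re-check only the suspects and keep those still failing.
    return {u for u in suspects if not is_reachable(u)}
```

A site that is only down during the first pass drops out on the re-check, so only URLs that fail twice would reach the editors.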

bird

2:04 pm on Aug 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Note that for checking all listings monthly, Robozilla would need to fetch roughly 30 pages per second, 24/7. I'm not sure if the ODP even has the hardware readily available to support such a crawling cycle.
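For anyone who wants to redo that arithmetic, here is a small sketch. The thread gives no listing count, so the helpers (`fetches_at_rate` and `required_rate`, both made-up names) just convert between a sustained fetch rate and a total number of fetches over a period; the 30-day month is an assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def fetches_at_rate(pages_per_second: float, days: float) -> int:
    """Total fetches completed at a sustained rate over the given number of days."""
    return round(pages_per_second * days * SECONDS_PER_DAY)


def required_rate(total_fetches: int, days: float) -> float:
    """Pages per second needed to complete total_fetches within the given days."""
    return total_fetches / (days * SECONDS_PER_DAY)


# 30 pages per second, around the clock, for an assumed 30-day month.
print(fetches_at_rate(30, 30))  # 77760000
```

Plugging your own listing count (and the number of fetches per listing, if retries are counted) into `required_rate` gives the rate a monthly cycle would need.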

hstyri

5:15 pm on Aug 9, 2002 (gmt 0)

10+ Year Member



The problem with the title of this thread is that there are several definitions of "link rot".

A redirect status is not the same as a rotten link. If the redirect is due to some browser checking scripts, or used to add a session ID to the URL, I wouldn't label the original link "rotten".

Sometimes web servers redirect URLs like [example.com...] to [example.com...] and though the latter URL is the correct one, I'd suspect that the former is more stable, as it will usually stay the same even if the webmaster changes from index.html to, for example, default.asp or index.php - URLs that don't change are supposed to be a good thing rather than rotten. ;)

If real life was perfect and every webmaster used the correct status code I'd probably have a slightly different opinion about redirects, but real life is far from perfect.

Rotten links may return a server status (usually 404, 410 or 500, and some may include 403 as well), a connection time-out or a DNS problem. When ODP looks for rotten links, it will retry any URL that appears to be rotten. That way temporary server/network problems will not be labeled as link rot.
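As a rough sketch of that definition (not the ODP's actual checker): the status codes are the ones listed above, while the names `looks_rotten`, `is_rotten` and `fetch_status`, and the default of two attempts, are assumptions for illustration. `fetch_status` is any function that returns the HTTP status for a URL, or `None` for a connection time-out or DNS failure.

```python
from typing import Callable, Optional

# Statuses named in the post above; 403 is handled separately as optional.
ROTTEN_STATUSES = {404, 410, 500}


def looks_rotten(status: Optional[int], include_403: bool = False) -> bool:
    """Judge one response; None stands for a time-out or DNS failure."""
    if status is None:
        return True
    if status == 403:
        return include_403
    # Redirects (3xx) and successes (2xx) are not counted as rot here.
    return status in ROTTEN_STATUSES


def is_rotten(
    url: str,
    fetch_status: Callable[[str], Optional[int]],
    attempts: int = 2,
    include_403: bool = False,
) -> bool:
    """Label a URL rotten only if every attempt looks rotten."""
    # all() stops at the first attempt that does not look rotten.
    return all(looks_rotten(fetch_status(url), include_403) for _ in range(attempts))
```

With this shape a one-off time-out followed by a normal response is not labeled as link rot, and a redirect never is.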

victor

6:22 pm on Aug 10, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The last lot of Robozilla reports I got on the cats I edit showed almost exactly 10% of the links had gone bad since the last Robozilla run.

When I double-checked:

** Some of them were good....Robozilla hadn't been able to find them, but the site had popped back up by the time I checked it. Servers do die from time to time.

** A couple were in the unreviewed queue as "updated URL".

** A few had changed URLs without bothering to tell anyone....These I tracked down via other sources.

** The rest seemed to be offline for good....If the site was still untraceable a few weeks after the Robozilla report, I chucked it out.

Sorry, but I don't have precise %s for these four possibilities.