First off, I want to say this is NOT meant as a slam on DMOZ, and I hope this thread will not turn INTO a bashing of DMOZ...
I have a recent dump of the DMOZ data, and I have edited out only the areas of interest to me (the whole Widgets category & subs).
Of the 10,000 or so URLs in this Directory, I would say a good 10% are 403 or 404, and another 5% at least have moved.
From threads I have read about how "hard" it is to be listed in DMOZ, the editors (or those who I am presuming to be editors at DMOZ) are always saying how overworked they are, and how behind they are on scouting inclusions.
Again, this is NOT A SLAM. I am just curious how much (if any) of an editor's time is spent checking old links? What provisions are there for updating moved or 404 pages? Does DMOZ have a spider (at the very least) to check these sorts of things? I am just curious how it is handled.
I was quite surprised how many stale links were in Widgets. I would think that this sort of thing should be a priority at DMOZ. I have seen posts that say that DMOZ's relevance is waning because it takes so long to get new sites in. I was quite surprised to find so many BAD links in there, too, and think this, if not corrected, will diminish some of DMOZ's importance.
Or do I have this all wrong?
Thanks!
Dave
I edit mainly in slow-moving non-commercial categories. But I get 1% or 2% "Robozilla reds" a month. I fix 'em (maybe when I try, the server is working), change 'em (find where the new site is and update the URL), or bin 'em (they are playing both dead and hard to find).
That may involve moving them from actively listed to "unreviewed" (i.e. not publicly listed) while I try to find the successor site. So (just as an aside) the next time someone says "DMOZ has n billion unreviewed sites", ask them how many of them are "Robozilla reds".
If my experience is typical, then 10% would mean a backlog of 4 or 5 months in checking the reds. But commercial categories may turn over far faster.
Those statistics are either VERY atypical, or off badly. There is an official spider that checks links and flags (among other things) 403 and 404 pages. It's run about every month or two. I'm looking at its reports right now: only two of the first-level categories have more than 1% "reds" (that is, 403, 404, -1, -4, etc.) and those are under 1.4%. This IS typical.
Editors have automated tools that address other specific kinds of problems (redirects, etc.), and they run them "whenever they feel like it."
Editors can also recheck current listings WTFLI, but that is probably not a high-priority activity unless someone reports problems in a particular area. (If you don't mind mentioning it, which category are you looking at?)
*Which BTW got put in a category relating to where the company was based :s as opposed to any relation to what the website actually offers, and with a description that simply says "Includes testimonials, a tariff and a company description." which narrows it down to about 95% of the web.
So as you can imagine, I'm not too impressed with DMOZ AT ALL!
Anyone know if there is any way to make a formal complaint/request to have it moved and updated?
[edit]Sorry, no one had posted when I hit reply[/edit]
Thanks for the reply. Thanks for taking it the way I intended it!
I ran a spider on the DMOZ dump, and that is where I got my numbers from. As you mentioned, I too thought these numbers were REALLY high, and that is why I wanted to bring it all up. Also, I have never noticed a DMOZ spider on my sites... so that is why I asked!
Thanks!
Dave
Dealing with reds tends to be a higher priority. An editor is much more likely to be LARTed by a meta for leaving reds that have been dead for many months than for the cat having unreviewed sites.
>Anyone know if there is any way to make a formal complaint/request to have it moved and updated?
Go to the cat, and click "Update URL". While you may consider this description suboptimal, it doesn't look like abuse to me. Only if it was grossly wrong, such as saying your site had chocolate chip cookie recipes when it really sells widgets, would I say a formal complaint is warranted.
a) what is the editor thinking
b) whose best interest do they really have at heart
Apologies for taking this thread somewhat off on a tangent and no offence to any editors out there, but you'll understand why it's left me a little frustrated...
Don't think you can ever get an answer to that one. As all editors are volunteers, it's up to each one whether they edit 24 hours a day or 5 minutes a month. It's also up to them how they spend their editing time: clearing unreviewed, fixing 404s, checking descriptions and suitability of existing sites, and so on.
Luck of the draw as to who (or what) you get editing the categories you are interested in.
*Which BTW got put in a category relating to where the company was based :s as opposed to any relation to what the website actually offers...
>As you mentioned, I too thought these numbers were REALLY high, and that is why I wanted to bring it all up.
Wild speculation: a bug in your, um, spider? (or the ODP one, of course)
>Also, I have never noticed a DMOZ spider on my sites.
It's yclept "robozilla" -- big, green, scaly, grins a lot when its mouth's not full...you couldn't have missed it.
>That is why I asked...
No problem. I'd be happy to check out a sample (say, pick some category (not Adult, please, although I might get another volunteer to do those) with under 100 links and over 10 bad links.)
I'll go clean up everything Robozilla spotted last time round, then you run your spider on the rest, and we look at the difference under a microscope?
I replied via sticky with some specifics.
When I ran my spider, it was on the whole category, over 10,000 links. Over 3000 reported bad, but there were some errors of overaggressive reporting on my spider. But I would say that, easily, 1000 were bad.
In my sticky to you, I speculated why... this category does have a few sites that are up one day, not there the next.
> It's yclept "robozilla" -- big, green, scaly, grins a lot when its mouth's not full...you couldn't have missed it
Nope, never seen... or rather NOTICED it. Possibly because it will probably only grab 3-4 pages from my site, right? That is way too far down into the noise for me to notice.
> I'd be happy to check out a sample...with under 100 links and over 10 bad links.
Check out something like Sailor Moon, or some of the other anime topics and categories in that area.
Just give that (or something like that, in that area) a quick look through and see what you see. That is where I got MOST of my bad reports from.
Thanks!
Dave
This could easily happen via editor actions done appropriately. As has been mentioned, a site can have a listing in both topical and regional. Consider the possibility that a site about a bricks and mortar business is submitted to a cat of mine where it is not the correct cat. Proper ODP policy is that I should forward it to unreviewed of another appropriate cat for that site. This site qualifies for a listing in both Regional and topical. I'm just likely to forward it to whichever cat is easiest for me to find. If I check out the site and see it is located in Podunk, Iowa then there is a good chance I'll just send it on to the Podunk Regional cat. Particularly in light of the fact that while I would be 100% sure it qualified for listing there, I may not know enough about how the taxonomy is in the topical cats, and thus might end up forwarding it on to the wrong cat, where it might sit for months in unreviewed before some editor gets around to it, and that editor then has to move it somewhere else again. Thus it makes sense for me to forward it to the Regional cat.
I just ran an ODP editor tool to check on this. I can confirm hutcheson's numbers. Currently the total reds percentage of the whole ODP is 0.5%, and there isn't even one branch where the reds are as high as 2%. The ODP doesn't have a material problem with reds.
I am sorry, I did not want to start a big fight about this or anything... I was just concerned about some of what I was seeing. Also, all I had ever heard talk about was people trying to get INTO DMOZ, and I was totally unaware of what DMOZ did to stay current. I am very glad that there are active measures to ensure that the directory does keep these out.
I am thinking that part of my problem might have been the lack of updates for so long, and a lot of sites DID go bad in that time. That was 6 months, right?
In any case, I just went looking myself. The first category I checked looked fine. But the second category I grabbed at random:
Top: Arts: Animation: Anime: Titles: D: Digimon: Fan Pages
1) 404
2) an index listing, no page (probably not worthy of being included in DMOZ)
3) OK
4) OK
5) OK
6) Down for maintenance (has a Christmas message...)
7) OK
8) From Site: "March 15, 2002 Sorry to say, but [site name removed] is shutting down..."
9) No site at all.
10) OK
11) OK
12) OK
13) OK (but this site forces a software download)
14) Dead Site
15) OK (I guess)
16) OK
17) OK
18) "This site has been discontinued, but will remain open in case die-hard fans..."
19) OK
20) OK
21) Blank page that updates you to a page that claims to be the FUTURE home of...
22) OK
23) <site name> Has been shutdown.
24) OK
25) Last updated 9/9/00
26) OK
That is half the links on the page at DMOZ, from top to bottom, on a page I picked at random. 10 bad at least (I did not make any judgment as to content, but I think if DMOZ standards are as high as they claim, a LOT more of these SHOULD be dropped). 10 bad out of 26: that is almost 40% bad or not there...
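For anyone who wants to check the arithmetic, here is a quick tally in Python. The post does not say exactly which ten entries it counted as bad, so the set below is only one reading of the list (dead, closed, placeholder, or abandoned entries) that happens to add up to ten.

```python
# Entry numbers from the 26-item listing above that read as dead, closed,
# placeholder, or abandoned. ASSUMPTION: this is one plausible reading of
# the list; the post itself only gives the total of 10.
BAD_ENTRIES = {1, 2, 6, 8, 9, 14, 18, 21, 23, 25}
TOTAL_ENTRIES = 26


def bad_fraction(bad: set, total: int) -> float:
    """Return the share of listed entries judged bad."""
    return len(bad) / total


if __name__ == "__main__":
    share = bad_fraction(BAD_ENTRIES, TOTAL_ENTRIES)
    print(f"{len(BAD_ENTRIES)} of {TOTAL_ENTRIES} bad: {share:.1%}")
```

That works out to roughly 38.5%, which matches the "almost 40%" figure for this one category.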
I guess spiders help, but I would say that a human review needs to be done on many of these... and the sites may not return a 403/404 at all (as in the case of a page that says "I shut this page down on 2/3/2002...").
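To illustrate the gap between what a status-code check sees and what a human sees, here is a rough Python sketch of a text-based "suspect" flag. It is not how Robozilla or any ODP tool works; the phrase list and the `classify` function are made up for illustration, loosely based on the messages quoted in the list above, and a real checker would still need a human to confirm each hit.

```python
import re

# ASSUMPTION: a hypothetical phrase list, loosely based on the closure
# messages quoted in this thread. It will miss plenty and may misfire.
CLOSURE_PATTERNS = [
    r"shutting down",
    r"has been (shut ?down|discontinued)",
    r"down for maintenance",
    r"future home of",
]
_CLOSURE_RE = re.compile("|".join(CLOSURE_PATTERNS), re.IGNORECASE)


def classify(status: int, body: str) -> str:
    """Classify an already-fetched page as 'red', 'suspect', or 'ok'."""
    if status >= 400:
        return "red"      # hard failure that a status-code spider can see
    if not body.strip():
        return "suspect"  # server answered, but the page is empty
    if _CLOSURE_RE.search(body):
        return "suspect"  # server answered, but the text says the site is gone
    return "ok"
```

For example, `classify(200, "Sorry to say, but the site is shutting down...")` comes back as `"suspect"` even though the server answered normally, which is exactly the kind of page a 403/404 check passes over.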
Perhaps DMOZ should consider some sort of bad link reporting by users? I do that on my site, and it is GREAT for finding those bad sites you never get around to checking until they are reported.
Oh, I should note... My businesses and my sites have little if anything to do with this topic. If you think I am whining to get my site placed higher in the Anime categories, you are dead wrong, I have NOTHING to do with this topic. (PM me, and I will give you my real URL if you care...) I am just trying to provide some constructive criticism; notice I also give you some recommendations!
dave
Thus, almost all of what you cite couldn't have been spotted by Robozilla. It would have taken a hand review of every site in the category. Ideally this would be done periodically. However, I don't know how actively edited this cat space is. Also, it is possible all these sites went bad rather recently, and the editor hasn't spotted it.
>I did not make any judgment as to content, but I think if DMOZ standards are as high as they claim, a LOT more of these SHOULD be dropped)...
Look at the category name: "Fan Pages". In cats like this, so long as the sites have actual, on-topic content they are ordinarily listed. The nature of a fan pages cat is such that standards to get listed are very low. I agree the state of this cat is less than optimal. However, I'd hardly say this is typical of the ODP like you suggested in the first post.
Considering how long it's taken me to get one of my clients listed
Listing time will depend on the specific branch to which you submitted, the time zone, the time of year, the construction of the site, the volume of submissions, and numerous other factors. I spent much of this afternoon reviewing sites (which I can scarcely do any more because my ODP time is occupied by answering or initiating communications); at least one site was listed within ten minutes from submission and at least one was first submitted in 2001.
Which BTW got put in a category relating to where the company was based :s as opposed to
Businesses and organizations frequently receive two listings, one reflecting the location of the business and another reflecting the type of business or merchandise offered. Some categories, such as in the Business branch, do not list sites with a local orientation at all due to volume. In other cases, such as online retailers, a regional listing may not make sense, since the business is online only. Your mileage may vary, so be sure to read the submission guidelines carefully for the category and branch in question.
If you would like a description or the location of the site changed, use the "update url" link at the top of the category where it is listed. The procedure for escalation is the same as for original submissions; e-mail editors up the tree as may be called for, or contact a meta-editor if abuse is suspected.
Thanks for the reply. I agree- most of those bad ones WOULD be hard (if possible at all!) for a 'bot to peg.
And I think we all agree- inside of DMOZ and out- you all need a lot more help, editor-wise. If the backlog of additions were not so big, perhaps a bit more human "rechecking" could be done, and some of this weeded out.
I just made the original post because I was not sure how this was handled... robots, human review, info from users, etc. Now I know- thanks!
I know you do not run DMOZ, and probably have little to say about how the operation is run. But there must be some sort of feedback channel. I have one website that uses a "slice" of DMOZ in a couple of specific subject areas; that is how I first saw this. I have an instant feedback button on my site so if a URL is bad/broken/no longer on subject, a user can send me a notice immediately.
I would be happy to sticky you a URL, if you desire, but I am sure you know what I mean (just like a "correct URL" or "Add a Link" link). That would at least give you a pool of sites that REALLY need to get checked out: a pool of sites that Robozilla cannot find for technological reasons. The way I have mine set up, it takes 5-10 seconds to check each one, so it is fast. (I get an e-mail with a link to the "bad" site... and a link that will instantly suspend a site. If it has moved, I just plug in the new URL... If, a week later, the site is still bad, I X 'em. Quick and easy.)
Anyway, I just think this might be something you could send up the ladder at DMOZ; possibly it could be of use!
Good Luck!
Dave
As a matter of general policy, the ODP puts more emphasis on maintaining what is listed in the cat than on reviewing greens. For example, I'd be much more worried if some meta were to notice that a lot of the listed sites in one of my cats were dead than if they noticed some greens sitting in unreviewed. Greens are just so common at the ODP that you've gotta have a huge pile of them, with many quite old, before anyone considers it alarming. Also, for the ODP our "customers" are the end users, not the webmasters that submit sites. The users are going to be more dissatisfied if a lot of links are dead than if some sites that could be listed in the cat haven't been.
>That would at least give you a pool of sites that REALLY need to get checked out: a pool of sites that Robozilla cannot find for technological reasons.
If you have an automated way of doing this, then slap the results up on a web page somewhere and post the link at Resource Zone. Dead sites are a priority, and some editall might want to deal with these.
Another feature of Robozilla (I believe) is that it doesn't IMMEDIATELY mark the site as unavailable, but visits several times before it marks it as a red, simply because servers are sometimes slow or time out, and network congestion is heavy from time to time; that doesn't mean the site is down.
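The "visit several times before marking it red" idea can be sketched in a few lines of Python. This is only an illustration, not Robozilla's actual logic: the `is_red` function, the `fetch_status` callable, and the default of three visits are all assumptions, since the thread only says "several times".

```python
from typing import Callable


def is_red(url: str,
           fetch_status: Callable[[str], int],
           visits: int = 3) -> bool:
    """Return True only if every one of `visits` attempts fails.

    ASSUMPTION: `fetch_status` is a hypothetical caller-supplied function
    that returns an HTTP status code, or raises OSError (which covers
    timeouts and connection errors) when the server cannot be reached.
    """
    for _ in range(visits):
        try:
            status = fetch_status(url)
        except OSError:
            continue      # slow server or congested network: try again
        if status < 400:
            return False  # one good answer is enough to keep the listing
    return True           # failed on every visit, so flag it as a red
```

A real checker would presumably space the visits hours or days apart rather than looping immediately; the spacing is left out here to keep the sketch short.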
But the bottom line is that there ARE some categories that need attention. What usually happens is that sooner or later a masochistic editor will wade in there and sort it out, Duke Nukem style :)