If Google are trying so hard to have their database as "fresh" as can possibly be, why would they still use DMOZ as the source for their directory? It has been well over 3 months since the ODP updated ANYTHING; it's absolutely absurd. As though DMOZ wasn't already known for being a useless resource, this just makes it worse for them.
I can understand that they had problems when upgrading their servers etc. That's fine, but holy god, 3 months? I personally could manage to completely redo anything that needs redoing in 3 months, and I'm just one guy, and probably not as good at IT as the guys they have. What on earth are they doing over there? Could they be on strike?
If the new hardware doesn't work, they should probably get a refund before that 12 month warranty expires, as it's fast approaching.
Am I wrong?
Sites do go offline all the time. The ODP does have a bot that checks for 404s. However, I believe they only do that about once a month. It also requires that a human editor actually check out the site to make sure it really is gone before it is deleted. The ODP definitely has a problem with an editor shortage. Personally, I'd consider the fact that the ODP has over a million sites in the unreviewed queue a greater problem. I am an ODP editor, and recently was approved for new cat space with over 4,000 listed sites, and over 500 in the unreviewed queue. Many of those unrevieweds are 6 months old, and I actually just found one green that had been in the queue since the year 2000.
It is this backlog of greens that makes the ODP more out of date than the fact that there hasn't been an RDF dump in 3 months. Also, please note that anyone who wants the most current listings can check dmoz.org, which *does* show the sites added in the last 3 months. And Google provides a clickable link to dmoz.org on every page of the Google directory. While the fact there hasn't been an RDF dump in 3 months is undesirable, I don't consider it that big of a disaster.
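To illustrate the two-step process described above (a bot flags 404s, then a human editor confirms before anything is deleted), here is a minimal sketch. The function name, the `get_status` callable and the sample URLs are all hypothetical; this is not the ODP's actual bot, just one way the flag-then-review idea could look.

```python
def flag_for_review(listings, get_status):
    """Return the URLs whose status is 404, to be queued for a human editor.

    Nothing is deleted here: per the post above, an editor still has to
    confirm that the site is really gone.
    `get_status` is any callable mapping a URL to an HTTP status code.
    """
    return [url for url in listings if get_status(url) == 404]


# Stand-in for a real HTTP check, so the sketch runs offline (made-up data).
fake_statuses = {
    "http://example.com/": 200,
    "http://example.org/gone": 404,
}
to_review = flag_for_review(fake_statuses.keys(), fake_statuses.get)
print(to_review)
```

Keeping the status lookup as a parameter means the same function works with a real HTTP client or with canned data like the above.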
Here is what Netscape says:
6. Our Priorities are Our Data Users and the Community
We will be guided by the needs of our data users and the ODP editorial community. We will place their interests first in our priorities.
I am a data user...I'm pissed!
In the database, each and every category has a unique number (called a 'catid') assigned to it. However, due to a bug in the catmv system, some categories ended up with the same catid as others, and so the generation failed. This has been happening every week since the end of September, which is why the search database has been producing outdated results. Anyway, this has apparently been fixed (both the catmv bug and the duplicate catid bug), and a valid RDF dump should have arrived by the time you read this.
(From http://dmoz.org/newsletter/2002Christmas/news.html [dmoz.org])
[edited by: Laisha at 9:03 pm (utc) on Dec. 23, 2002]
[edit reason] Fixed URL [/edit]
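For anyone wondering what a "duplicate catid" check might look like, here is a rough sketch. The (catid, path) pair layout and the sample rows are assumptions for illustration only; the newsletter does not describe the ODP's real schema or how staff actually fixed it.

```python
from collections import defaultdict


def find_duplicate_catids(categories):
    """Return {catid: [paths, ...]} for every catid used by more than one category.

    `categories` is an iterable of (catid, path) pairs. That shape is an
    assumption for this sketch, not the ODP's actual database layout.
    """
    paths_by_catid = defaultdict(list)
    for catid, path in categories:
        paths_by_catid[catid].append(path)
    return {
        catid: paths
        for catid, paths in paths_by_catid.items()
        if len(paths) > 1
    }


sample = [
    (1, "Top/Arts"),
    (2, "Top/Business"),
    (2, "Top/Computers"),  # hypothetical collision left behind by a category move
    (3, "Top/Games"),
]
duplicates = find_duplicate_catids(sample)
print(duplicates)
```

Running a check like this before generating the dump would catch the collision up front instead of letting the generation fail.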
And that affects who? The webmasters that check the little green bars next to the sites in their category.
It also affects how sites are listed on Google's SERPs. A site that's in the Google Directory will be listed with the ODP description of the site, not just the random Google "snippet." So, from a Webmaster's point of view, having an ODP listing migrate into the Google directory via the RDF is useful.
Data users are a "first priority" for the DMOZ model, but have no voice...other than emailing Autumn Looijen or Rich Skrenta, and getting an "as soon as possible" answer.
The feelings after months of no feedback, notices, or solutions are something like the postal workers' "Going Postal"; perhaps it is "Going Portal!"
I, for one, have moved on to other search solutions and less dependence on DMOZ (and no attribution). But, it took me two months of very nervous waiting to make the move.
I'm sure many hundreds of smaller Data Users have moved to Mirroring and Search & Replace software, to replace the Dump (for static pages), and are creating search databases of the actual sites (without attribution).
That's my "Rant".
Perhaps the editor community does not understand the frustration of the Data User community.
I'd say that while that's probably true overall, it's irrelevant. There is nothing the editor community can do either way.
For the most part, editors volunteer their time to edit. Period. Most of them volunteered with no idea that there was an RDF and many still don't know what that is.
Your frustration is quite likely with the staff who actually work the backend of Dmoz, and there are only one or two of them who do that.
Or perhaps it's with AOL / Time Warner, who don't fund the project to the extent where it has the resources to do the basics like the RDF.
But it's most certainly not the editor community.
EVERY search engine that crawls does this!
It is ONLY the RDF Dump users that are required to give attribution. I would love to see DMOZ close its server to crawling!
It is ONLY the RDF Dump users that are required to give attribution.
That's not entirely true. Like I said above, if you present your data in a directory format, then attribution is required. For example, there are programs that "scrape" the ODP and literally allow a webmaster to have a mirror copy of the directory. Even though they're not using the RDF dump, attribution is still required.
When did we last see an apology from Google for their dance being late, or messed up? Or even a schedule in advance so we can plan a cosy night in dancing?
How many other major search engines (Teoma, Altavista etc) have indexes that are more recent than ODP's latest RDF? What dates will these guys next rebuild?
Is there any way with any major search engine that I can check in advance that I will be in their next index build? (Because I can check with ODP that I'm in the directory despite no RDF dump).
Okay, so 45,000 volunteer editors haven't hammered out Hamlet yet, but they are doing a good job -- better in some dimensions than the paid "competition".
Fact is, they are doing such a good job, that the paid competition uses them for base data.
They may not be as fast as some people who are trying to make money from the Internet want -- but that's not why they are volunteering their time.
We have an editor, dlugan, who developed a tool which he named "GooEdit" which is available for any editor to use. Google gave permission for their search results to be modified in order to cross-check against ODP listings. This has allowed increased efficiency in finding and removing junky, for sale, hijacked, 404 and "coming soon" sorts of sites in the ODP database. The tool wouldn't exist without Google's explicit permission to use their search results.
If I had paid for a search engine or directory listing, my expectation level might be one of updated information on a regular basis. For free listings maintained largely by a volunteer group, the expectation shouldn't be the same. You're not paying for services supplied, just benefiting from the opportunity of a free listing.