Google & The ODP

I think this is getting out of control

         

Chico_Loco

2:47 pm on Dec 21, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I know this has been discussed before but I think it deserves some more time.

If Google is trying so hard to keep its database as "fresh" as can possibly be, why would they still use DMOZ as the source for their directory? It has been well over 3 months since the ODP has updated ANYTHING; it's absolutely absurd. As though DMOZ wasn't already known for being a useless resource, this just makes it worse for them.

I can understand that they had problems when upgrading their servers etc. That's fine, but holy god, 3 months? I personally could manage to completely redo anything that needs redoing in 3 months, and I'm just one guy, and probably not as good at IT as the guys they have. What on earth are they doing over there? Could they be on strike?

If the new hardware doesn't work, they should probably get a refund before that 12 month warranty expires, as it's fast approaching.

Am I wrong?

EliteWeb

5:49 pm on Dec 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Why would they be? The new RDF hasn't been pushed because of bugs. Google still searches fine, their data is free, it's downloadable and so forth. Err, correction: 3 days till xmas :D

rfgdxm1

5:52 pm on Dec 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>No, it's much ado about an outdated directory.

And, do you have a substitute directory that isn't out of date, and is as good as the ODP, that Google can replace the ODP with as its directory?

rfgdxm1

6:05 pm on Dec 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>Yeah but go scan the Google Directory, I'm finding quite a few 404's

Sites do go offline all the time. The ODP does have a bot that checks for 404s. However, I believe they only do that about once a month. It also requires that a human editor actually check out the site to make sure it really is gone before it is deleted. The ODP definitely has a problem with an editor shortage. Personally, I'd consider the fact that the ODP has over a million sites in the unreviewed queue a greater problem. I am an ODP editor, and recently was approved for new cat space with over 4,000 listed sites, and over 500 in the unreviewed queue. Many of those unrevieweds are 6 months old, and I actually just found one green that had been in the queue since the year 2000.

It is this backlog of greens that makes the ODP more out of date than the fact there hasn't been an RDF dump in 3 months. Also, please note that anyone who wants the most current listings can check dmoz.org, which *does* show the sites added in the last 3 months. And Google provides a clickable link to dmoz.org on every page of the Google directory. While the fact there hasn't been an RDF dump in 3 months is undesirable, I don't consider it that big of a disaster.
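
The two-step check described above (a bot flags 404s, then a human editor confirms before anything is deleted) can be sketched roughly as follows. This is only an illustration, not the ODP's actual tooling: the function names are hypothetical, and the status lookup is passed in as a plain callable so the sketch runs without touching the network.

```python
def flag_dead_links(urls, get_status):
    """Bot pass: return the URLs whose HTTP status is 404.

    get_status is a hypothetical callable mapping a URL to a status code.
    """
    return [url for url in urls if get_status(url) == 404]


def confirm_removals(flagged, editor_confirms):
    """Human pass: keep only the flagged URLs an editor confirms are gone."""
    return [url for url in flagged if editor_confirms(url)]


# Made-up listings and statuses, purely for illustration.
fake_statuses = {
    "http://example.com/": 200,
    "http://example.org/gone": 404,
    "http://example.net/flaky": 404,
}

flagged = flag_dead_links(fake_statuses, fake_statuses.get)
# Stand-in for the editor's judgment: only one site is really gone.
removed = confirm_removals(flagged, lambda url: url.endswith("/gone"))
```

Keeping the human step separate means a site that is only down temporarily gets flagged but is not deleted.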

Dumpy

6:13 pm on Dec 22, 2002 (gmt 0)

10+ Year Member



"While the fact there hasn't been an RDF dump in 3 months is undesirable, I don't consider it that big of a disaster."

Here is what Netscape says:

6. Our Priorities are Our Data Users and the Community

We will be guided by the needs of our data users and the ODP editorial community. We will place their interests first in our priorities.

I am a data user...I'm pissed!

kctipton

7:24 pm on Dec 22, 2002 (gmt 0)

10+ Year Member



Dumpy, you're breaking my heart. staff@dmoz.org is where you should send your rants.

Dumpy

7:29 pm on Dec 22, 2002 (gmt 0)

10+ Year Member



I appreciate your response...May I quote you?

NFFC

7:33 pm on Dec 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Polite moderator note:

Chill out guys, please.

rfgdxm1

7:37 pm on Dec 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yeah, moaning here about the delay in getting the RDF out won't make it happen any quicker.

rfgdxm1

7:40 pm on Dec 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Some possible good news. I just spotted in an internal ODP forum that they in fact *have* finished an RDF dump. However, it still needs to be checked for errors.

Slud

6:29 pm on Dec 23, 2002 (gmt 0)

10+ Year Member



A little note about the problems with the RDF dumps:

In the database, each and every category has a unique number (called a 'catid') assigned to it. However, due to a bug in the catmv system, some categories ended up with the same catid as others, and so the generation failed. This has been happening every week since the end of September, which is why the search database has been producing outdated results.

Anyway, this has apparently been fixed (both the catmv bug and the duplicate catid bug), and a valid RDF dump should have arrived by the time you read this.

(From http://dmoz.org/newsletter/2002Christmas/news.html [dmoz.org])

[edited by: Laisha at 9:03 pm (utc) on Dec. 23, 2002]
[edit reason] Fixed URL [/edit]
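
For anyone curious what the duplicate catid problem looks like in practice, here is a minimal sketch of the kind of check that would catch it before a dump is generated. It assumes the categories are available as (path, catid) pairs; that layout, the function name, and the sample data are made up for illustration and are not taken from the ODP's real code.

```python
from collections import defaultdict


def find_duplicate_catids(categories):
    """Return {catid: [paths]} for every catid shared by more than one category.

    categories is an iterable of (path, catid) pairs (an assumed layout).
    """
    paths_by_id = defaultdict(list)
    for path, catid in categories:
        paths_by_id[catid].append(path)
    return {catid: paths for catid, paths in paths_by_id.items() if len(paths) > 1}


# Hypothetical sample: two categories ended up sharing catid 2.
sample = [
    ("Top/Arts", 1),
    ("Top/Business", 2),
    ("Top/Computers", 2),
    ("Top/Games", 4),
]

duplicates = find_duplicate_catids(sample)
```

An empty result would mean every catid is unique, which is the condition the newsletter says the dump generation depends on.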

fathom

9:11 pm on Dec 23, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



EXCELLENT!;)- The new guidelines are more reflective of overall practices rather than just the norms.

Good work DMOZ!

rafalk

9:28 pm on Dec 23, 2002 (gmt 0)

10+ Year Member



We've had these new guidelines for a couple of months now. Glad you like em. :)

europeforvisitors

11:12 pm on Dec 23, 2002 (gmt 0)



The Contractor wrote:

And that affects who? The webmasters that check the little green bars next to the sites in their category.

It also affects how sites are listed on Google's SERPs. A site that's in the Google Directory will be listed with the ODP description of the site, not just the random Google "snippet." So, from a Webmaster's point of view, having an ODP migrate into the Google directory via the RDF is useful.

Dumpy

11:20 am on Dec 24, 2002 (gmt 0)

10+ Year Member



I awakened to a cold black early morning to parse the new RDF dump....NOTHING!

Does anyone know anything about its status?

I read the newsletter!

rafalk

12:36 pm on Dec 24, 2002 (gmt 0)

10+ Year Member



You know just as much as everybody else, Dumpy. Keep checking dmoz.org/rdf . . .

Dumpy

1:53 pm on Dec 24, 2002 (gmt 0)

10+ Year Member



Perhaps the editor community does not understand the frustration of the Data User community. Hundreds of Data Users (RDF Dump users) have built their business models on parsing all or parts of the dump. Changes and improvements are usually implemented at each parsing (parsing a complete dump is usually about 48 hours).

Data users are a "first priority" for the DMOZ model, but have no voice...other than emailing Autumn Looijen or Rich Skrenta, and getting an "as soon as possible" answer.

The feelings after months of no feedback or notices or solutions are something like the Postal Workers "Going Postal", perhaps it is "Going Portal!"

I, for one, have moved on to other search solutions and less dependence on DMOZ (and no attribution). But, it took me two months of very nervous waiting to make the move.

I'm sure many hundreds of smaller Data Users have moved to Mirroring and Search & Replace software, to replace the Dump (for static pages), and are creating search databases of the actual sites (without attribution).

That's my "Rant".

The Contractor

2:00 pm on Dec 24, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Dumpy,

Not sure what you want as you have been given all the information that is available to everyone. The last heard is that the RDF was successfully created and is now being error checked...... that's it.

Laisha

2:25 pm on Dec 24, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Perhaps the editor community does not understand the frustration of the Data User community

I'd say that while that's probably true overall, it's irrelevant. There is nothing the editor community can do either way.

For the most part, editors volunteer their time to edit. Period. Most of them volunteered with no idea that there was an RDF and many still don't know what that is.

Your frustration is quite likely with the staff who actually work the backend of Dmoz, and there are only one or two of them who do that.

Or perhaps it's with AOL / Time Warner, who don't fund the project to the extent where it has the resources to do the basics like the RDF.

But it's most certainly not the editor community.

rafalk

2:25 pm on Dec 24, 2002 (gmt 0)

10+ Year Member



and less dependence on DMOZ (and no attribution).

I seriously hope you don't mean that you use ODP data without the attribution required by the Free Use License.

Dumpy

3:11 pm on Dec 24, 2002 (gmt 0)

10+ Year Member



One can crawl DMOZ, and create a database of all the listed sites, excluding DMOZ and other extraneous urls. The resultant database uses the work of DMOZ but does not require attribution.

EVERY search engine that crawls does this!

It is ONLY the RDF Dump users that are required to give attribution. I would love to see DMOZ close its server to crawling!

rafalk

3:41 pm on Dec 24, 2002 (gmt 0)

10+ Year Member



If the database is used to serve up SE type results, a la AOL or Google, then attribution is not required. However, crawling the ODP and then presenting the sites in a directory manner requires attribution.

It is ONLY the RDF Dump users that are required to give attribution.

That's not entirely true. Like I said above if you present your data in a directory format, then attribution is required. For example there are programs that "scrape" the ODP and literally allow a webmaster to have a mirror-copy of the directory. Even though they're not using the RDF dump, the attribution is required.

victor

9:57 am on Dec 25, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The RDF is way behind and that's causing some people some grief. ODP have been fairly open about the problems and we know they are working on it.

When did we last see an apology from Google for their dance being late, or messed up? Or even a schedule in advance so we can plan a cosy night in dancing?

How many other major search engines (Teoma, Altavista etc) have indexes that are more recent than ODP's latest RDF? What dates will these guys next rebuild?

Is there any way with any major search engine that I can check in advance that I will be in their next index build? (Because I can check with ODP that I'm in the directory despite no RDF dump).

Okay, so 45,000 volunteer editors haven't hammered out Hamlet yet, but they are doing a good job -- better in some dimensions than the paid "competition".

Fact is, they are doing such a good job, that the paid competition uses them for base data.

They may not be as fast as some people who are trying to make money from the Internet want -- but that's not why they are volunteering their time.

kctipton

6:25 pm on Dec 26, 2002 (gmt 0)

10+ Year Member



I haven't seen this mentioned anywhere, but I want to point out something new Google has allowed ODP editors to do.

We have an editor, dlugan, who developed a tool which he named "GooEdit" which is available for any editor to use. Google gave permission for their search results to be modified in order to cross-check against ODP listings. This has allowed increased efficiency in finding and removing junky, for sale, hijacked, 404 and "coming soon" sorts of sites in the ODP database. The tool wouldn't exist without Google's explicit permission to use their search results.

WebRookie

7:16 pm on Dec 26, 2002 (gmt 0)

10+ Year Member



Nicely said, victor.

If I had paid for a search engine or directory listing, my expectation level might be one of updated information on a regular basis. For free listings maintained largely by a volunteer group, the expectation shouldn't be the same. You're not paying for services supplied, just benefiting from the opportunity of a free listing.

rfgdxm1

7:46 pm on Dec 26, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In addition to what WebRookie wrote, there *is* an updated directory the public can use at dmoz.org. It is only the downstream users of the database that have outdated copies. Also, the ODP editors are more interested in the long haul. Eventually an RDF dump will be done, and the edits in the last 3 months will be available to all. Plus, for people using ODP mirrors like the Google directory, for the most part this isn't a disaster. Three month out of date ODP data is still quite useful. Sure, e-commerce sites added in the last 3 months are displeased that they can only be found by people using dmoz.org. However, the ODP doesn't exist just to please new e-commerce sites. The reality of the ODP is it is a slowly developing work. Personally, I'd think the fact there are unreviewed sites that have been in the queue for well over a year is a greater concern than this one time delay in the RDF dump.