As someone who has tried to grow a classification, I can say it is a nontrivial task, and almost impossible (sorry, IS impossible) for one person to maintain.
I would love to hear what others have been doing in this area.
Thanks for listening.
Dajuroka is surely right about the difficulty of building and maintaining such a beast. DDS and LOC both put out regular updates, and the ODP is always growing.
All of these schemes have on the order of 300,000-500,000 categories, and regular "subcategory structures" that can be plugged in as necessary for newly overgrown categories.
To a certain extent then, most of the "top-level" headings of ODP are not actually subjects so much as notional impressions of subjects. Shopping is Shopping and Regional is Regional but there is considerable fluidity between, say, Computers and Science, or Home and Society. It is almost an exercise in fancy-- if I gave you a word and asked you to group it with one of twelve others, which would it be? The "real" subjects are in the layers below, narrower and more concrete topics which people understand immediately-- religion, movies, drug companies, Canadian football.
This model is extremely flexible, which matters because the world does not stand still. Some libraries still group personal computing with UFOs and psychics because the Dewey Decimal System grouped them with novelties, anomalies, and miscellaneous. :-)
At the same time, it has some, uh, glaring inadequacies. Perceptions of what a thing is (ODP ostentatiously terms this "ontology") vary from person to person, from culture to culture, and from language to language. It can be vague ("Home"? "Society"? And where does "Health" end and "Science" begin?). Lastly, some of the placements are forced-- ODP editors will be the very first to pronounce their astonishment and distaste that Education was fixed as a subcategory of Reference in the directory's infancy.
Do we start to develop an open source classification based on the ODP or is that even now too fixed in its origins to cope with a major change? Do we take the basics from all the others and start to grow one?
Is ODP adequate or does the top level need readjusting? I am aware that the Library of Congress is looking at how it categorises the web. The problem is that we are listing not just a document (or book) but commercial sites, blogs, audio, visual.
I like the ODP but I fear that the top level should be larger to make the tree a little more logical.
With Google using it now and thousands of other sites one wonders if it will just live on. Of course if it is too dynamic then information will be lost or impossible to find.
But using "Math" instead of "Mathematics", "Kids and Teen" rather than "Children and Teenagers", and the whole use of the regional context will make it harder to standardise sites. It's a bit like the use of .com in its early days being almost identical to .us. Must be hard for editors who find great regional sites that aren't US based.
And other little things like "Health Aging" at the same level as "Health Senior Health".
"Recreation> Antiques> US Civil War" but no other civil wars have antiques of interest?
I understand how these things happen, and I do not for one minute suggest that the USA does not dominate the URIs on the web, but to have a universal directory, even for English-speaking sites, would seem to me to need some significant shifts.
(Certainly on my very little site I will be trying not to regionalise as the ODP has... God knows how, but I will try.)
Some of it's inevitable. Pop culture worldwide is (for better or for worse--well, if you insist, for worse and for worser) USA-centric: pop music, movies, TV. World/Hindi probably has a suitably India-centric Movies category, and World/French a surprisingly unchauvinistic but still France-centric Literature category.
Some of it's fortunate. A USA-centric Religion category is richer than any other nation's could conceivably be (even if you omit Christianity altogether). American Literature courses still probably spend more time on French authors than the reverse. (But I have a book, an anthology of Middle English literature, that was compiled by a French professor.) "Classical Chinese" music remains culturally limited in a way that "Classical German" music doesn't. Sciences and Engineering aren't culture-specific (although the priorities and emphases are), but the U.S. is large enough to provide a broad knowledge base and a fairly representative sample of research, and English is the closest thing to a world engineering language.
Some of it's probably harmless: a sociological curiosity.
Some of it's fixable, if we can find editors with the complementary knowledge and interests we need. The goal from the beginning was to index the "sum of HUMAN knowledge." It hasn't been achieved yet.
But some of it is going to remain confusing. I have a hard time dealing with British educational sites -- the terms are too different: so are the semesters, probably. The Australian denominations confuse me: ours have to be more confusing to Aussies. Asian musical styles and modes are probably eternally beyond my comprehension. Japanese literature. Italian political parties ... so long as there's something here to confuse everybody, we're probably doing as well as we can.
The ODP-designed taxonomy really doesn't begin until the second level (which has been extensively debated and occasionally modified, and wasn't really frozen till a couple of years ago: we tend to compensate for the oddities of the first level by lots of @links.) I should also mention that the freeze of the ODP's first level means that our licensees can (and some do) create their own different first level layout.
In contrast, it's easy enough to see (and hardly surprising) that MineMSN is laid out like a newspaper's classified ads or a TV station's program categories -- it's definitely designed by marketroids around what kind of content Microsoft hopes to sell to customers. I'd call it "mammon-centered" or "couch-potato-targeted."
[The Dewey Decimal System has its own oddities. IIRC, "Religion and Philosophy" is divided into 8 Christianity categories, 1 Philosophy category, and 1 "all other religions" catchall. That's valid based on what Melvil Dewey was actually seeing in his library at the time, although a Chinese or Indian library might well benefit from a different breakdown.] I edit in "Religion", and I think it interesting that Christianity, Islam, and Scientology are third-level ODP categories -- higher than in any of the other major directories, although not as high as in the DDS, where Religion is a first-level category. Its size would justify top-level at the ODP also, for that matter.
Comparing online directories, the ODP's layout is more like Yahoo's -- another geek-founded project. But the comparison of MammonM$N and the ODP might be more interesting if you looked at the second-level categories. You could probably create a fairly good correlation even between MM$N and the Geekdirs if you picked and chose from second-to-fourth-level categories.
Anyhow, life would certainly be easier if the web could use one system. I am sure 70% of currently classified URLs would not be disputed, i.e. Golf is Golf (unless you are shopping, then maybe it is Shopping Golf.... damn, there it goes again...)
Has anyone created a straight text file of the ODP categories (i.e. .txt or .doc)? The RDF is very complex and a bugger to download when you have a capped download system. It never seems to stop!
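For anyone who does manage to get the RDF dump down once, here is a minimal sketch of how a plain category list might be pulled out of it. It assumes that each category appears in the dump as an element of the form <Topic r:id="Top/Arts/Movies">, which should be checked against the actual file; the function names and the file names in the usage note are made up for illustration. It reads the dump line by line rather than parsing it as XML, so the whole file never has to sit in memory.

```python
import re
from typing import Iterable, Iterator

# ASSUMPTION: a category element in the dump looks like
#   <Topic r:id="Top/Arts/Movies">
# Adjust the pattern if the real file differs.
TOPIC_RE = re.compile(r'<Topic\s+r:id="([^"]*)"')


def category_paths(lines: Iterable[str]) -> Iterator[str]:
    """Yield each non-empty category path once, in first-seen order."""
    seen = set()
    for line in lines:
        for path in TOPIC_RE.findall(line):
            if path and path not in seen:
                seen.add(path)
                yield path


def write_category_list(rdf_path: str, txt_path: str) -> int:
    """Write one category path per line to txt_path; return how many were written."""
    count = 0
    with open(rdf_path, encoding="utf-8", errors="replace") as src, \
            open(txt_path, "w", encoding="utf-8") as dst:
        for path in category_paths(src):
            dst.write(path + "\n")
            count += 1
    return count
```

Something like write_category_list("structure.rdf", "categories.txt") would then leave a straight text file with one path per line. It does nothing about the download cap, of course: the dump still has to be fetched once.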
Are there any other forums where these issues are being debated?