And that's not the only thing that's broken at DMOZ. They're still stuck in the 1990s. They don't have a basic automated system to notify webmasters when their sites have been accepted or rejected. Which means that thousands (millions?) of webmasters are using that search function every day to see if their sites are listed, finding that the search doesn't work, and then drilling down to the category to see if their sites are there. What a waste of resources/bandwidth! But DMOZ won't "fix" that either. Why? Because they don't have the resources. If they've ceased caring for their users, don't expect them to bother about webmasters wanting listings.
And don't expect the search to be fixed anytime this week.
Google has its API, and people can develop their own Google search applications. I wonder if we can integrate the Google API (searching the web directory instead of web pages) into our DMOZ sites. Then, when users browse, they get DMOZ lists, and when they search, they get the Google Web Directory.
The Google Web Directory lags a few weeks (months?) behind DMOZ. But basically, the two are identical.
Over the last 6 months all of the category names and paths, and all of the site URLs, titles, and descriptions have been converted. Many were in ISO-8859-1, but not all of them converted cleanly (there were at least half a dozen different types of hyphen in use in the data, for example). Many other categories were in all sorts of other encodings, and a small number had no default encoding defined at all.
At the same time as converting all of that over to UTF-8, the character checking routines were improved so that no invalid data can get into the database from now on.
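For anyone curious what that kind of cleanup looks like in practice, here is a rough Python sketch. It is not the ODP's actual code, which isn't described in this thread. The function names, the list of hyphen variants, and the ISO-8859-1 fallback are all assumptions made for illustration.

```python
import unicodedata

# Hypothetical set of hyphen-like characters to fold into a plain ASCII "-".
# The real data reportedly had at least half a dozen variants; this list is a guess.
HYPHEN_VARIANTS = {
    "\u2010",  # HYPHEN
    "\u2011",  # NON-BREAKING HYPHEN
    "\u2012",  # FIGURE DASH
    "\u2013",  # EN DASH
    "\u2212",  # MINUS SIGN
}


def to_utf8_text(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode bytes as UTF-8, falling back to an assumed legacy encoding."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this never raises,
        # but the result may still be wrong if the data was in another encoding.
        text = raw.decode(fallback)
    return "".join("-" if ch in HYPHEN_VARIANTS else ch for ch in text)


def is_valid_entry(text: str) -> bool:
    """Reject control, surrogate, private-use, and unassigned characters."""
    for ch in text:
        if ch in "\t\n":
            continue
        if unicodedata.category(ch) in ("Cc", "Cs", "Co", "Cn"):
            return False
    return True
```

The fallback decode can leave C1 control characters in the text when the source was really in some other encoding, and the validity check then flags those entries for hand-editing instead of letting them through.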
Over half a million "errors" had to be fixed up either by running scripts, or by hand-editing the entries. The last 4 RDF files were down to just one or two errors in each one.
Once there are no errors in the RDF I expect that search will improve, but other work still needs to be done on it, and I don't believe that is one of the highest priority jobs for the staff programmers.
Use Google's search function for the moment. The ODP RDF has been produced weekly for well over a year now, so downstream users now always have a recent RDF to update their site from.
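If you only want to know whether your site is listed, you could also check a downloaded copy of the RDF yourself instead of relying on search. A minimal sketch, assuming each listing appears in the dump as an `<ExternalPage about="URL">` element on a single line; the function names are made up for this example, and XML-escaped characters in URLs (such as `&amp;`) are not handled.

```python
import re
from typing import Iterable, Set

# Assumed layout: each listing starts with <ExternalPage about="URL">.
ABOUT_RE = re.compile(r'<ExternalPage\s+about="([^"]*)"')


def listed_urls(lines: Iterable[str]) -> Set[str]:
    """Collect every URL found in an about="..." attribute."""
    found = set()
    for line in lines:
        match = ABOUT_RE.search(line)
        if match:
            found.add(match.group(1))
    return found


def is_listed(url: str, lines: Iterable[str]) -> bool:
    """Compare URLs exactly, ignoring only a trailing slash."""
    wanted = url.rstrip("/")
    return any(u.rstrip("/") == wanted for u in listed_urls(lines))


# Tiny in-memory stand-in for a real dump file.
sample = [
    '<ExternalPage about="http://www.example.com/">',
    "  <d:Title>Example</d:Title>",
    "</ExternalPage>",
]
found = is_listed("http://www.example.com", sample)
```

With a real dump you would pass an open file object instead of `sample`, so the file is read line by line and never loaded whole into memory.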