I heard a RUMOR from a DMOZ editor at PubCon Orlando (March) that DMOZ had finished moving to its new servers and that those servers were located beside the Google servers in the DC datacenter.
Very recent DMOZ searches in categories that I am somewhat familiar with reveal many "missing sites." Sites that have been listed for years are gone. The missing sites seem to be primarily affiliate-type sites; the site running the affiliate program is usually listed if it is a brick-and-mortar business or of significant size. This is kinda hazy, sorry. It's difficult to generalize the changes I'm seeing, except to say that a lot of "questionable" sites are gone. The ones that remain are, for the most part, mainstream and/or well established.
All of this makes me suspect that the relationship between Google and DMOZ has strengthened. Google supplies search to AOL; perhaps that relationship has been expanded so that Google "sponsors" DMOZ. Perhaps, in exchange, DMOZ guidelines have been refined or are more strictly enforced... Perhaps DMOZ quality has improved and Google's post-Florida algo relies more heavily on DMOZ as the basis for SERPs.
I admit, it's a lot of speculation. But could DMOZ revisions explain sites that dropped out of Google last fall? Some sites reappeared, perhaps those with enough links and optimization to override/outweigh a DMOZ benefit. We saw sites dropping out and some reappearing again early this year. The result of another DMOZ purge?
I admit, there is a very strong possibility that I've added 2+2 and arrived at a result of 5 :) but take a hard look at categories you are familiar with and see if they look significantly different to you than they did six months or a year ago. Travel-related sites seemed to be the hardest hit by Florida. Are the travel-related DMOZ cats significantly different from 6 months or a year ago?
And don't be afraid to tell me if you think I "have too much time on my hands!" I may be crazy, but I'm also harmless. :)
Other than that, I don't buy the theory that DMOZ had anything to do with Florida. There were lots and lots of sites among those hit by Florida that had (and still have) an ODP listing.
If it is the case, it is pretty funny. Google's mission has always seemed to be as little human intervention as possible. If human review became a huge part of their ranking, it would certainly strike me as odd.
I guess nothing much changes for most people either way. Submit your site to DMOZ. If you get in, great; if not, oh well.
I think this is the case. The only possible way that any Google money could influence the directory itself is if Google paid for some new ODP staffers to clean up cats. All the editors are volunteers and wouldn't care what Google did. I am unaware of the ODP adding any new staffers to do the things you are suggesting. You could always ask the head of the ODP directly. I suspect the answer will be no.
Also, Google has moved the link to the directory off the home page and no longer shows, in the SERPs, the category a site is listed in. That doesn't point to a closer relationship. If anything, Google is moving away from the ODP.
As long as human review is free for them. It's a common strategy to present something as a project for the community that later turns into a business for which the contributors have worked for free.
There was a long discussion on this in the Directories forum over a year ago; in fact, there was more than one. It may be that some categories were reviewed and some listings were found to be unsuitable.
As far as I understand, the ODP listing is concerned with what the site itself is about: the site, not the local business itself (if it is a local business). And if it's primarily what could be considered an affiliate site, it's doubtful that just listing a physical address would get around whatever criteria a site has to meet to qualify for inclusion. That's how I understand it.
If the dead link checker flags a bunch of sites in a category, that can result in big changes to the category as well.
If Google didn't have massive category structures like ODP, Yahoo!, etc. to build a foundation on, could they build a decent engine that wasn't totally warped by "text ads"? What would be the "reality check", or is something like that not even needed to build a decent index?
"...in the Directories forum over a year ago"
Then perhaps the timeline does work. That would be well before Florida (about 9-10 months ago). Historically, Google only grabbed the RDF dump and updated the directory every few months...
Skibum, do you recall any "official effort" to encourage editors to "get tough" about editing guidelines 12-18 months ago? Sorry, I'm assuming you're permitted to discuss internal policy...
So perhaps my original question IS worth exploring... Let's make it:
Has anyone else seen a purge in cats they follow in the last 9-18 months?
I'm not in the travel biz, even peripherally, but I'm particularly interested in travel-related cats, since travel was so hard hit by Florida. (I understand that it MAY simply be a geotargeting issue too.)
Last autumn DMOZ upgraded to new servers, which made the search engine functional again and also made it easier for editors to do their job.
For those interested, you can read about the update here: research.dmoz.org.
Here's the gist of it:
So what's the deal with Google? Well, technically we don't care since they are simply a downstream data user, but because the question comes up so often, we've included an answer for completeness. Google gets its directory by periodically downloading the RDF. It used to be once a month -- the current schedule is unclear. After they download it, they process it and eventually include information in search results. In between downloads of the RDF, Google spiders the live data at the ODP, picking up sites that have been added to include in search results. Because of this, there may be sites showing in a directory listing in the search results that aren't listed in Google's directory. And Google's directory may be months behind the public ODP directory. This is something over which the ODP has no control.
While I understand the frustration of not getting your site reviewed, it is useful to keep in mind that there are far too few editors to handle the many requests for inclusion in the numerous categories.
I happen to edit 3 categories in my travel-related area, but because the subcategories have no editors, the requests quickly pile up and it is impossible to keep up.
It takes a good deal of time simply to review ONE site properly and make sure it is not spam, not an affiliate site and not already listed in another category. Add to this the fact that it is volunteer work (like most of us, editors have regular jobs and can only dedicate spare time to DMOZ), and you can begin to appreciate the difficulties involved.
I prefer to consider it an additional source of information which can be used at the searcher's discretion.
One nice thing about DMOZ is that because it's a human-edited directory, you can pretty well count on finding pertinent sites and very little spam.
The ODP did move to new servers, and I think that was completed long before the end of 2003.
The RDF dump had major errors in it for a while when all the ODP data was being converted to UTF-8. The conversion took quite a while, and it took several more months to iron out the few remaining minor bugs. For the last two months the RDF has been error-free (almost: in just a few dumps during that time, one or two invalid characters crept in), and all the scripts now do UTF-8 error checking on data imports and on data edits.
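For anyone wondering what "UTF-8 error checking" on a dump or an import amounts to, here is a minimal sketch. It assumes the data arrives as raw bytes, one line per record. The function names are hypothetical, and this is not the ODP's actual script. It reports where each invalid byte sequence starts, so a single stray character can be located in a huge file.

```python
def find_invalid_utf8(data: bytes) -> list[int]:
    """Return the byte offsets at which invalid UTF-8 sequences start."""
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")  # strict decoding raises on bad bytes
            break  # everything from pos onward is valid
        except UnicodeDecodeError as err:
            # err.start/err.end are relative to the slice we decoded
            offsets.append(pos + err.start)
            pos += err.end  # skip past the bad byte(s) and keep scanning
    return offsets


def check_dump_lines(lines: list[bytes]) -> list[tuple[int, list[int]]]:
    """Hypothetical import check: (line number, bad offsets) for each bad line."""
    report = []
    for lineno, line in enumerate(lines, start=1):
        bad = find_invalid_utf8(line)
        if bad:
            report.append((lineno, bad))
    return report
```

For example, `check_dump_lines([b"<Topic>", b"caf\xe9"])` flags line 2 at byte offset 3, where a Latin-1 "é" was left unconverted.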
It is possible that the Google search results were skewed because they could not spider the ODP when the servers were in trouble. Additionally for several months or more they may not have been able to use all of the data in the RDF dump during the times that it had many encoding errors (late 2003 and early 2004).
Just before and then during the server updates, the editor interface worked so slowly that many people didn't edit much at all for several months. After the new servers went live, and while the "suggest a site" facility was still switched off and the public side was still not receiving updates from the editor side, the rate of editing was the highest the directory had ever seen.
The biggest change you'll see in the listings is when Robozilla stomps through. In the old days, suspect listings were marked in red for the attention of an editor but left on public view. Nowadays they are automatically moved over to the unreviewed side and still highlighted in red for re-review. Robozilla ran through the whole site just a week or two back, and many thousands of URLs have been flagged to be looked at again. They are all out of public view. Maybe that is what caused the majority of the observed changes?
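To illustrate the difference between the old and new Robozilla behaviour, here is a rough sketch. The `Listing` structure, its status values and the `is_alive` check are all hypothetical. Robozilla's real internals aren't described here, and the link check is passed in as a function so that the example doesn't touch the network. The point it shows is that a failing listing is no longer just marked red while staying public. It is flagged and also taken out of public view until an editor re-reviews it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Listing:
    url: str
    status: str = "public"  # hypothetical states: "public" or "unreviewed"
    flagged: bool = False   # shown in red to editors for re-review


def link_check_pass(listings: list[Listing], is_alive: Callable[[str], bool]) -> int:
    """Flag dead public listings, move them to unreviewed, and return the count."""
    moved = 0
    for listing in listings:
        if listing.status == "public" and not is_alive(listing.url):
            listing.flagged = True          # old behaviour stopped here
            listing.status = "unreviewed"   # new behaviour: pulled from public view
            moved += 1
    return moved
```

Usage: with `dead = {"http://example.com/gone"}`, calling `link_check_pass(listings, lambda url: url not in dead)` moves only that one listing. Every other listing stays public and unflagged, which is why a single pass over a category can make it look quite different overnight.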
The Google directory is currently just a few months old. Google updated their copy about 10 times in 2002, and only twice in 2003. In 2004 I think they have done 3 or 4 updates so far. When their techs realise that the whole 2.2GB of RDF dump is now 100% UTF-8, and 100% error-free, maybe they will come back a bit more regularly and grab an update.
However, Google isn't a major concern. Personally, I would like to see all those hundreds of ODP-clone sites that are currently using ODP data 2, 3 or 4 years old go back and get a more recent update too.