The public ODP servers used to run through a squid cache system and that meant some basic domain canonicalisation could not be performed.
The directory was available at dmoz.com and dmoz.org and www.dmoz.com and www.dmoz.org as well as various mirrors.
A few years ago, Google listed a variety of pages from each one, but mostly favoured dmoz.org in listings.
The old Google "302 redirect hijacking problem" was most obviously seen when a site:dmoz.com search returned tens of millions of listings - none of which were from the ODP site. That problem listing was fixed within days (maybe by hand at Google?), but the underlying Google fault was not corrected for a very long time. After that fix, a search for site:dmoz.com returned zero results.
Nowadays, it (site:dmoz.com) shows a few tens of thousands of pages all of which are from the correct site.
As part of various upgrades to ODP hardware in recent weeks, a set of redirects has been set up so that all directory pages are now served as www.dmoz.org URLs. This domain canonicalisation was put in place a few weeks ago.
This will mean that Google will have to reindex the whole directory and recalculate the PR score for the whole lot. It will also have an effect on the backlinks list: both for links to the directory, and from the directory to listed sites.
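To illustrate the kind of domain canonicalisation described here, a minimal Python sketch follows. The hostnames come from this post; the function name `canonical_redirect` and the mapping logic are assumptions for illustration only, not the actual ODP server configuration, which is not described in this thread.

```python
from urllib.parse import urlsplit, urlunsplit

# Hostnames named in the post. How the ODP servers really map them
# is not documented here, so this set is an assumption.
CANONICAL_HOST = "www.dmoz.org"
ALIAS_HOSTS = {"dmoz.com", "www.dmoz.com", "dmoz.org"}


def canonical_redirect(url):
    """Return (301, target_url) if url is on an alias host, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in ALIAS_HOSTS:
        # Already canonical, or a host this sketch does not know about.
        return None
    target = urlunsplit((
        parts.scheme or "http",
        CANONICAL_HOST,
        parts.path or "/",   # an empty path becomes the site root
        parts.query,
        parts.fragment,
    ))
    return 301, target


print(canonical_redirect("http://dmoz.org/Computers/"))
print(canonical_redirect("http://www.dmoz.org/Computers/"))
```

The point of using a permanent (301) redirect rather than serving the same content on every hostname is that each directory page ends up with exactly one URL for search engines to index.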
The changes have been ongoing for at least a week, and it will likely take Google a few months to factor everything in.
Additionally, and as already noted in an earlier thread, the Google Directory is also being updated at the same time. That work started 2007-08-18, but only a few Google datacentres have the new version as yet.
I have no idea what happens when Google has to recalculate the effect of 5 million links from one site that has now moved domain - although, for many pages of the directory, Google already "merged the PR" for www and non-www some years ago.
It is likely that there will be little to no effect from this, but it is interesting nonetheless. For one thing it will answer the question "how long does it take Google to reindex a site of half a million pages?"
site:dmoz.com = 45 000
site:www.dmoz.com = 31 000
site:dmoz.org = 750 000 (but includes many other subdomains too)
site:www.dmoz.org = already 180 000
I always thought dmoz and yahoo functioned as some sort of starting point for PR calculation. Dunno why I thought so;) Most sites I know hardly ever linked back to dmoz but nevertheless dmoz had a PR of 8 or 9 all the time, didn't it? If this PR-value isn't derived from backlinks (and I really believe that some spammy sites clearly outweigh dmoz in number and perhaps even quality of backlinks) it must have a special status of its own in the google algo. So I assume recalculation of this lot is something different from a general PR update.
I'm glad you dedicated a specific thread to this topic, because I really think these changes are very important. Hope it doesn't get drowned out or mixed up with the regular whining about every month's SERP changes.
A friend of mine reminded me that dmoz had been completely down for some time due to technical reasons, and he said some data got completely lost. Is this true?
As I said in [webmasterworld.com...]
due to some restructuring of some dmoz categories my backlink from there has gone from PR5 to PR1. According to the normal n-1 PR-inheritance rule of thumb (which really worked perfectly for ALL dmoz categories I ever checked) I expect this value to recover after the next update, but you never know.
...dmoz had a PR of 8 or 9 all the time, didn't it? If this PR-value isn't derived from backlinks...
In Yahoo SiteExplorer I see nearly 4 million backlinks for dmoz.org, with 2.3 million to the homepage. That sounds like enough to me. There was even a period of time where linking to your category page in dmoz was a very helpful action in some situations, especially if you "were" the category.
I agree that DMOZ was/is used as a kind of seed for certain functions. These changes will be worth keeping an eye on - thanks g1smd.
So, you haven't just got your category structure reorganisation to consider, but also a shift to www for those categories. Add in the data for the ongoing Google Directory Update and there are many effects that could happen.
>> dmoz had been completely down for some time due to technical reasons <<
The editor side was completely down for several months at the end of 2006. The public site was still accessible and didn't go down for more than a few hours on a couple of occasions. Google carried on spidering and indexing that every day.
When the editor side came back up, one priority was to clear listings that had gone dead in the intervening months. The other was to get on and add more sites. Only then was the RDF generation process restarted to build the regular data dump files for downstream users (such as Google) to pick up and use.
It is likely that there will be little to no effect from this
Perhaps "little to no long term effect" would be better, but he even had that covered:
it will likely take Google a few months to factor everything in
Meanwhile, some of us are having a thrilling ride on the roller-coaster.
I think that it will take quite a while for their system to notice that the non-www version now issues a 301 redirect instead of serving content, and make the appropriate adjustments to the list of pages that they show in the SERPs. I suspect that it could take at least several months for things to be worked out.
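As a rough illustration of the adjustment being discussed, here is a small Python sketch of how an indexer might decide which URL to keep after fetching a page. This is purely hypothetical: the function `url_to_index` and its rules are assumptions made for this example and say nothing about how Google's systems actually work.

```python
def url_to_index(requested_url, status, location=None):
    """Pick the URL a hypothetical indexer would list for one fetch.

    status   -- HTTP status code returned for requested_url
    location -- value of the Location header, if any
    """
    if status == 301 and location:
        # Permanent move: list the target, drop the old URL.
        return ("moved_permanently", location)
    if status in (302, 307) and location:
        # Temporary redirect: keep listing the URL that was requested.
        return ("temporary_redirect", requested_url)
    if status == 200:
        # Content served directly at this URL.
        return ("content", requested_url)
    return ("other", requested_url)


# The non-www directory URLs used to fall in the "content" case and
# now fall in the "moved_permanently" case.
print(url_to_index("http://dmoz.org/Arts/", 200))
print(url_to_index("http://dmoz.org/Arts/", 301, "http://www.dmoz.org/Arts/"))
```

Under these assumed rules, every non-www URL has to be refetched at least once before its listing can be swapped for the www version, which is one reason the changeover would not be instant.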
Most obviously, the dmoz index page doesn't rank for a [dmoz] search because it is not indexed in any form.
While sites in dmoz should expect a long-term benefit from canonicalization, the deindexing of dmoz pages is certainly hurting them now.
You think the fact that the index page isn't #1, or anywhere, for a search for [dmoz] is normal?
Obviously it is neither a simple canonicalisation nor a duplicate content issue.
There is hysterical gloating on another forum about it, but more important are the major repercussions, at least in the short term, of thousands of domains losing their best link.
However sites can also be moved from one category to another. If there is a delay in re-listing in the second category, a site could be unlisted for a while.
I'll also guess that it won't take Google's indexing system much more than a month to realise what is going on and fix the problem.
The counts for www.dmoz.org have increased by several hundred thousand in the last few weeks in most blocks. There are still some very large differences in the counts, depending on where you look.