|Why is Google Directory different than DMOZ?|
| 5:50 pm on Feb 7, 2005 (gmt 0)|
I'm noticing a difference in the number of sites reported in the Google Directory (GD) versus the ODP. I never noticed or looked before, so I'm unsure if this is one of those things everyone knows... and I'm just stumbling on it.
For example, [directory.google.com...] (71 sites showing)
The handful of other cats I checked with a Directory sub cat were all different with Google showing more sites. Is Google pulling sites from somewhere else or are they behind in updating?
| 5:51 pm on Feb 7, 2005 (gmt 0)|
Ya, Google doesn't show sites in the directory that fall below a threshold PR value...
| 6:34 pm on Feb 7, 2005 (gmt 0)|
There is another issue.
DMOZ is in constant flux, with re-organisation of categories and sites being added/deleted/moved. These changes appear in the DMOZ index after a few days, but Google uses the directory dumps and does not have access to this 'live' database.
When new editors join DMOZ they often find a lot of sites to review and start by looking at the current sites, suggesting moves for the ones that should be elsewhere (or deleting them completely). Moved sites then go back into a list of unreviewed sites that possibly doesn't have an editor, so many sites can disappear for a while or indefinitely.
Google tend to update their directory when it pleases them (a bit like the SERPs) so it's unlikely that results will be consistent.
An effect of the re-organisation (primarily in Business) is that many categories go down to 0PR for up to an entire PR update cycle when they move. This is especially the case if an update to Google directory occurs shortly after a PR update.
I would not underestimate the effect that this reorganisation has. It can spread through other sites using the data too, as their pages will be 'new'. Given that Google has decided to play around with link weights, it will be interesting to see how a large update in, say, 'shopping' will affect SERPs. It could be a disaster, as the junk sites that don't make it into DMOZ (some do) will benefit from any temporary loss of link power for the established sites that do well from DMOZ.
| 6:41 pm on Feb 7, 2005 (gmt 0)|
No, it's simple. "Library and Information Science (5)" shows five sites, which DMOZ doesn't count because they are actually in a different category tree [directory.google.com...]
Google however counts them as if they were a subcategory.
The discrepancy means nothing.
| 7:46 pm on Feb 7, 2005 (gmt 0)|
Both Google and DMOZ list the library blogs separately; they are not counted as the extra blogs I'm seeing in Google. And those extra sites listed in the Google Dir (and not DMOZ) are all PageRank 5 and better.
A quick comparison shows me four more sites in the Google Directory than DMOZ.
Backwash, Blog Explosion, Breaking Windows, The Octopus Files.
You can see them here:
I noticed in the past that Google has a different breadcrumb trail; it seems they consolidate some cats. If you take a look at the blog directories listed under the by regio cat on both sites, you'll see what I mean.
But I don't see the four blog directories listed above at all. I'm asking this because from the three main categories I checked today (Computer, Recreation and Society) all three showed a difference in the Directories listed for certain topics - each showed more listings in Google than DMOZ, and yes, I checked ancillary cats in DMOZ to see if they were included there. They weren't.
I'm wondering if anyone has noticed G adding to their Google Directory from other sources, particularly in the Directory area. Or is there a really easy explanation here I am missing!?
| 8:17 pm on Feb 7, 2005 (gmt 0)|
"Or is there a really easy explanation here I am missing!?"
In this case look at the DMOZ update dates at the bottom of the page. They were updated the 6th, yesterday. They were changed a lot.
The cache still shows the old page, which didn't have the By Regio subcategory:
| 12:18 am on Feb 8, 2005 (gmt 0)|
DMOZ updates their directory practically continuously. Editors add new sites, remove bad links, and reorganize categories. The changes show up in the public listings at least weekly, as far as I know.
Periodically, Google takes an "RDF dump" of the DMOZ data and uses it to build their directory. For the past couple of years they have done this perhaps twice a year, so most of the time, Google's directory is at least somewhat out of date with respect to DMOZ. As far as I know, the only other significant difference between Google's directory and the DMOZ directory is that Google sorts listings by PageRank within a category, whereas DMOZ sorts them alphabetically.
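To make the two differences above concrete, here is a rough Python sketch. It assumes a category is just a mapping from site name to a PageRank value. The site names, the PR numbers, and the helper names (`only_in_snapshot`, `dmoz_order`, `google_order`) are all made up for illustration and are not taken from either directory.

```python
# Hypothetical category data: site name -> PageRank. All values are invented.
live_dmoz = {"Alpha Blogs": 4, "Beta Index": 6, "Gamma List": 5}

# An older dump of the same category, taken before an editor moved "Delta Dir".
old_dump = {"Alpha Blogs": 4, "Beta Index": 6, "Delta Dir": 7, "Gamma List": 5}


def only_in_snapshot(snapshot, live):
    """Sites still shown by a stale copy but no longer in the live category."""
    return sorted(set(snapshot) - set(live))


def dmoz_order(sites):
    """Alphabetical listing (case-insensitive is an assumption here)."""
    return sorted(sites, key=str.lower)


def google_order(sites):
    """Highest PageRank first, ties broken alphabetically (tie rule assumed)."""
    return sorted(sites, key=lambda name: (-sites[name], name.lower()))


print(only_in_snapshot(old_dump, live_dmoz))
print(dmoz_order(old_dump))
print(google_order(old_dump))
```

With data like this, the stale copy shows one "extra" site, and the same category reads in a different order on each side, which is roughly the kind of mismatch being described in this thread.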
| 3:19 am on Feb 8, 2005 (gmt 0)|
>A quick comparison shows me four more sites in the Google Directory than DMOZ.
These are all still listed in the ODP, just in different categories, as a result of recent re-review and re-categorization.
>The Octopus Files.
This is listed in the ODP, in the SAME category. Hint: different alphabetization rules.
Google is behind in updating, as always. (Even the public servers at dmoz.org are now liable to be several days behind the latest editing work.)
Google isn't actually drawing on any other sources for directory URLs, so far as I can see.
"Directory" subcats are always prime candidates for re-review and pruning: now, even more than ever, when pseudo-directories are the spam technique du jour, it is possible that extensive work in a directory category will result in fewer directory listings -- but more actual content site listings.
But your example doesn't exhibit that phenomenon.
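On the alphabetization hint earlier in this post: a small Python sketch of how one title can land in two different places depending on the sort rule. Which directory uses which rule is not stated here, so treat both key functions as assumptions. "Quiet Blogs" is a made-up entry added only so the difference is visible.

```python
# Leading articles that one of the two sort rules skips (assumed list).
ARTICLES = ("the ", "a ", "an ")


def plain_key(title):
    """Sort on the full title, so 'The Octopus Files' files under T."""
    return title.lower()


def article_stripped_key(title):
    """Sort after dropping a leading article, so it files under O."""
    lowered = title.lower()
    for article in ARTICLES:
        if lowered.startswith(article):
            return lowered[len(article):]
    return lowered


titles = [
    "Backwash",
    "The Octopus Files",
    "Blog Explosion",
    "Quiet Blogs",  # hypothetical entry, not from either directory
    "Breaking Windows",
]

print(sorted(titles, key=plain_key))
print(sorted(titles, key=article_stripped_key))
```

A reader scanning for the title under one letter can easily miss it when the other rule is in effect, even though the site is in the same category.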