Forum Moderators: Robert Charlton & goodroi
The previous Google Directory update was so long ago that it is almost forgotten. That update, on 2006-02-19, included all ODP edits through 2006-01-29.
The Google Directory has slowly become more and more outdated since then. Now it is being updated again.
.
.
This is mostly visible on European servers, and not so much on servers elsewhere.
Google has at least 44 datacentres. I guess that it may take days or weeks for them all to be updated.
It is possible that they are only testing this on a very limited number until they are sure things are OK.
.
These five datacentres have the new version:
[gfe-bu.google.com...]
[gfe-fg.google.com...]
[gfe-fk.google.com...]
[gfe-hu.google.com...]
[gfe-mu.google.com...]
There are about 10 datacentres without any directory copy. The rest still have the old 2005 version. A few datacentres recently went offline.
Incomplete datacentre list: gfe-ag, gfe-an, gfe-au, gfe-ar, gfe-bf, gfe-bp, gfe-bu, gfe-bx, gfe-cw, gfe-dc, gfe-ed, gfe-eh, gfe-ff, gfe-fg, gfe-fk, gfe-gv, gfe-he, gfe-hk, gfe-hs, gfe-hu, gfe-ik, gfe-in, gfe-jc, gfe-jp, gfe-kc, gfe-kr, gfe-lm, gfe-lo, gfe-mc, gfe-mu, gfe-nf, gfe-nz, gfe-od, gfe-po, gfe-pr, gfe-py, gfe-qb, gfe-rn, gfe-ro, gfe-td, gfe-tl, gfe-tw, gfe-ug, gfe-ui, gfe-va, gfe-wr, gfe-wx and gfe-yo.
I'm going to step out on a limb and say that Google uses the ODP Dump for quite a bit more than what is discussed at the public level.
pageone - Even though Google have not been updating the Directory with the DMOZ rdf dump for some time, they have been downloading and using the rdf dump for something, so I suspect you may be right.
Once back online, several ODP tools were run to clean out most of the dead listings.
I just ran a search at DMOZ for a fairly common word and was surprised to see how many returns on each page (I went 5 pages deep) were from the free hosting services, especially GeoCities. You could try this test to confirm it for your own area of interest.
Requiring a dedicated domain name is a simple enough criterion, and even though it's certainly no guarantee of quality, it does imply at least a minimal level of commitment from a siteowner/webmaster, and thus may help cut down on the daily submission volume that they experience.
......................
But if you are of the view that DMOZ is outdated, even possibly corrupted, data in some areas, or you are just a webmaster with a site that falls in an area of the directory that DMOZ hasn't had an active editor working on for the last three years, then you won't share any possible delight that Google would want to use this data whatsoever.
I'm of the opinion that the Google algo was designed, and always will be designed, to work without any human intervention. Once you get human involvement in how one part of an algorithm works, that's when you are heading for trouble, so I just can't see this amounting to anything. I think it's more likely that it's simply a case of the directory not having been updated in so long that now is as good a time as any!
If anything, sites listed in other human-edited directories should carry the same weight, if any, i.e. Yahoo Directory, business.com, joeant, etc. DMOZ is no different; it's just another directory.
Oh, and one final point: if Google wanted to use directory data in its algo then it would employ a team to build one. That's the only way, in my mind, it could be 100% sure that the data would be spot on. But the fact is that it doesn't need to, because no one uses a directory any longer; with the amount of data now held on the net, everyone uses search.
Time will tell anyway, but I think this talk of Google updating the directory with the old DMOZ data for some purpose other than simply "updating its directory" will amount to a damp squib.
Cheers
You're looking for some "obvious sign of spam".
My guess is that the reason they are so terribly behind and increasingly irrelevant is that they are overwhelmed with submissions. So step #1 may be to filter those submissions more seriously so the volume gets down to a more manageable level, and requiring a dedicated domain is one easy criterion. But as you said, and as I clearly said in my post, that is NO indication whatsoever of quality. It's just one simple filter.
Then, ODP could run their own algo on every submitted site by spidering at least part of its top level for obvious spam -- any that fail the crawl are removed from the submission queue. I would imagine that they have software engineers perfectly capable of developing this, but if not then it would be worth Google's effort to give them a hand, since by association Google is representing them in its own directory.
So by the time it gets to a human editor, there is at least some likelihood that it may have value. If it doesn't, then the human deletes it and moves on to the next one in line. And if that line does not extend from here to kingdom-come, then they may feel more inclined to keep up -- which makes ODP more relevant, and thus more useful to Google.
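To make the two-step idea above concrete, here is a rough sketch of what such a pre-filter might look like. It is only an illustration, not anything ODP is known to run: the free-host list (only GeoCities is named in this thread), the spam markers and the function names are all assumptions, and the "crawl" is faked by passing in page text directly.

```python
from urllib.parse import urlparse

# Assumed list of free-hosting domains; only GeoCities is named in the thread.
FREE_HOSTS = {"geocities.com", "tripod.com", "angelfire.com"}

# Assumed markers standing in for a real "obvious spam" crawl check.
SPAM_MARKERS = ("buy cheap", "casino bonus")


def has_dedicated_domain(url: str) -> bool:
    """Return True when the URL is not hosted under a known free-hosting domain."""
    host = (urlparse(url).hostname or "").lower()
    return bool(host) and not any(
        host == free or host.endswith("." + free) for free in FREE_HOSTS
    )


def looks_like_spam(page_text: str) -> bool:
    """Very crude stand-in for spidering the top level of a site."""
    text = page_text.lower()
    return any(marker in text for marker in SPAM_MARKERS)


def prefilter(submissions):
    """Keep only the submissions that should reach a human editor.

    `submissions` is an iterable of (url, page_text) pairs, where page_text
    stands in for whatever a crawler fetched from the site's top level.
    """
    return [
        url
        for url, page_text in submissions
        if has_dedicated_domain(url) and not looks_like_spam(page_text)
    ]


queue = [
    ("http://www.geocities.com/someuser/", "My home page"),
    ("http://www.example.com/", "Hand-made widgets and repair guides"),
    ("http://www.example.net/", "Casino bonus! Buy cheap pills"),
]
print(prefilter(queue))
```

Neither check says anything about quality; they only shorten the line before a human looks at it, which is the point being made above.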
.....................
This thread has taken the normal turn into a discussion of DMoz relevancy. I'm not seeing much as it relates to Google Search anymore.
[1] There is the strong suggestion in this thread from very knowledgeable people that the DMOZ listings may play a role in the Google algo; and
[2] Google is utilizing those listings in its own directory; and finally
[3] There is the observation by many that the DMOZ directory is/has been/may-continue-to-be outdated.
So, all I'm saying is that given the connection between DMOZ and Google, it may be to G's interest to make some effort to get ODP back on track again -- they've got the bucks, the brainpower and the reason to do so.
Otherwise, the impact of DMOZ listings on the algo is not going to be generally accurate, which may create some distortion in the resulting SERPs. Garbage in, garbage out.
...................................
I'm sorry, but I fail to see how Google could gain any possible advantage from using outdated data supplied by DMOZ.
If the view here is that Google needs to use human-edited data rather than rely on an automated algo, then their entire business model is now flawed - and that's why I think this is a storm in a teacup!
The whole point of an automated algo at Google is so that the SERPs can "not be corrupted" in any way by human involvement, and that's the way it should stay.
Once you start letting in human intervention in any form (especially from non-Google staff), you are leaving the door open to corrupted SERPs.
In my view, DMOZ should not gain any more weight than any other human-edited web directory.
Once you start letting in human intervention in any form (especially from non-Google staff), you are leaving the door open to corrupted SERPs.
I'll have to differ on that one. There will always be a small percentage of corruption taking place at some level, it is a given, it is human nature.
But, the algos sure as hell can't replace the human factor as is clearly evident in the SERPs today. ;)
And, how do we know that the algo isn't corrupt at some level?
Once you start letting in human intervention in any form (especially from non-Google staff), you are leaving the door open to corrupted SERPs.
Human intervention (or--if you prefer--"human influence") is one of the Google algorithm's fundamental principles. PageRank is a formula that uses "votes" cast by Webmasters to influence search results.
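For anyone who has not seen how links act as "votes", here is a toy sketch of the textbook PageRank iteration. It is the simplified published form, not whatever Google runs in production; the tiny four-page web, the function name and the damping value of 0.85 are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy power-iteration PageRank.

    `links` maps each page to the list of pages it links to; every
    outbound link passes on an equal share of the linking page's rank.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the same small baseline.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical mini-web: a, b and d all "vote" for c; c votes for a.
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page c comes out on top purely because three webmasters chose to link to it, which is the human influence being described.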
And, how do we know that the algo isn't corrupt at some level?
We don't know that the Google algo isn't corrupt at some level ... but it's unlikely that it will be, to any great level.
BUT introducing third-party human involvement most certainly increases that risk by a considerable margin.
As for the current Google SERPs, overall they are still good and ahead of anyone else in this space by a mile, so for Google to take a risk by including an extra factor from a third-party source that COULD damage their SERPs would be a bad move.
As I've already commented, DMOZ should not be viewed by Google any differently to ANY other human-edited directory.
DMOZ is not Wiki; it's not a source of precise, correctly listed data covering the internet, and it's not the "TRUSTED" source that it perhaps may have been when the net was significantly smaller a number of years ago.
Whilst I appreciate that some here are fans of the outdated DMOZ concept, I just don't see it offering any advantage to search, and it could bring other problems to Google that Google currently doesn't have.
[edited by: RichTC at 2:28 pm (utc) on Aug. 23, 2007]
As I've already commented, DMOZ should not be viewed by Google any differently to ANY other human-edited directory.
Yes it should. There is nothing that even comes close to dMoz as far as breadth, depth and scope.
DMOZ is not Wiki; it's not a source of precise, correctly listed data covering the internet, and it's not the "TRUSTED" source that it perhaps may have been when the net was significantly smaller a number of years ago.
It doesn't matter. Until something else comes along that matches the scale of the ODP, it will always have its place on the Internet.
Whilst I appreciate that some here are fans of the outdated DMOZ concept, I just don't see it offering any advantage to search, and it could bring other problems to Google that Google currently doesn't have.
Google have been using the ODP data for years. Any problems associated with ODP data have most likely been addressed. Technology has come a long way and if you've read any of those 57 Google Patents I referenced earlier in the topic you'll see the correlation.
I'll take it you've not had much luck with dMoz?
Some of our clients have had no problems getting listed, some never got listed, and it's been hit and miss with others. Our own experience of sites being listed or not has nothing to do with it.
Getting listed depends on the sector of the net the site falls under, whether that cat has an editor or not, OR whether the editor that's listed for that cat does in fact still edit, or has moved on having got their own site listed and is now just pushing a few applications as and when to retain editor status.
Yes it should. There is nothing that even comes close to dMoz as far as breadth, depth and scope.
NO it shouldn't - Yahoo Directory is better for a start, but neither DMOZ nor Yahoo can keep up with the size of the net; it's just not possible. Hence a human-edited directory is yesterday's news. The net has advanced way, way beyond a directory, which can't possibly cope. At best it can only provide Google with a sample set of data, and I'm saying Google can pull that from Yahoo, business.com, joeant and any other human directory anyway.
Google have been using the ODP data for years
Yes, and that's fine, in moderation - i.e. treating data from DMOZ the same as data from ANY other human-edited directory. DMOZ is not a special case - this is the point I'm making.
BUT moreover, the thread and the posts here are starting a rumour that Google is about to give more weight to the outdated DMOZ data in its algo, and in return give the DMOZ directory some extra status. That has to be wrong! I'm saying that this can't be a good move if true, as the entire principle of Google is based on building an algo that cannot be corrupted in any way by human intervention, and giving one sample set of data some extra status COULD create other problems for Google. That's why I just don't believe it. But if you have anything factual I would love to hear about it!
Finally, you are going to be passionate about the use of the directory data if you are working on the Open Directory Project, but many other webmasters not involved in DMOZ just won't share the same enthusiasm as you.
Cheers
Let's say for the sake of a very simple example that Google gives 1 point for every good quality natural one-way link that a site has pointing to it.
But because Google respects the age, scope and authority of DMOZ, they give 1.5 points for a listing there.
The impact of that extra .5 on the overall "point score" of the site is minimal, yet it would hold true that "being listed in ODP helps with search engine scoring".
So for anyone here who understands algorithm construction and suspects that Google may be using DMOZ data, is this simplistic example in the ballpark?
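The arithmetic of that simplistic example can be written out as follows. The point values come straight from the post above; the function name and the sample site with 50 links are made up for illustration, and nothing here reflects how Google actually scores anything.

```python
LINK_POINTS = 1.0   # points per good-quality natural one-way link (from the example)
DMOZ_POINTS = 1.5   # points when the link is a DMOZ listing (from the example)


def point_score(total_links: int, dmoz_listed: bool) -> float:
    """Score a site under the simplistic model.

    `total_links` counts every inbound link; when `dmoz_listed` is True,
    one of those links is the DMOZ listing and earns 1.5 instead of 1.
    """
    if dmoz_listed and total_links < 1:
        raise ValueError("a DMOZ-listed site has at least one inbound link")
    ordinary = total_links - 1 if dmoz_listed else total_links
    return ordinary * LINK_POINTS + (DMOZ_POINTS if dmoz_listed else 0.0)


# Hypothetical site with 50 inbound links, with and without a DMOZ listing.
plain = point_score(50, dmoz_listed=False)
listed = point_score(50, dmoz_listed=True)
boost = (listed - plain) / plain
print(plain, listed, f"{boost:.1%}")
```

Under these assumptions the listing always adds exactly half a point, so the more ordinary links a site has, the smaller the relative effect becomes.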
..................
Finally, you are going to be passionate about the use of the directory data if you are working on the Open Directory Project, but many other webmasters not involved in DMOZ just won't share the same enthusiasm as you.
I don't do any work with the ODP. My Editor Applications have been declined over the years, and I gave up. I'm looking at this from the same level as everyone else.
Keep in mind that the use of the ODP data doesn't stop at the listings the ODP have in their dataset. As I've stated earlier in the topic, there is much more going on with that data than we discuss at the public level. Stuff that we don't really know much about.
Here are 57 patents that you are welcome to read to see how Google "may" be using the ODP data... US Patent Collection for AN/Google: 57 Patents [patft.uspto.gov]
Enjoy!
P.S. The depth of the ODP dataset is unmatched.
<added> It looks like g1smd and I have been reading some of the same documents. :)