This 86-message thread spans 3 pages.
|Google Directory Updated: 2007-08-18|
Updated on 2007-08-18 including edits through 2007-08-07
| 3:59 pm on Aug 18, 2007 (gmt 0)|
On 2007-08-18, the Google Directory was updated with new information taken from the most recent ODP RDF dump. This update includes all directory edits through 2007-08-07.
The previous Google Directory update was so long ago that it is almost forgotten. That update, on 2006-02-19, included all ODP edits through 2006-01-29.
The Google Directory has slowly become more and more outdated since then. Now it has been updated again.
This is mostly visible on European servers, and not so much on servers elsewhere.
Google has at least 44 datacentres. I guess that it may take days or weeks for them all to be updated.
It is possible that they are only testing this on a very limited number of them until they are sure things are OK.
These five datacentres have the new version:
There are about 10 datacentres without any directory copy. The rest still have the old 2005 version. A few datacentres recently went offline.
Incomplete datacentre list: gfe-ag, gfe-an, gfe-au, gfe-ar, gfe-bf, gfe-bp, gfe-bu, gfe-bx, gfe-cw, gfe-dc, gfe-ed, gfe-eh, gfe-ff, gfe-fg, gfe-fk, gfe-gv, gfe-he, gfe-hk, gfe-hs, gfe-hu, gfe-ik, gfe-in, gfe-jc, gfe-jp, gfe-kc, gfe-kr, gfe-lm, gfe-lo, gfe-mc, gfe-mu, gfe-nf, gfe-nz, gfe-od, gfe-po, gfe-pr, gfe-py, gfe-qb, gfe-rn, gfe-ro, gfe-td, gfe-tl, gfe-tw, gfe-ug, gfe-ui, gfe-va, gfe-wr, gfe-wx and gfe-yo.
| 7:22 pm on Aug 22, 2007 (gmt 0)|
>> DMOZ was even out of operation for some months this year due to technology problems so Google are in effect updating with out of date data? <<
Once DMOZ was back online, several ODP tools were run to clean out most of the dead listings, so the dataset is quite healthy.
| 8:03 pm on Aug 22, 2007 (gmt 0)|
pageone - Even though Google have not been updating the Directory with the DMOZ rdf dump for some time, they have been downloading and using the rdf dump for something, so I suspect you may be right.
|I'm going to step out on a limb and say that Google uses the ODP Dump for quite a bit more than what is discussed at the public level. |
| 9:30 pm on Aug 22, 2007 (gmt 0)|
|Once back online, several ODP tools were run to clean out most of the dead listings |
One thing they could do to modernize their directory, cut down their work load, and help eliminate stale information is to only accept submissions from websites that have their own unique domain name.
I just ran a search at DMOZ for a fairly common word and was surprised to see how many results on each page (I went 5 pages deep) were from free hosting services, especially GeoCities. You could try this test to confirm it for your own area of interest.
Requiring a dedicated domain name is a simple enough criterion, and even though it's certainly no guarantee of quality, it does imply at least a minimal level of commitment from a site owner/webmaster, and thus may help cut down the daily submission volume they experience.
| 9:52 pm on Aug 22, 2007 (gmt 0)|
No way. There is no more spam submitted from sites on free hosts than from spammers who buy domains and fill them with junk.
The type of URL or hosting gives no clue as to the usefulness of the site. You're looking for some "obvious sign of spam".
It doesn't work like that.
| 10:30 pm on Aug 22, 2007 (gmt 0)|
Obviously, if you're pro-DMOZ, your view is going to be that it's useful data for Google to pull from.
But if you are of the view that DMOZ is outdated, possibly even corrupted data in some areas, or you are just a webmaster with a site in an area of the directory that DMOZ hasn't had an active editor working on for the last three years, then you won't share any delight whatsoever at Google wanting to use this data.
I'm of the opinion that the Google algo was, and always will be, designed to work without any human intervention. Once you get human involvement in how one part of an algorithm works, that's when you are heading for trouble. So I just can't see this amounting to anything; I think it's more likely simply a case of the directory not having been updated in so long that now's as good a time as any!
If anything, sites listed in other human-edited directories (e.g. the Yahoo Directory, business.com, JoeAnt) should carry the same weight, if any. DMOZ is no different; it's just another directory.
Oh, and one final point: if Google wanted to use directory data in its algo, it would employ a team to build one. That's the only way, in my mind, it could be 100% sure the data was spot-on correct. But the fact is that it doesn't need to, because with the amount of data now held on the net, no one uses a directory any longer; everyone uses search.
Time will tell anyway, but I think this talk of Google updating the directory with the old DMOZ data for some purpose other than simply "updating its directory" will turn out to be a damp squib.
| 11:23 pm on Aug 22, 2007 (gmt 0)|
|You're looking for some "obvious sign of spam". |
No, I'm looking for some way for ODP to actually be genuinely useful again, and not an antiquated relic from Web 1.0. I believe that can happen, but clearly it's not happening with the status quo.
My guess is that the reason they are so terribly behind and increasingly irrelevant is that they are overwhelmed with submissions. So step #1 may be to filter those submissions more seriously so the volume gets down to a more manageable level, and requiring a dedicated domain is one easy criterion. But as you said, and as I clearly said in my post, that is NO indication whatsoever of quality. It's just one simple filter.
Then, ODP could run their own algo on every submitted site by spidering at least part of its top level for obvious spam -- any that fail the crawl are removed from the submission queue. I would imagine that they have software engineers perfectly capable of developing this, but if not then it would be worth Google's effort to give them a hand, since by association Google is representing them in its own directory.
So by the time it gets to a human editor, there is at least some likelihood that it may have value. If it doesn't, then the human deletes it and moves on to the next one in line. And if that line does not extend from here to kingdom-come, then they may feel more inclined to keep up -- which makes ODP more relevant, and thus more useful to Google.
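The two-step pre-filter proposed in this post (drop free-host submissions, then spider the site and drop obvious spam before a human sees it) can be sketched roughly as below. This is a hypothetical illustration only, not anything ODP actually ran: the free-host list, the spam marker phrases and the threshold are made-up assumptions, and the "crawl" is simulated by passing in the homepage text rather than fetching it.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical list of free-hosting domains (illustrative, not exhaustive).
FREE_HOSTS = {"geocities.com", "tripod.com", "angelfire.com"}

# Hypothetical "obvious spam" phrases; a real check would need far more than this.
SPAM_MARKERS = ("buy cheap", "casino", "viagra")


@dataclass
class Submission:
    url: str
    homepage_text: str  # stands in for the result of spidering the site's top level


def on_free_host(url: str) -> bool:
    """True if the URL's host is, or is a subdomain of, a listed free host."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in FREE_HOSTS)


def looks_like_obvious_spam(text: str, threshold: int = 2) -> bool:
    """Count marker phrases in the page text; flag the page at or above the threshold."""
    lowered = text.lower()
    hits = sum(lowered.count(marker) for marker in SPAM_MARKERS)
    return hits >= threshold


def prefilter(queue):
    """Step 1: drop free-host submissions. Step 2: drop obvious spam.
    Whatever survives goes on to a human editor."""
    return [
        s for s in queue
        if not on_free_host(s.url) and not looks_like_obvious_spam(s.homepage_text)
    ]


queue = [
    Submission("http://www.geocities.com/someuser/", "My fishing page"),
    Submission("http://example-widgets.com/", "Handmade widgets since 1998"),
    Submission("http://spammy.example.net/", "Buy cheap pills! Buy cheap watches! Casino!"),
]
for s in prefilter(queue):
    print(s.url)  # prints http://example-widgets.com/
```

As noted in the replies, a dedicated domain says nothing about quality, so step 1 in this sketch is purely a volume filter; only the editor at the end judges the site.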
| 1:15 am on Aug 23, 2007 (gmt 0)|
I hate to say it (maybe I don't) but this thread has taken the normal turn into a discussion of DMoz relevancy. I'm not seeing much as it relates to Google Search anymore.
Those battles are fought daily in Directories.
Just some food for thought there, moderators…
| 1:16 am on Aug 23, 2007 (gmt 0)|
Reno - You are assuming that suggested sites are actually of importance to DMOZ. Many editors just ignore them and go looking for sites themselves.
| 1:31 am on Aug 23, 2007 (gmt 0)|
|this thread has taken the normal turn into a discussion of DMoz relevancy. I'm not seeing much as it relates to Google Search anymore |
My only reason for going off on a tangent is because...
 There is the strong suggestion in this thread from very knowledgeable people that the DMOZ listings may play a role in the Google algo; and
 Google is utilizing those listings in its own directory; and finally
 There is the observation by many that the DMOZ directory is/has been/may-continue-to-be outdated.
So, all I'm saying is that given the connection between DMOZ and Google, it may be in G's interest to make some effort to get ODP back on track again: they've got the bucks, the brainpower and the reason to do so.
Otherwise, the impact of DMOZ listings on the algo is not going to be generally accurate, which may create some distortion in the resulting SERPs. Garbage in, garbage out.
| 3:27 am on Aug 23, 2007 (gmt 0)|
Google aren't silly. They do a lot of testing. I am sure they would have tested giving a DMOZ listing extra weighting vs not giving it extra weighting, and determined the effect of that on the overall quality of the search results. If the results are generally better with the extra weight for a DMOZ listing, then they will use it. That has nothing to do with how outdated DMOZ is or isn't, or how many potentially good sites are or aren't listed, etc. It's all to do with the results of Google's testing and the impact it might or might not have on the search results.
| 1:45 pm on Aug 23, 2007 (gmt 0)|
If that's the case, then why doesn't Google cross-reference the data to see what's listed in the Yahoo Directory, business.com, JoeAnt and any other human-edited directory?
I'm sorry, but I fail to see how Google could gain any possible advantage from using outdated data supplied by DMOZ.
If the view here is that Google needs to use human-edited data rather than rely on an automated algo, then their entire business model is flawed, and that's why I think this is a storm in a teacup!
The whole point of an automated algo at Google is that the SERPs cannot be "corrupted" in any way by human involvement, and that's the way it should stay.
Once you start letting human intervention in, in any form (especially from non-Google staff), you are leaving the door open to corrupted SERPs.
In my view, DMOZ should not gain any more weight than any other human-edited web directory.
| 2:13 pm on Aug 23, 2007 (gmt 0)|
|Once you start letting human intervention in, in any form (especially from non-Google staff), you are leaving the door open to corrupted SERPs. |
I'll have to differ on that one. There will always be a small percentage of corruption taking place at some level; it is a given, it is human nature.
But the algos sure as hell can't replace the human factor, as is clearly evident in the SERPs today. ;)
And, how do we know that the algo isn't corrupt at some level?
| 2:20 pm on Aug 23, 2007 (gmt 0)|
|Once you start letting human intervention in, in any form (especially from non-Google staff), you are leaving the door open to corrupted SERPs. |
Human intervention (or, if you prefer, "human influence") is one of the Google algorithm's fundamental principles. PageRank is a formula that uses "votes" cast by webmasters, in the form of links, to influence search results.
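To make the "votes" idea concrete, here is a minimal sketch of the textbook PageRank calculation (power iteration with the 0.85 damping factor used in the original Brin and Page description). It is not Google's production algorithm, whose details are not public; the four-page link graph is a made-up example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to. Each page splits
    its current rank evenly among its outgoing links, i.e. every link is a
    "vote" for the target page.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump" term).
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page with no outlinks: spread its rank over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical graph: A, B and D all link to C; C links back to A.
graph = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

C ends up with the highest score because it receives the most votes, while B and D, which nobody links to, keep only the baseline share.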
| 2:27 pm on Aug 23, 2007 (gmt 0)|
|And, how do we know that the algo isn't corrupt at some level? |
We don't know that the Google algo isn't corrupt at some level... but it's unlikely that it is, to any great degree.
BUT introducing third-party human involvement most certainly increases that risk by a considerable margin.
As for the current Google SERPs, overall they are still good and ahead of anyone else in this space by a mile, so for Google to take a risk by including an extra factor from a third-party source that COULD damage their SERPs would be a bad move.
As I've already commented, DMOZ should not be viewed by Google any differently from ANY other human-edited directory.
DMOZ is not Wiki; it's not a source of precise, correctly listed data covering the internet, and it's not the "TRUSTED" source that it perhaps was when the net was significantly smaller a number of years ago.
Whilst I appreciate that some here are fans of the outdated DMOZ concept, I just don't see it offering any advantage to search, and it could bring other problems to Google that Google currently doesn't have.
[edited by: RichTC at 2:28 pm (utc) on Aug. 23, 2007]
| 2:51 pm on Aug 23, 2007 (gmt 0)|
RichTC, I'll take it you've not had much luck with dMoz?
|As I've already commented, DMOZ should not be viewed by Google any differently from ANY other human-edited directory. |
Yes, it should. There is nothing that even comes close to dMoz as far as breadth, depth and scope.
|DMOZ is not Wiki; it's not a source of precise, correctly listed data covering the internet, and it's not the "TRUSTED" source that it perhaps was when the net was significantly smaller a number of years ago. |
It doesn't matter. Until something else comes along that matches the scale of the ODP, it will always have its place on the Internet.
|Whilst I appreciate that some here are fans of the outdated DMOZ concept, I just don't see it offering any advantage to search, and it could bring other problems to Google that Google currently doesn't have. |
Google have been using the ODP data for years. Any problems associated with ODP data have most likely been addressed. Technology has come a long way, and if you've read any of those 57 Google Patents I referenced earlier in the topic, you'll see the correlation.
| 4:07 pm on Aug 23, 2007 (gmt 0)|
|I'll take it you've not had much luck with dMoz? |
Some of our clients have had no problems getting listed, some never got listed, and it's been hit-and-miss with others. My own experience of sites being listed or not has nothing to do with it.
Getting listed depends on the sector of the net the site falls under, whether that category has an editor or not, OR whether the editor listed for that category does in fact still edit, or has moved on after getting their own site listed and now just pushes through a few applications as and when, to retain editor status.
|Yes, it should. There is nothing that even comes close to dMoz as far as breadth, depth and scope. |
NO, it shouldn't. The Yahoo Directory is better for a start, but neither DMOZ nor Yahoo can keep up with the size of the net; it's just not possible. A human-edited directory is yesterday's news: the net has advanced way, way beyond what a directory can cope with. At best it can provide Google with a sample set of data, and I'm saying Google can pull that from Yahoo, business.com, JoeAnt and any other human-edited directory anyway.
|Google have been using the ODP data for years |
Yes, and that's fine, in moderation, i.e. treating data from DMOZ the same as data from ANY other human-edited directory. DMOZ is not a special case. This is the point I'm making.
BUT, moreover, the thread and the posts here are starting a rumour that Google is about to give more weight to the outdated DMOZ data in its algo, and in turn give the DMOZ directory some extra status. That has to be wrong! I'm saying this can't be a good move, if true, because the entire principle of Google is building an algo that cannot be corrupted in any way by human intervention, and giving one sample set of data extra status COULD create other problems for Google. That's why I just don't believe it. But if you have anything factual, I would love to hear about it!
Finally, you are going to be passionate about the use of the directory data if you are working on the Open Directory Project, but many other webmasters not involved in DMOZ just won't share the same enthusiasm as you.
| 4:40 pm on Aug 23, 2007 (gmt 0)|
I am not a software engineer and thus am trying to understand how Google might use a listing in DMOZ in regards to specific websites. So this is a question, not an observation!
Let's say for the sake of a very simple example that Google gives 1 point for every good quality natural one-way link that a site has pointing to it.
But because Google respects the age, scope and authority of DMOZ, they give 1.5 points for a listing there.
The impact of that extra .5 on the overall "point score" of the site is minimal, yet it would hold true that "being listed in ODP helps with search engine scoring".
So for anyone here who understands algorithm construction and suspects that Google may be using DMOZ data, is this simplistic example in the ballpark?
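The arithmetic in this hypothetical can be checked with a few lines of Python. The 1-point and 1.5-point weights are just the made-up numbers from the example above, not anything Google has published; the point is only to show how small the extra 0.5 becomes as a share of the total as a site's natural link count grows.

```python
# Hypothetical weights from the example above; NOT Google's actual values.
LINK_POINTS = 1.0   # each good-quality natural one-way link
DMOZ_POINTS = 1.5   # a DMOZ listing, in this made-up scoring scheme


def score(natural_links: int, in_dmoz: bool) -> float:
    """Total "point score": natural links plus the DMOZ listing, if any."""
    return natural_links * LINK_POINTS + (DMOZ_POINTS if in_dmoz else 0.0)


def dmoz_bonus_share(natural_links: int) -> float:
    """Fraction of a DMOZ-listed site's total score that comes from the
    extra 0.5 (i.e. the DMOZ listing counting for more than a normal link)."""
    total = score(natural_links, in_dmoz=True)
    return (DMOZ_POINTS - LINK_POINTS) / total


for n in (9, 40, 399):
    print(n, f"{dmoz_bonus_share(n):.1%}")
# 9 4.8%
# 40 1.2%
# 399 0.1%
```

In this toy model, the listing would still "help with scoring", but the bonus only matters noticeably for sites with very few other links.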
| 5:03 pm on Aug 23, 2007 (gmt 0)|
I'm almost certain that GoogleGuy or another Google employee once said links from DMOZ don't receive any special weighting. (By that, I assume he or she meant "manual weighting" as opposed to whatever weighting DMOZ links might enjoy because of DMOZ's age and authority status from inbound links, etc.)
| 5:09 pm on Aug 23, 2007 (gmt 0)|
|Finally, you are going to be passionate about the use of the directory data if you are working on the Open Directory Project, but many other webmasters not involved in DMOZ just won't share the same enthusiasm as you. |
I don't do any work with the ODP. My Editor Applications have been declined over the years, I gave up. I'm looking at this from the same level as everyone else.
Keep in mind that the use of the ODP data doesn't stop at the listings the ODP have in their dataset. As I've stated earlier in the topic, there is much more going on with that data than we discuss at the public level. Stuff that we don't really know much about.
| 7:10 pm on Aug 23, 2007 (gmt 0)|
|.. there is much more going on with that data than we discuss at the public level. Stuff that we don't really know much about... |
I have never seen any evidence of that at all.
| 7:22 pm on Aug 23, 2007 (gmt 0)|
Those 57 Google Patents will contain a lot of clues; some bigger than others.
| 7:23 pm on Aug 23, 2007 (gmt 0)|
No, there isn't any hard evidence. But, there is plenty of circumstantial evidence floating about here...
<added> It looks like g1smd and I have been reading some of the same documents. :)
| 12:47 am on Sep 1, 2007 (gmt 0)|
The directory update hasn't spread very far yet. Still not on very many servers.
| 12:24 am on Sep 7, 2007 (gmt 0)|
Now it is just about everywhere on their servers.
The Google Directory update is almost complete.
| 2:26 am on Sep 7, 2007 (gmt 0)|
|The Google Directory update is almost complete. |
What was that sound I just heard? Was that a bunch of noodp elements being trashed? ;)
| 2:09 pm on Sep 7, 2007 (gmt 0)|
Yes, update complete. A site of mine which has been in DMOZ for a very long time but never showed in the Google directory, is now showing. :)
Unfortunately it's the last site listed in its PR rank. :(