Google SEO News and Discussion Forum

Re-ranking following ODP changes?
ODP canonicalisation in progress.
g1smd




msg:3437550
 12:03 pm on Aug 31, 2007 (gmt 0)

The public ODP servers used to run through a squid cache system and that meant some basic domain canonicalisation could not be performed.

The directory was available at dmoz.com and dmoz.org and www.dmoz.com and www.dmoz.org as well as various mirrors.

A few years ago, Google listed a variety of pages from each one, but mostly favoured dmoz.org in listings.

The old Google "302 redirect hijacking problem" was most obviously seen when a site:dmoz.com search returned tens of millions of listings - none of which were from the ODP site. That problem listing was fixed within days (maybe by hand at Google?), but the underlying Google fault was not corrected for a very long time. A search for site:dmoz.com returned zero results at that time.

Nowadays, it (site:dmoz.com) shows a few tens of thousands of pages all of which are from the correct site.

As part of various upgrades to ODP hardware in recent weeks, a set of redirects has been set up so that all directory pages are now served as www.dmoz.org URLs. This domain canonicalisation was put in place a few weeks ago.
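
For anyone wondering what this kind of host canonicalisation looks like in practice, here is a minimal sketch written as Python WSGI middleware. It is not the ODP's actual setup, which isn't described beyond the redirects themselves; only the host name www.dmoz.org comes from the post, while `canonicalise`, `directory_app` and the plain `http` scheme are assumptions for illustration. Any request arriving on a non-canonical host gets a 301 to the same path and query string on the canonical host; requests on the canonical host are served normally.

```python
# Sketch only: host canonicalisation as WSGI middleware (not the ODP's real setup).
CANONICAL_HOST = "www.dmoz.org"  # canonical host named in the post; scheme assumed to be http


def canonicalise(app):
    """Wrap a WSGI app so requests on any other host get a 301 to CANONICAL_HOST."""
    def wrapper(environ, start_response):
        # Strip any :port suffix and compare host names case-insensitively
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host != CANONICAL_HOST:
            path = environ.get("PATH_INFO", "") or "/"
            query = environ.get("QUERY_STRING", "")
            location = f"http://{CANONICAL_HOST}{path}"
            if query:
                location += "?" + query
            # 301 = permanent move, so the old URL should be replaced by the new one
            start_response("301 Moved Permanently",
                           [("Location", location), ("Content-Type", "text/plain")])
            return [b"Moved Permanently\n"]
        return app(environ, start_response)
    return wrapper


def directory_app(environ, start_response):
    """Hypothetical stand-in for the application that serves directory pages."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"directory page\n"]


app = canonicalise(directory_app)
```

The choice of 301 rather than 302 is the point here: 301 means "moved permanently", which is the signal search engines generally use to fold the old URLs into the new ones.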

This will mean that Google will have to reindex the whole directory and recalculate the PR score for the whole lot. It will also affect the backlinks list, both for links to the directory and for links from the directory to listed sites.

The changes have been ongoing for at least a week, and it will likely take Google a few months to factor everything in.

Additionally, and as already noted in an earlier thread, the Google Directory is also being updated at the same time. That work started 2007-08-18, but only a few Google datacentres have the new version as yet.

I have no idea what happens when Google has to recalculate the effect of 5 million links from one site that has now moved domain, although for many pages of the directory Google already "merged the PR" for www and non-www some years ago.
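
To make the "merged the PR" idea concrete, here is a toy sketch using the textbook PageRank power iteration (damping factor 0.85), which is certainly not what Google actually runs today. The link graph is hypothetical: four outside pages `a` to `d` link to the directory root, split between dmoz.org and www.dmoz.org in the first case, and all pointing at www.dmoz.org in the second. The second case is roughly what a processed 301 is hoped to achieve.

```python
# Toy textbook PageRank; graph and page names are hypothetical.
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration over {page: [outlinks]}; dangling pages spread rank evenly."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: redistribute its rank over every page
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank


# Inbound links split across the www and non-www hosts
links_split = {
    "a": ["dmoz.org/"], "b": ["dmoz.org/"],
    "c": ["www.dmoz.org/"], "d": ["www.dmoz.org/"],
    "dmoz.org/": ["listed"], "www.dmoz.org/": ["listed"],
    "listed": [],
}
# Same links after canonicalisation: everything points at www.dmoz.org
links_merged = {
    "a": ["www.dmoz.org/"], "b": ["www.dmoz.org/"],
    "c": ["www.dmoz.org/"], "d": ["www.dmoz.org/"],
    "www.dmoz.org/": ["listed"],
    "listed": [],
}

split = pagerank(links_split)
merged = pagerank(links_merged)
print(round(split["www.dmoz.org/"], 3), round(merged["www.dmoz.org/"], 3))
```

In this toy graph each half of the split pair ends up at about 0.18, while the single merged www.dmoz.org node reaches about 0.33, which is the consolidation effect in miniature.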

It is likely that there will be little to no effect from this, but it is interesting nonetheless. For one thing it will answer the question "how long does it take Google to reindex a site of half a million pages?"

site:dmoz.com = 45 000
site:www.dmoz.com = 31 000
site:dmoz.org = 750 000 (but includes many other subdomains too)
site:www.dmoz.org = already 180 000

 

Samizdata




msg:3437938
 5:59 pm on Aug 31, 2007 (gmt 0)

Gosh!

Oliver Henniges




msg:3438099
 8:04 pm on Aug 31, 2007 (gmt 0)

> recalculate the PR score for the whole lot.

I always thought dmoz and Yahoo functioned as some sort of starting point for PR calculation. Dunno why I thought so ;) Most sites I know hardly ever linked back to dmoz, but nevertheless dmoz had a PR of 8 or 9 all the time, didn't it? If this PR-value isn't derived from backlinks (and I really believe that some spammy sites clearly outweigh dmoz in the number, and perhaps even the quality, of their backlinks), it must have a special status of its own in the Google algo. So I assume recalculation of this lot is something different from a general PR update.

I'm glad you dedicated a specific thread to this topic, because I really think these changes are very important. Hope it doesn't get drowned out or mixed up with the general regular whining about every month's SERP changes.

A friend of mine reminded me that dmoz had been completely down for some time due to technical reasons, and he said some data got completely lost. Is this true?

As I said in [webmasterworld.com...], due to some restructuring of dmoz categories my backlink from there has gone from PR5 to PR1. According to the usual n-1 rule of thumb for PR inheritance (which worked perfectly for ALL dmoz categories I ever checked), I expect this value to recover after the next update, but you never know.
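
The n-1 rule of thumb mentioned above can be written down as a tiny function. This is only a sketch of that forum heuristic (one toolbar PR point lost per category level below the root), not anything Google documents; the root PR of 9 and the category path are hypothetical example values.

```python
# Sketch of the forum's "n-1" heuristic; not a documented Google rule.
def category_depth(path: str) -> int:
    """Category levels below the directory root: '/' -> 0, '/Arts/Music/' -> 2."""
    return len([segment for segment in path.split("/") if segment])


def expected_toolbar_pr(root_pr: int, path: str) -> int:
    """Heuristic only: one toolbar PR point lost per level, floored at 0."""
    return max(root_pr - category_depth(path), 0)


# Hypothetical example: root page at PR9, category four levels down
print(expected_toolbar_pr(9, "/Arts/Music/Styles/Jazz/"))  # 5
```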

tedster




msg:3438110
 8:13 pm on Aug 31, 2007 (gmt 0)

...dmoz had a PR of 8 or 9 all the time, didn't it? If this PR-value isn't derived from backlinks...

In Yahoo SiteExplorer I see nearly 4 million backlinks for dmoz.org, with 2.3 million to the homepage. That sounds like enough to me. There was even a period when linking to your category page in dmoz was a very helpful action in some situations, especially if you "were" the category.

I agree that DMOZ was/is used as a kind of seed for certain functions. These changes will be worth keeping an eye on - thanks g1smd.

g1smd




msg:3438114
 8:18 pm on Aug 31, 2007 (gmt 0)

The takeaway point is that the baseline is no longer split between dmoz.org and www.dmoz.org, with the non-www version favoured the most; since a week or two ago, everything stems from www.dmoz.org.

So, you haven't just got your category structure reorganisation to consider, but also a shift to www for those categories. Add in the data for the ongoing Google Directory Update and there are many effects that could happen.

>> dmoz had been completely down for some time due to technical reasons <<

The editor side was completely down for several months at the end of 2006. The public site was still accessible and didn't go down for more than a few hours on a couple of occasions. Google carried on spidering and indexing that every day.

When the editor side came back up, one priority was to clear listings that had gone dead in the intervening months. The other was to get on and add more sites. Only then was the RDF generation process restarted to build the regular data dump files for downstream users (such as Google) to pick up and use.

g1smd




msg:3438939
 12:34 am on Sep 2, 2007 (gmt 0)

site:dmoz.org = 850 000 (but includes many other subdomains too)
site:www.dmoz.org = 200 000 and increasing

g1smd




msg:3440846
 3:10 pm on Sep 4, 2007 (gmt 0)

site:dmoz.org = 970 000 (but includes many other subdomains too)
site:www.dmoz.org = 210 000 and increasing
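
Since the opening post asked how long it takes Google to reindex a site of half a million pages, here is a back-of-the-envelope sketch using the site:www.dmoz.org counts posted so far in this thread, with the post timestamps (GMT). It is a naive straight-line extrapolation between the first and last readings. site: counts are rough estimates, and the readings already show the pace slowing (about 13 000 pages a day between the first two, under 4 000 a day between the last two), so the result should be treated as optimistic.

```python
from datetime import datetime

# site:www.dmoz.org counts as posted in this thread (approximate, like all site: counts)
observations = [
    (datetime(2007, 8, 31, 12, 3), 180_000),
    (datetime(2007, 9, 2, 0, 34), 200_000),
    (datetime(2007, 9, 4, 15, 10), 210_000),
]


def pages_per_day(obs):
    """Average growth rate between the first and last readings."""
    (t0, c0), (t1, c1) = obs[0], obs[-1]
    days = (t1 - t0).total_seconds() / 86_400
    return (c1 - c0) / days


def days_to_reach(obs, target):
    """Naive linear extrapolation from the last reading to a target count."""
    return (target - obs[-1][1]) / pages_per_day(obs)


print(f"{pages_per_day(observations):.0f} pages/day")
print(f"{days_to_reach(observations, 500_000):.0f} more days to 500 000")
```

On these numbers that works out to a little over 7 000 pages a day and roughly 40 more days to reach half a million, assuming the rate held, which the slowdown suggests it won't.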

Samizdata




msg:3452490
 1:59 am on Sep 17, 2007 (gmt 0)

The way Google has been treating my 10-year-old ODP-listed always-considered-an-authority site over the past few weeks suggests that this is the most undervalued thread on WebmasterWorld and that g1smd was right on the money with one exception:

It is likely that there will be little to no effect from this

Perhaps "little to no long term effect" would be better, but he even had that covered:

it will likely take Google a few months to factor everything in

Meanwhile, some of us are having a thrilling ride on the roller-coaster.

steveb




msg:3458185
 7:37 pm on Sep 22, 2007 (gmt 0)

I have a hard time finding a cache for any dmoz page, including the main page.

And searches for text on even top-level pages are coming up with nothing.
They are not indexed anymore, for example:
[dmoz.org...]
[dmoz.org...]

g1smd




msg:3458736
 7:38 pm on Sep 23, 2007 (gmt 0)

No. I suspect that, for the www version of a lot of ODP pages, many are newly indexed and many others have yet to be indexed. Additionally, although many more may already have been spidered, it is possible that some of those are being whacked as Duplicates at the moment.

I think that it will take quite a while for their system to notice that the non-www version now issues a 301 redirect instead of serving content, and make the appropriate adjustments to the list of pages that they show in the SERPs. I suspect that it could take at least several months for things to be worked out.

steveb




msg:3458814
 10:38 pm on Sep 23, 2007 (gmt 0)

No, they aren't indexed. Not the old non-www, or the new www. Nothing.

Most obviously, the dmoz index page doesn't rank for a [dmoz] search because it is not indexed in any form.

While sites in dmoz should expect a long-term benefit from canonicalization, the deindexing of dmoz pages is certainly hurting them now.

g1smd




msg:3458822
 10:54 pm on Sep 23, 2007 (gmt 0)

It's a simple canonicalisation issue.

There's yet another subdomain that was overlooked, and is not sending a 301 redirect.

That one has the root page and 105 000 other pages indexed.

It'll be fixed shortly.
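
One way to spot a host like that is to request the root of each known variant without following redirects, and check that everything except the canonical host answers with a 301 pointing at it. This is a sketch using Python's standard `http.client`; the host list is the set named in the opening post (the overlooked subdomain isn't named in the thread, so it isn't included), and `fetch_status`, `is_canonicalised` and `report` are hypothetical helper names.

```python
import http.client
from urllib.parse import urlsplit

CANONICAL_HOST = "www.dmoz.org"                                      # canonical host from the thread
VARIANTS = ["dmoz.com", "www.dmoz.com", "dmoz.org", CANONICAL_HOST]  # hosts from the opening post


def fetch_status(host, path="/", timeout=10.0):
    """Send a HEAD request WITHOUT following redirects; return (status, Location or None)."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def is_canonicalised(host, status, location, canonical=CANONICAL_HOST):
    """Canonical host must serve content (200); every other host must 301 to it.
    A 302 counts as a failure because it only signals a temporary move.
    Relative Location headers have no host name, so they also fail this check."""
    if host == canonical:
        return status == 200
    if status != 301 or not location:
        return False
    return urlsplit(location).hostname == canonical


def report(hosts=VARIANTS, path="/"):
    """Print one line per host; needs network access."""
    for host in hosts:
        status, location = fetch_status(host, path)
        verdict = "ok" if is_canonicalised(host, status, location) else "PROBLEM"
        print(f"{verdict:8} {host}{path} -> {status} {location or ''}")
```

Any host that comes back with a 200 instead of a 301 is serving a duplicate copy of the directory, which is exactly the kind of leftover described above.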

g1smd




msg:3458852
 12:02 am on Sep 24, 2007 (gmt 0)

>> Nothing. <<

You're not looking hard enough.

It is a very simple Canonicalisation and Duplicate Content issue.
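
Google doesn't publish how it decides which copy of duplicated content to keep, so this is only a generic sketch of exact-duplicate detection: hash each page's whitespace-normalised HTML, group URLs whose hashes match, and pick one representative per group, preferring www.dmoz.org. The page contents, `PREFERRED_HOST` and the tie-break order are assumptions for illustration. It shows why the www and non-www copies of a directory page land in the same cluster, with only one of them expected to be shown.

```python
import hashlib
from collections import defaultdict
from urllib.parse import urlsplit

PREFERRED_HOST = "www.dmoz.org"  # assumed preferred host for picking a representative


def content_key(html: str) -> str:
    """Hash of the page with whitespace collapsed, so trivial formatting differences match."""
    normalised = " ".join(html.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def cluster_duplicates(pages):
    """pages: {url: html}. Returns a list of (representative_url, [duplicate_urls])."""
    groups = defaultdict(list)
    for url, html in pages.items():
        groups[content_key(html)].append(url)
    clusters = []
    for urls in groups.values():
        # Prefer the preferred host, then the shortest URL, then alphabetical order
        urls.sort(key=lambda u: (urlsplit(u).hostname != PREFERRED_HOST, len(u), u))
        clusters.append((urls[0], urls[1:]))
    return clusters


# Hypothetical pages: the same category served on two hosts, plus an unrelated page
pages = {
    "http://dmoz.org/Arts/": "<p>Arts</p>",
    "http://www.dmoz.org/Arts/": "<p>Arts</p>\n",
    "http://www.dmoz.org/Science/": "<p>Science</p>",
}
for representative, duplicates in cluster_duplicates(pages):
    print(representative, duplicates)
```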

steveb




msg:3458902
 2:29 am on Sep 24, 2007 (gmt 0)

What are you talking about?

You think the fact that the index page isn't #1, or anywhere, for a search for [dmoz] is normal?

Obviously it is neither a simple canonicalisation nor a duplicate content issue.

There is hysterical gloating on another forum about it, but more important are the major repercussions, at least in the short term, of thousands of domains losing their best link.

gehrlekrona




msg:3459229
 2:12 pm on Sep 24, 2007 (gmt 0)

I thought I'd check out Dmoz this morning and searched for my site and to my surprise it wasn't even there anymore :( It's been there for eons, but not anymore.
What happened to it? I didn't think they deleted sites from Dmoz. Do they?

Genie




msg:3459386
 5:30 pm on Sep 24, 2007 (gmt 0)

Listings are deleted from Dmoz all the time, usually because they are dead or parked. They can also be removed because the content no longer fits the Dmoz guidelines.

However sites can also be moved from one category to another. If there is a delay in re-listing in the second category, a site could be unlisted for a while.

soapystar




msg:3459416
 6:08 pm on Sep 24, 2007 (gmt 0)

did the editor change in your cat?

g1smd




msg:3459442
 6:36 pm on Sep 24, 2007 (gmt 0)

It shouldn't take you very long to discover which URL the ODP Root Page, and all the Top Level Categories, are indexed under.

I'll also guess that it won't take Google's indexing system much more than a month to realise what is going on and fix the problem.

g1smd




msg:3460446
 4:41 pm on Sep 25, 2007 (gmt 0)

Heh, steveb have you located those 105 000 "missing" pages yet?

"There's yet another subdomain that was overlooked, and is not sending a 301 redirect."

g1smd




msg:3467509
 11:30 pm on Oct 2, 2007 (gmt 0)

I see that the index at 72.14.203.nnn is completely different to most of the others.

For www.dmoz.org it has almost double the number of pages listed.

g1smd




msg:3471660
 9:56 am on Oct 8, 2007 (gmt 0)

Some Class-C blocks have almost double the number of pages for www.dmoz.org showing as compared to some other blocks. The number of non-canonical URLs is also slowly decreasing for most other URL formats.

The counts for www.dmoz.org have increased by several hundred thousand in the last few weeks in most blocks. There are still some very large differences in the counts, depending on where you look.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved