Re-ranking following ODP changes?

ODP canonicalisation in progress.

12:03 pm on Aug 31, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


The public ODP servers used to run through a Squid cache system, which meant that some basic domain canonicalisation could not be performed.

The directory was available at dmoz.com and dmoz.org and www.dmoz.com and www.dmoz.org as well as various mirrors.

A few years ago, Google listed a variety of pages from each one, but mostly favoured dmoz.org in listings.

The old Google "302 redirect hijacking problem" was most obviously seen when a site:dmoz.com search returned tens of millions of listings - none of which were from the ODP site. That problem listing was fixed within days (maybe by hand at Google?), but the underlying Google fault was not corrected for a very long time. A search for site:dmoz.com returned zero results at that time.

Nowadays, it (site:dmoz.com) shows a few tens of thousands of pages all of which are from the correct site.

As part of various upgrades to ODP hardware in recent weeks, a set of redirects has been set up so that all directory pages are now served as www.dmoz.org URLs. This domain canonicalisation went live a few weeks ago.
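The host-level rule described above can be sketched as a small request handler. This is a minimal illustration, not the ODP's actual server configuration: the function name `canonical_redirect`, the `http://` scheme, and the choice to redirect only the three aliases named earlier in this post (mirrors are left out) are all assumptions.

```python
from typing import Optional, Tuple

# Canonical host per the thread; everything else named in this post redirects to it.
CANONICAL_HOST = "www.dmoz.org"

# Hypothetical alias set: the hostnames listed earlier in this post, mirrors ignored.
ALIAS_HOSTS = {"dmoz.com", "www.dmoz.com", "dmoz.org"}


def canonical_redirect(host: str, path: str) -> Optional[Tuple[int, str]]:
    """Return (status, Location) for a non-canonical host, or None to serve content."""
    # Host headers are case-insensitive and may carry a port or a trailing dot.
    name = host.split(":", 1)[0].rstrip(".").lower()
    if name == CANONICAL_HOST:
        return None  # already canonical: serve the page normally
    if name in ALIAS_HOSTS:
        # Permanent redirect that keeps the path, so deep links land on the same page.
        return 301, f"http://{CANONICAL_HOST}{path}"
    return None  # unknown host: this sketch leaves it alone


print(canonical_redirect("dmoz.org", "/Arts/"))      # prints (301, 'http://www.dmoz.org/Arts/')
print(canonical_redirect("www.dmoz.org", "/Arts/"))  # prints None
```

A 301 is used because it marks the move as permanent, whereas a 302 only signals a temporary one. Any host that falls through to the final `None` branch keeps serving duplicate content.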

This will mean that Google will have to reindex the whole directory and recalculate the PR score for the whole lot. It will also have an effect on the backlinks list, both for links to the directory and for links from the directory to listed sites.

The changes have been ongoing for at least a week, and it will likely take Google a few months to factor everything in.

Additionally, and as already noted in an earlier thread, the Google Directory is also being updated at the same time. That work started 2007-08-18, but only a few Google datacentres have the new version as yet.

I have no idea what happens when Google has to recalculate the effect of 5 million links from one site that has now moved domain, although for many pages of the directory Google already "merged the PR" for www and non-www some years ago.
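One way to picture the "merged PR" effect is the textbook PageRank power iteration, run on a toy graph where four external pages link either to two split host copies or to one merged host. This is only the classic published formula with a damping factor of 0.85. It says nothing certain about how Google actually scored the move, and every node name below (`ext1` to `ext4`, `listed-site`) is hypothetical.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], d: float = 0.85,
             iters: int = 100) -> Dict[str, float]:
    """Textbook power-iteration PageRank; scores sum to 1."""
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Pages without outlinks spread their score evenly over every page.
        dangling = sum(pr[v] for v in nodes if not links.get(v))
        new = {v: (1 - d) / n + d * dangling / n for v in nodes}
        for src, targets in links.items():
            for t in targets:
                new[t] += d * pr[src] / len(targets)
        pr = new
    return pr


# Hypothetical toy graphs: inbound links split across two hosts vs. one host.
split = {
    "ext1": ["dmoz.org"], "ext2": ["dmoz.org"],
    "ext3": ["www.dmoz.org"], "ext4": ["www.dmoz.org"],
    "dmoz.org": ["listed-site"], "www.dmoz.org": ["listed-site"],
    "listed-site": [],
}
merged = {
    "ext1": ["www.dmoz.org"], "ext2": ["www.dmoz.org"],
    "ext3": ["www.dmoz.org"], "ext4": ["www.dmoz.org"],
    "www.dmoz.org": ["listed-site"],
    "listed-site": [],
}

ps, pm = pagerank(split), pagerank(merged)
print(f"split:  dmoz.org={ps['dmoz.org']:.3f} www={ps['www.dmoz.org']:.3f}")
print(f"merged: www={pm['www.dmoz.org']:.3f}")
```

In this toy graph each split copy ends up with about 0.180, while the merged host gets about 0.335. Consolidating the duplicates concentrates inbound weight on one URL instead of dividing it between two.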

It is likely that there will be little to no effect from this, but it is interesting nonetheless. For one thing, it will answer the question "how long does it take Google to reindex a site of half a million pages?"

site:dmoz.com = 45 000
site:www.dmoz.com = 31 000
site:dmoz.org = 750 000 (but includes many other subdomains too)
site:www.dmoz.org = already 180 000

5:59 pm on Aug 31, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Aug 29, 2006
posts:1312
votes: 0


Gosh!
8:04 pm on Aug 31, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2002
posts:872
votes: 0


> recalculate the PR score for the whole lot.

I always thought dmoz and Yahoo functioned as some sort of starting point for PR calculation. Dunno why I thought so ;) Most sites I know hardly ever linked back to dmoz, but nevertheless dmoz had a PR of 8 or 9 all the time, didn't it? If this PR-value isn't derived from backlinks (and I really believe that some spammy sites clearly outweigh dmoz in the number, and perhaps even the quality, of their backlinks), it must have a special status of its own in the Google algo. So I assume recalculation of this lot is something different from a general PR update.

I'm glad you dedicated a specific thread to this topic, because I really think these changes are very important. I hope it doesn't get drowned out by, or mixed up with, the usual whining about every month's SERP changes.

A friend of mine reminded me that dmoz had been completely down for some time due to technical reasons, and he said some data got completely lost. Is this true?

As I said in [webmasterworld.com...], due to some restructuring of some dmoz categories, my backlink from there has gone from PR5 to PR1. According to the normal n-1 PR-inheritance rule of thumb (which has worked perfectly for ALL the dmoz categories I have ever checked), I expect this value to recover after the next update, but you never know.
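The n-1 rule of thumb mentioned above is simple enough to write down. This is a minimal sketch of the forum heuristic only: it assumes each category level shows exactly one toolbar PR less than its parent and that toolbar PR is floored at 0. The function name and the PR9 root in the example are hypothetical, and the rule itself is an observed pattern, not a documented formula.

```python
def expected_toolbar_pr(root_pr: int, depth: int) -> int:
    """Heuristic toolbar PR for a category `depth` levels below a root of `root_pr`.

    Applies the n-1 rule once per level; toolbar PR runs 0-10,
    so the estimate is clamped at 0.
    """
    return max(root_pr - depth, 0)


# A category four levels below a hypothetical PR9 root.
print(expected_toolbar_pr(9, 4))  # prints 5
```

A reading well below what this rule predicts for a category's depth is what prompts the expectation above that the value will recover after the next update.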

8:13 pm on Aug 31, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


...dmoz had a PR of 8 or 9 all the time, didn't it? If this PR-value isn't derived from backlinks...

In Yahoo SiteExplorer I see nearly 4 million backlinks for dmoz.org, with 2.3 million to the homepage. That sounds like enough to me. There was even a period when linking to your category page in dmoz was a very helpful action in some situations, especially if you "were" the category.

I agree that DMOZ was/is used as a kind of seed for certain functions. These changes will be worth keeping an eye on - thanks g1smd.

8:18 pm on Aug 31, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


The takeaway point is that the baseline is no longer split between dmoz.org and www.dmoz.org, with the non-www version favoured most; since a week or two ago, everything now stems from www.dmoz.org.

So, you haven't just got your category structure reorganisation to consider, but also a shift to www for those categories. Add in the data for the ongoing Google Directory Update and there are many effects that could happen.

>> dmoz had been completely down for some time due to technical reasons <<

The editor side was completely down for several months at the end of 2006. The public site was still accessible and didn't go down for more than a few hours on a couple of occasions. Google carried on spidering and indexing that every day.

When the editor side came back up, one priority was to clear listings that had gone dead in the intervening months. The other was to get on and add more sites. Only then was the RDF generation process restarted to build the regular data dump files for downstream users (such as Google) to pick up and use.

12:34 am on Sept 2, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


site:dmoz.org = 850 000 (but includes many other subdomains too)
site:www.dmoz.org = 200 000 and increasing
3:10 pm on Sept 4, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


site:dmoz.org = 970 000 (but includes many other subdomains too)
site:www.dmoz.org = 210 000 and increasing
1:59 am on Sept 17, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Aug 29, 2006
posts:1312
votes: 0


The way Google has been treating my 10-year-old ODP-listed always-considered-an-authority site over the past few weeks suggests that this is the most undervalued thread on WebmasterWorld and that g1smd was right on the money with one exception:

It is likely that there will be little to no effect from this

Perhaps "little to no long term effect" would be better, but he even had that covered:

it will likely take Google a few months to factor everything in

Meanwhile, some of us are having a thrilling ride on the roller-coaster.

7:37 pm on Sept 22, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member steveb is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 20, 2002
posts:4652
votes: 0


I have a hard time finding a cache for any dmoz page, including the main page.

Searches for text on even top-level pages come up with nothing. They are not indexed anymore, for example:
[dmoz.org...]
[dmoz.org...]

7:38 pm on Sept 23, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


No. I suspect that, for the www version of a lot of ODP pages, many are newly indexed and many others have yet to be indexed. Additionally, although many more may already have been spidered, some of those may be getting whacked as duplicates at the moment.

I think that it will take quite a while for their system to notice that the non-www version now issues a 301 redirect instead of serving content, and make the appropriate adjustments to the list of pages that they show in the SERPs. I suspect that it could take at least several months for things to be worked out.

10:38 pm on Sept 23, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member steveb is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 20, 2002
posts:4652
votes: 0


No, they aren't indexed. Not the old non-www, or the new www. Nothing.

Most obviously, the dmoz index page doesn't rank for a [dmoz] search because it is not indexed in any form.

While sites in dmoz should expect a long-term benefit from canonicalization, the deindexing of dmoz pages is certainly hurting them right now.

10:54 pm on Sept 23, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


It's a simple canonicalisation issue.

There's yet another subdomain that was overlooked, and is not sending a 301 redirect.

That one has the root page and 105 000 other pages indexed.

It'll be fixed shortly.
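Spotting an overlooked host like this amounts to checking every known hostname and flagging any that does not answer with a 301 to the canonical root. This is a minimal sketch, assuming the responses have already been collected by some other tool. The host `overlooked.dmoz.org` and all the observed responses below are hypothetical, since the thread does not name the subdomain.

```python
from typing import Dict, List, Optional, Tuple

CANONICAL_HOST = "www.dmoz.org"
CANONICAL_ROOT = "http://www.dmoz.org/"

# (status, Location header) seen when requesting "/" on each host.
# All entries are hypothetical examples, not real measurements.
observed: Dict[str, Tuple[int, Optional[str]]] = {
    "www.dmoz.org": (200, None),
    "dmoz.org": (301, CANONICAL_ROOT),
    "dmoz.com": (301, CANONICAL_ROOT),
    "www.dmoz.com": (301, CANONICAL_ROOT),
    "overlooked.dmoz.org": (200, None),  # still serving duplicate content
}


def find_unredirected(responses: Dict[str, Tuple[int, Optional[str]]]) -> List[str]:
    """Return non-canonical hosts that do not 301 to the canonical root."""
    problems = []
    for host, (status, location) in sorted(responses.items()):
        if host == CANONICAL_HOST:
            continue  # the canonical host is expected to serve content (200)
        # A 200 means duplicate content; a 302 is only a temporary redirect.
        if status != 301 or location != CANONICAL_ROOT:
            problems.append(host)
    return problems


print(find_unredirected(observed))  # prints ['overlooked.dmoz.org']
```

Checking only the root URL per host is a simplification. A fuller audit would also sample deeper category paths, since a host can redirect its root and still serve content elsewhere.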

12:02 am on Sept 24, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


>> Nothing. <<

You're not looking hard enough.

It is a very simple Canonicalisation and Duplicate Content issue.

2:29 am on Sept 24, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member steveb is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 20, 2002
posts:4652
votes: 0


What are you talking about?

You think the fact that the index page isn't #1, or anywhere, for a search for [dmoz] is normal?

Obviously it is neither a simple canonicalisation nor a duplicate content issue.

There is hysterical gloating about it on another forum, but more important are the major repercussions, at least in the short term, of thousands of domains losing their best link.

2:12 pm on Sept 24, 2007 (gmt 0)

Preferred Member

10+ Year Member

joined:Jan 3, 2006
posts: 612
votes: 0


I thought I'd check out Dmoz this morning, so I searched for my site and, to my surprise, it wasn't there anymore :( It had been listed for eons.
What happened to it? I didn't think they deleted sites from Dmoz. Do they?
5:30 pm on Sept 24, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 15, 2004
posts:97
votes: 0


Listings are deleted from Dmoz all the time, usually because they are dead or parked. They can also be removed because the content no longer fits the Dmoz guidelines.

However sites can also be moved from one category to another. If there is a delay in re-listing in the second category, a site could be unlisted for a while.

6:08 pm on Sept 24, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 19, 2002
posts:1945
votes: 0


Did the editor change in your category?
6:36 pm on Sept 24, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


It shouldn't take you very long to discover which URL the ODP Root Page, and all the Top Level Categories, are indexed under.

I'll also guess that it won't take Google's indexing system much more than a month to realise what is going on and fix the problem.

4:41 pm on Sept 25, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Heh, steveb, have you located those 105 000 "missing" pages yet?

>> There's yet another subdomain that was overlooked, and is not sending a 301 redirect. <<

11:30 pm on Oct 2, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


I see that the index at 72.14.203.nnn is completely different to most of the others.

For www.dmoz.org it has almost double the number of pages listed.

9:56 am on Oct 8, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Some Class-C blocks show almost double the number of www.dmoz.org pages that other blocks show. The number of non-canonical URLs is also slowly decreasing for most other URL formats.

The counts for www.dmoz.org have increased by several hundred thousand in the last few weeks in most blocks. There are still some very large differences in the counts, depending on where you look.
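Comparisons like this are easier when the per-IP counts are grouped by /24 ("Class-C") block. This is a minimal sketch using Python's standard `ipaddress` module. The IP addresses come from the RFC 5737 documentation ranges and the counts are invented, so they stand in for real datacentre readings rather than reproduce them.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, List, Tuple


def counts_by_block(samples: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Group per-IP site: counts into /24 blocks and return (min, max) per block."""
    blocks: Dict[str, List[int]] = defaultdict(list)
    for ip, count in samples.items():
        # strict=False lets a host address stand in for its enclosing /24 network.
        block = ipaddress.ip_network(f"{ip}/24", strict=False)
        blocks[str(block)].append(count)
    return {block: (min(c), max(c)) for block, c in sorted(blocks.items())}


# Invented readings for a site:www.dmoz.org query from three datacentre IPs.
samples = {"192.0.2.10": 410000, "192.0.2.11": 405000, "198.51.100.7": 210000}
by_block = counts_by_block(samples)
for block, (lo, hi) in by_block.items():
    print(block, lo, hi)

# Ratio between the highest and lowest per-block maximum.
highs = [hi for _, hi in by_block.values()]
print(round(max(highs) / min(highs), 2))
```

Reporting a min/max per block, rather than a single number, makes it visible when IPs within the same block also disagree.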