Meanwhile, google seems happy enough to dole out a PR10 to its straight mirror [directory.google.com] of the ODP.
Is google relying on its duplicate content detection or is this a unique type of penalty for using ODP data?
I was considering developing a niche directory site based in part on ODP data, but not if this is the inevitable result.
Seems reasonable, doesn't it?
>niche directory
Before you start, you might read here [webmasterworld.com].
Seems reasonable, doesn't it?
Not if you add unique, industry-specific content and links to existing ODP data. As I see it, doing that would pass any duplicate content detection but if there's some inherent penalty (whether applied manually or automatically) in using ODP data, I'd like to know before it bites me in the ass.
I've been following the "link pages being removed?" thread, but what does that have to do with this issue? Sure, a directory is a huge set of link pages, but what I'm seeing for sites based on the ODP is beyond that, IMO. Those link pages are still showing caches and backlinks...for the most part, these directories are not.
I was considering developing a niche directory site based in part on ODP data, but not if this is the inevitable result.
If you are going to build a directory, then what does it matter if Google gives it a PR10 or PR0?
The purpose of a directory is to be a central resource of information, so unless you were hoping to sell listings, PageRank really should NOT matter.
just my 2pfennings worth.
Shak
You want to build a directory, but want people to find it via Google, whilst all you are really doing is taking some ODP data and customising it.
Why should Google send you traffic?
Why do you not go and promote it offline, or pay for advertising on relevant search terms, etc.?
Like I said, one of us has got it wrong (probably me), but I still don't get why you need Auntie Google and her magic coloured underwear.
Shak
I guess if you take some ODP data (which Google doesn't own anyway) and customise it to a state which is beneficial to that niche market (in other words, it's better than the DMOZ data on its own for users of that niche market) then it's of value to those users and worth traffic.
TJ
I think TJ's got the idea, and yes I agree with you Shak that a simple mirror or partial mirror is just duplicate content and should be treated as such.
Oh, and "niche" can also be pronounced more like "nitch" but "neesh" is probably closer to the French etymological root. ;)
Probably what is happening with Google is that some ODP mirrors tweak their page layout enough to get past the "automatic duplicate text" filter, and some don't. Google, for instance, displays the ODP data quite differently than the ODP does: omits some information, shuffles other information around, etc. It would take a VERY sophisticated (i.e. protoplasm-based) algorithm to detect the likeness. Others simply scrape the ODP page and redisplay it, possibly with added headers or footers. Still others may be on banned domains, or simply don't have any inbound links to speak of.
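To make the "tweak the layout and slip past the filter" point a bit more concrete, here's a rough sketch of one textbook way to compare two pages: break each into overlapping word triples ("shingles") and measure the overlap. To be clear, this is only an illustration and I'm assuming nothing about what Google actually runs; the function names and sample listings are made up.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(page_a, page_b, k=3):
    """Jaccard overlap of the two shingle sets: 0.0 (disjoint) to 1.0 (identical)."""
    a, b = shingles(page_a, k), shingles(page_b, k)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical directory listings, invented for this example.
original = ("Acme Widgets - Maker of blue and red widgets for home use. "
            "Widget World - News and reviews about widgets.")
# A straight scrape with a footer bolted on.
scraped = original + " Help build the largest human-edited directory on the web."
# Same two sites, but reordered and with rewritten descriptions.
reworked = ("Widget World: widget reviews and news. "
            "Acme Widgets: red or blue widgets made for the home.")

print("scraped :", similarity(original, scraped))
print("reworked:", similarity(original, reworked))
```

The scraped copy still shares most of its shingles with the original, while the reordered and reworded copy shares almost none, even though a human would spot at once that it lists the same sites. That's roughly why a purely textual filter would catch the lazy mirrors and miss the reworked ones.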
There are two main ways in which ODP data appears to be used.
The first is to get the entire RDF dump and create a clone copy of the directory, sometimes adding stuff like Amazon or other affiliate links into the content. There are hundreds of sites like this, and Google is good at spotting them. Frankly they're a waste of time unless you can drive traffic to them (maybe you're an ISP).
The second is to use a *portion* of the directory to add content to your site. So, if you have a site about widgets, you can add some of the directory contents about widgets to your site.
The second approach seems to work well for a number of reasons. It's very easy to do.. really, no more than copying and pasting from dmoz.org and then adding the correct attribution at the bottom. If you like you can add or delete entries or change descriptions, it's really up to you. It actually adds value to a site. Many people interested in widgets may not venture into any directory other than Yahoo!, so you're providing a service. It makes link exchanging a lot easier.. because you already have the links in place, you can just email your link partner to say that you've *already* added them, which gets a more positive response. And finally, the page is more likely to be picked up as content by Google.. I've got about 20 pages from the ODP on one of my sites and it pulls about 50% of my traffic.
I mentioned attribution earlier.. that's basically the "Open Directory" box that's displayed on the bottom of every page (have a look in the Google directory and it's there too). Using ODP data without attribution won't get you a Google ban, but will get you banned from the ODP. In extreme cases it will also get all your other sites banned from the ODP.
In the meantime you can read this excellent thread started by msgraph, which should enlighten you about how search engines go about detecting duplicates: Duplicates and the challenges search engines face [webmasterworld.com]
Happy reading :)