Use of ODP Data = Google Ban?

         

Dolemite

5:48 pm on May 24, 2003 (gmt 0)

10+ Year Member



Google doesn't appear to be very tolerant of sites that use ODP data in their web directories. Browsing through these sites [dmoz.org] reveals a lot of PR0's, grey bars, and pages that are neither cached nor have any backlinks.

Meanwhile, Google seems happy enough to dole out a PR10 to its straight mirror [directory.google.com] of the ODP.

Is Google relying on its duplicate content detection, or is this a unique type of penalty for using ODP data?

I was considering developing a niche directory site based in part on ODP data, but not if this is the inevitable result.

rcjordan

6:12 pm on May 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>duplicate content

Seems reasonable, doesn't it?

>niche directory

Before you start, you might read here [webmasterworld.com].

Dolemite

6:30 pm on May 24, 2003 (gmt 0)

10+ Year Member



>Seems reasonable, doesn't it?

Not if you add unique, industry-specific content and links to existing ODP data. As I see it, doing that would pass any duplicate content detection but if there's some inherent penalty (whether applied manually or automatically) in using ODP data, I'd like to know before it bites me in the ass.

I've been following the "link pages being removed?" thread, but what does that have to do with this issue? Sure, a directory is a huge set of link pages, but what I'm seeing for sites based on the ODP is beyond that, IMO. Those link pages are still showing caches and backlinks; for the most part, these directories are not.

Shak

7:14 pm on May 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>I was considering developing a niche directory site based in part on ODP data, but not if this is the inevitable result.

If you are going to build a directory, then what does it matter if Google gives it a PR10 or PR0?

The purpose of a directory is to be a central resource of information, so unless you were hoping to sell listings, PageRank really should NOT matter.

Just my 2 pfennigs' worth.

Shak

Dolemite

7:25 pm on May 24, 2003 (gmt 0)

10+ Year Member



Well, what good is a source of information if no one can find it?

trillianjedi

7:26 pm on May 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>If you are going to build a directory, then what does it matter if Google gives it a PR10 or PR0?

People still have to find it though, and being a PR0 with no chance of getting any higher sure won't help!

TJ

<Edit: Dolemite beat me to it!>

Shak

7:56 pm on May 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry, I think one of us has got it all wrong.

You want to build a directory, but want people to find it via Google, whilst all you are really doing is taking some ODP data and customising it.

Why should Google send you traffic?

Why not go and promote it offline, or pay for advertising on relevant search terms, etc.?

Like I said, one of us has got it wrong (probably me), but I still don't get why you need Auntie Google and her magic coloured underwear.

Shak

Laisha

9:11 pm on May 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



My guess was that they did that to keep PR from passing from each of them.

Could be wrong, though.

OT:

>niche directory

That's pronounced "neesh." :)

trillianjedi

10:35 pm on May 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Good point Shak - I follow your line of thought now.

I guess if you take some ODP data (which Google doesn't own anyway) and customise it to a state which is beneficial to that niche market (in other words, it's better than the DMOZ data on its own for users of that niche market), then it's of value to those users and worth traffic.

TJ

Dolemite

4:41 am on May 25, 2003 (gmt 0)

10+ Year Member



To me the key detail is that I intend to add to the ODP data with both additional content and more directory links. I think that justifies the site as a unique resource.

I think TJ's got the idea, and yes I agree with you Shak that a simple mirror or partial mirror is just duplicate content and should be treated as such.

Oh, and "niche" can also be pronounced more like "nitch" but "neesh" is probably closer to the French etymological root. ;)

multex

8:21 pm on May 26, 2003 (gmt 0)

10+ Year Member



>I agree with you Shak that a simple mirror or partial mirror is just duplicate content and should be treated as such.

Probably what is happening with Google is that some ODP mirrors tweak their page layout enough to get past the "automatic duplicate text" filter, and some don't. Google, for instance, displays the ODP data quite differently than the ODP does: omits some information, shuffles other information around, etc. It would take a VERY sophisticated (i.e. protoplasm-based) algorithm to detect the likeness. Others simply scrape the ODP page and redisplay it, possibly with added headers or footers. Still others may be on banned domains, or simply don't have any inbound links to speak of.
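To make the point about layout tweaks concrete, here is a minimal sketch of one common textbook way to compare page text: word shingles plus Jaccard similarity. This is an assumption for illustration only; nothing in this thread says it is what Google runs. The function names and the sample listings are made up.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, split on whitespace)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined here as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(text_a, text_b, k=3):
    """Shingle-based similarity between two page texts, from 0.0 to 1.0."""
    return jaccard(shingles(text_a, k), shingles(text_b, k))


# Hypothetical page texts, not real ODP listings.
original = ("Acme Widgets - Makers of blue widgets since 1950. "
            "Widget World - News and reviews for widget fans. "
            "Widget Museum - History of the widget in pictures.")
# A straight scrape with a footer bolted on.
scraped = original + " Brought to you by Example Directory"
# A mirror that drops one entry, reorders the rest and changes the separators.
reshuffled = ("Widget Museum: History of the widget in pictures. "
              "Acme Widgets: Makers of blue widgets since 1950.")

print("scraped copy:   ", round(similarity(original, scraped), 2))
print("reshuffled copy:", round(similarity(original, reshuffled), 2))
```

Under this toy measure the scraped copy with an added footer scores much closer to the original than the reshuffled one, which is consistent with the idea that some mirrors slip past an automatic filter while plain scrapes do not.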

Dynamoo

7:09 am on May 27, 2003 (gmt 0)

10+ Year Member



Don't forget that PageRank slips away quickly on directory pages because of the way they're structured. Generally, you'll lose one level of PageRank for every level you go down.
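As a rough illustration of that rule of thumb (one toolbar PageRank point lost per directory level), the arithmetic looks like this. The starting value of 6 and the function name are assumptions for the example, not measurements.

```python
def toolbar_pr_by_depth(root_pr, max_depth):
    """Apply the rule of thumb: lose one toolbar PR point per level, never below 0."""
    return {depth: max(0, root_pr - depth) for depth in range(max_depth + 1)}


# Hypothetical directory whose home page shows PR6.
for depth, pr in toolbar_pr_by_depth(root_pr=6, max_depth=7).items():
    print(f"level {depth}: roughly PR{pr}")
```

On these assumptions a category seven levels down would already be at PR0, even with no penalty involved.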

There are two main ways in which ODP data appears to be used.

The first is to get the entire RDF dump and create a clone copy of the directory, sometimes adding stuff like Amazon or other affiliate links into the content. There are hundreds of sites like this, and Google is good at spotting them. Frankly they're a waste of time unless you can drive traffic to them (maybe you're an ISP).

The second is to use a *portion* of the directory to add content to your site. So, if you have a site about widgets, you can add some of the directory contents about widgets to your site.

The second approach seems to work well for a number of reasons. It's very easy to do: really, no more than copying and pasting from dmoz.org and then adding the correct attribution at the bottom. If you like, you can add or delete entries or change descriptions; it's really up to you. It actually adds value to a site. Many people interested in widgets may not venture into any directory other than Yahoo!, so you're providing a service. It makes link exchanging a lot easier: because you already have the links in place, you can just email your link partner to say that you've *already* added them, which gets a more positive response. And finally, the page is more likely to be picked up as content by Google. I've got about 20 pages from the ODP on one of my sites and they pull in about 50% of my traffic.

I mentioned attribution earlier. That's basically the "Open Directory" box that's displayed at the bottom of every page (have a look in the Google directory and it's there too). Using ODP data without attribution won't get you a Google ban, but it will get you banned from the ODP. In extreme cases it will also get all your other sites banned from the ODP.
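The "portion of the directory" approach described above can be sketched roughly as follows: pick the entries for one category, mix in your own, and finish the page with the attribution block. Everything here is hypothetical, including the entry fields, the sample listings and the function names, and the attribution markup is left as a placeholder rather than guessed at.

```python
from html import escape

# Hypothetical sample data; a real site would fill this from its own copy of the listings.
ODP_ENTRIES = [
    {"category": "Top/Shopping/Widgets", "title": "Acme Widgets",
     "url": "http://www.example.com/acme", "description": "Maker of blue widgets."},
    {"category": "Top/Shopping/Gadgets", "title": "Gadget Barn",
     "url": "http://www.example.com/gadgets", "description": "Gadgets of all kinds."},
]
# Entries you add yourself, which are what make the page more than a copy.
OWN_ENTRIES = [
    {"category": "Top/Shopping/Widgets", "title": "Widget Repair Tips",
     "url": "http://www.example.org/repair", "description": "Our own repair guide."},
]
# Placeholder only: the real attribution box has to come from the ODP's own instructions.
ATTRIBUTION_HTML = "<p><!-- required Open Directory attribution goes here --></p>"


def select_entries(entries, category_prefix):
    """Keep only the entries filed under the given category path."""
    return [e for e in entries if e["category"].startswith(category_prefix)]


def render_links_page(title, entries, attribution_html):
    """Render a simple links page with the attribution block at the bottom."""
    items = "\n".join(
        '<li><a href="{}">{}</a> - {}</li>'.format(
            escape(e["url"], quote=True), escape(e["title"]), escape(e["description"]))
        for e in entries
    )
    return "<h1>{}</h1>\n<ul>\n{}\n</ul>\n{}\n".format(
        escape(title), items, attribution_html)


page = render_links_page(
    "Widget Links",
    select_entries(ODP_ENTRIES, "Top/Shopping/Widgets") + OWN_ENTRIES,
    ATTRIBUTION_HTML,
)
print(page)
```

Keeping the attribution as a separate argument means it cannot be dropped by accident when the list of entries changes.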

mil2k

11:18 am on May 27, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



When you plan to use ODP data, there are some things you should take care to avoid. Do not use the same URL structure as the ODP. If possible, name the categories a little bit differently. You can use the URLs listed, but in general avoid the whole linking structure of the ODP. Those are my personal views.
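One way to act on that advice is to keep a small rename table and build your own URLs from the new names instead of reusing the ODP paths. This is only a sketch; the category paths, the new names and the "/guide/" base are all invented for the example.

```python
import re

# Hypothetical renames: ODP category path -> the name used on your own site.
CATEGORY_NAMES = {
    "Top/Shopping/Widgets": "Widget Shops",
    "Top/Shopping/Widgets/Blue": "Blue & Green Widgets",
}


def slugify(name):
    """Lowercase the name and collapse every run of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def local_url(odp_path, names=CATEGORY_NAMES, base="/guide/"):
    """Build a site-local URL from the renamed category, not from the ODP path."""
    if odp_path not in names:
        raise KeyError(f"no local name defined for {odp_path!r}")
    return base + slugify(names[odp_path]) + "/"


for path in CATEGORY_NAMES:
    print(path, "->", local_url(path))
```

Raising an error for unmapped categories is a deliberate choice here: it stops a page from quietly falling back to the original ODP path.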

In the meantime, you can read this excellent thread started by msgraph, which will enlighten you about how search engines go about detecting duplicates: Duplicates and the challenges search engines face [webmasterworld.com]

Happy reading :)