Forum Moderators: Robert Charlton & goodroi

Google Is Banning Sites That Use Open Directory (DMOZ) Data

Is Open Directory Still Open?

     
1:55 pm on Apr 4, 2006 (gmt 0)

New User

10+ Year Member

joined:Oct 6, 2005
posts:39
votes: 0


There is a new study that says Google is “massively” banning sites that use any DMOZ data. I know many people would say “Good! Who needs another copy of some information that you can already get at the Open Directory web site”, but it seems to me that there are some fundamental issues of fairness and deception here.

Google not only runs its own copy of the entire Open Directory but also indexes that copy in Google Search.

Neither Google nor DMOZ advises webmasters that running any DMOZ data on your site is very likely to get your entire site banned. DMOZ actively encourages sites to use DMOZ data. They even encourage webmasters to use free software for producing grossly duplicative and redundant “clones” of the entire 620,000 page Open Directory. I can understand how that could be intensely irritating to search engines.

The whole idea and promise of the “Open” Directory Project was that the data was to be freely available for use by any web site. This is effectively a fraud if only Google and their friends can use “Open” Directory data without risking being banned by Google.

The 71,816 DMOZ editors are also being victimized. They were told that they were contributing to an “Open” Directory, not acting as unpaid editors for a $100 billion company.

I think Google and DMOZ both need to be considerably more “open” about this issue to develop standards for allowable use of DMOZ data, if any. If there is no acceptable use by folks that are not Google friends or partners, that should be made clear.

5:59 pm on Apr 10, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 4, 2004
posts:801
votes: 0


It would be difficult to ban sites using wikipedia, because wikipedia's content changes every once in a while doesn't it?

You sound a little nervous, no? The content doesn't change enough to count on that. I'm already seeing hints that people who were trying that are finding that Google's duplicate content filters might be a little better than they thought. There's another thread, same idea as this one, complaining about search engines catching slightly modified near-clones of Wikipedia pages.

I like what I'm seeing Google/Big Daddy do. It looks like the new systems are running pretty well. By the sound of some recent threads, this one included, they are starting to filter the bottom feeders out, slowly but surely.

As someone who, like Broadway, tries to create as much unique content as I can, I'm very happy to see this step, or these steps. There are now far too many sites on the web, and they are more and more just repeating each other. There is a finite pool of creativity and it's getting obvious that a lot of sites don't have any access to that pool, or don't want to exert the effort it takes to contribute, whatever.

I hope Google gets even better at this part of things. It's almost enough to give me hope.

6:22 pm on Apr 10, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member jomaxx is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Nov 6, 2002
posts:4768
votes: 0


It would be difficult to ban sites using wikipedia, because wikipedia's content changes every once in a while doesn't it?

You sound a little nervous, no?

NO, it sounds like a logical and legitimate question to me. Wiki in particular is still evolving quickly and it would be tough if Google were to filter on a page-by-page basis alone. OTOH if you look at the data across an entire site, you would probably see a very clear indicator that a site was cloned from Wikipedia or DMOZ.

Because of the large number of sites using these feeds, I suspect that Google also look for specific indicators of content from these sources in particular. Checking against multiple generations of content, looking for the copyright notice, flagging sites manually, etc. These are just for-instances, of course; I don't have any direct knowledge of their procedures.
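To make the page-by-page versus whole-site distinction concrete, here is a minimal sketch in Python. It is not Google's procedure, which nobody in this thread has direct knowledge of. It swaps in a standard near-duplicate technique, word shingles compared with Jaccard similarity, and the names `shingles`, `jaccard`, `site_overlap`, the shingle size `k`, and the `threshold` value are all assumptions chosen for illustration.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts yield a single shingle, or nothing if the text is empty.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def site_overlap(site_pages, source_pages, k=3, threshold=0.5):
    """Fraction of site pages whose best match against any source page
    reaches the (hypothetical) similarity threshold."""
    if not site_pages:
        return 0.0
    source_sets = [shingles(page, k) for page in source_pages]
    hits = 0
    for page in site_pages:
        page_set = shingles(page, k)
        best = max((jaccard(page_set, s) for s in source_sets), default=0.0)
        if best >= threshold:
            hits += 1
    return hits / len(site_pages)
```

A single lightly edited page can still score high against its source, and the site-wide fraction is the kind of signal that would stand out for a clone even when individual pages have drifted.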

6:53 pm on Apr 10, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 4, 2004
posts:801
votes: 0


jomaxx, there's nothing particularly radical or earth shaking here. What's being discussed is the use of duplicate, spidered content. The source of that content is totally irrelevant. Last I checked, the use of duplicate content became problematic after the Florida update. I know that's when I stopped doing that. This is very old news by now, but it appears to be news to some despite that.

I'm sure a lot of people out there are trying hard to work out ways to duplicate content creatively, it is after all very hard to create content.

You're probably spot on about some of the things google is currently working on, but the overall point doesn't change: don't use duplicate content if that's key to your site's success. I'm sure I can jump over to my favorite black hat seo forums and find lots of fun creative ways to clone many major content sources, and in fact I think I'll do just that, it's been a while.

Then I will know what to avoid on my own sites.

I liked the example of simply adding a niche directory but blocking bots from it. That's a nice one. The same idea would apply to any other duplicate content. If it's a nice resource for your users, just block bot access to it: noindex, nofollow, robots.txt, etc.
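As a rough illustration of the robots.txt option, the sketch below uses Python's standard `urllib.robotparser` module to check that a rule actually blocks a directory. The `/directory/` path, the `example.com` URLs, and the helper name `is_blocked` are assumptions made up for the example, not anything from this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers out of a niche directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /directory/
"""


def is_blocked(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text disallows user_agent from url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/directory/page.html",
                "https://example.com/index.html"):
        print(url, "blocked" if is_blocked(ROBOTS_TXT, url) else "allowed")
```

The noindex, nofollow route is a per-page alternative: a `<meta name="robots" content="noindex, nofollow">` tag in the head of each page of the directory.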

7:06 pm on Apr 10, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 4, 2004
posts:801
votes: 0


As I suspected, my favorite blackhat SEO forum yielded immediate results on this topic.

7:43 pm on Apr 10, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Jan 8, 2004
posts:865
votes: 0


A typical page from "answers.com" with this address,

www.answers.com/topic/[city]-[state]

__Column 1___

Encyclopedia Definition - lifted from [source]

Ads

Weather - weather info based on the USGS feed

Ads

Wikipedia - dup

Ads

Link to map site - ...

Ads

Mentioned in - Just links to other "answers.com" dup pages

Ads

__Column 2___

Ads - disguised as useful local shopping guide

Ads - shopping.com ad disguised as local shopping search

Ads - "Get useful Advice" it says "Need advice on [city name]?" <- umm...

Ads - Google related links, without the "ads by google"

----------------------

The real problem with sites like this is that places like Google reward them not only with top spots in the SERPs but also by allowing them to make money from Google's own advertising program. All this does is send a message to everybody that creating original content is not the way to go, and that you'll get rewarded much quicker by just scraping content from other sites and putting ads all around it. This is why there were so many DMOZ clones. Hopefully Google will end the trend now by removing them. Now they just need to start doing the same with all websites that don't have original content. Imagine searching for something on the net and every website you click on has something different from the last.

10:40 am on Apr 12, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 24, 2005
posts:965
votes: 0


I had a DMOZ clone on one of my domains that I wasn't using. It got banned (went from PR5 to grey PR and Googlebot doesn't visit any more).

Whilst the removal doesn't bother me, I am concerned that I will not be able to use the domain for anything else.

Does anyone know how to get a domain reincluded?

11:23 am on Apr 12, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member beedeedubbleu is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Feb 3, 2004
posts:6138
votes: 23


Now they just need to start doing the same with all websites that don't have original content. Imagine searching for something on the net and every website you click on has something different from the last.

You are right on the money Twist!

Does anyone know how to get a domain reincluded?

Put a website with some useful content on it. If you build it they will come ;)

6:38 pm on Apr 14, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Apr 28, 2005
posts:146
votes: 0


Google has always discouraged duplicate content.
So banning, if real, is expected.

But the million dollar question is "Why is Google duplicating DMOZ content?"

--sri

8:03 pm on Apr 14, 2006 (gmt 0)

Preferred Member from US 

10+ Year Member

joined:June 6, 2005
posts:524
votes: 1


Whilst the removal doesn't bother me, I am concerned that I will not be able to use the domain for anything else.

Does anyone know how to get a domain reincluded?

I have the exact same setup for a domain that is just sitting there; however, the DMOZ clone is in a directory on that domain. The domain itself is still in the Google index, but the subdirectory is not.

So I'm sure if you put other content on there the site will pop up again.

11:11 am on Apr 15, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 20, 2006
posts:91
votes: 0


there is a finite pool of creativity and it's getting obvious that a lot of sites don't have any access to that pool, or don't want to exert the effort it takes to contribute...

That's it exactly. Over the last 5 years or so there have been countless people raking it in by knocking up crap sites with unoriginal material. If we are seeing the end of this then it can't come quick enough.

Like a few people on here, I find that duplicate content filters not only don't worry me, they will actually help. It takes time, effort and skill to create a good site with original content. And I feel it's time the web catered a little more for people like me, and a little less for scraper sites and copy and paste merchants. I guess all gravy trains come to an end at some point.

As for DMOZ, it is a redundant resource, long since past its sell by date. I can't see it lasting on Google for much longer either. I'm sure they could knock up their own anyway.

This 113 message thread spans 12 pages.
 
