Forum Moderators: Robert Charlton & goodroi
Google not only runs its own copy of the entire Open Directory, it also indexes that copy in Google Search.
Neither Google nor DMOZ advises webmasters that running DMOZ data on your site is very likely to get your entire site banned. DMOZ actively encourages sites to use its data. It even encourages webmasters to use free software for producing grossly duplicative "clones" of the entire 620,000-page Open Directory. I can understand how that could be intensely irritating to search engines.
The whole idea and promise of the “Open” Directory Project was that the data was to be freely available for use by any web site. This is effectively a fraud if only Google and their friends can use “Open” Directory data without risking being banned by Google.
The 71,816 DMOZ editors are also being victimized. They were told that they were contributing to an "Open" Directory, not acting as unpaid editors for a $100 billion company.
I think Google and DMOZ both need to be considerably more "open" about this issue and develop standards for allowable use of DMOZ data, if any. If there is no acceptable use by folks who are not Google friends or partners, that should be made clear.
I thought one of the principal concepts of the web was the hyperlink. If you think there is valuable information elsewhere on the web, the idea is that you link to it for your visitors. I would think that 90% of the people whose sites utilize "free and legal" information found elsewhere on the web do so from a profit motive, not to help their site visitors. I would also bet that 90% of the people who use this content would be too lazy to use it if it couldn't just be "cut and pasted" into their HTML code.
I would think that 90% of the people whose sites utilize "free and legal" information found elsewhere on the web do so from a profit motive, not to help their site visitors.
I would think that 90% is a very conservative estimate ;)
profit motive
But then again is that not why the vast majority of us are here? ;)
If you want a highly profitable business, sell cocaine. If profit is the ONLY motivation, then selling drugs is perfectly acceptable. The reality is that third-party profit cannot be the motivation Google uses to select sites for listing (and ranking) in the SERPs.
Duplicate content is duplicate content; it doesn't matter if you duplicate yourself 100 times or duplicate DMOZ 100 times. It's all dupes and adds little or nothing to the surfer experience, and THAT is Google's main motivation: surfer experience.
Alex
But then again, who said life is fair?
That doesn't explain or justify why Google itself would have three clones of the ODP at three different URLs, all indexed by Google. Also, even if you run your ODP clone as directory.yoursite.com, the entire yoursite.com domain can get banned, while other big names run their clones with no worries. But then again, who said life is fair?
If Yahoo and MSN were paying attention they would start banning dmoz clones, including Google.
It would be difficult to ban sites using wikipedia, because wikipedia's content changes every once in a while doesn't it?
You sound a little nervous, no? The content doesn't change enough to count on that. I'm already seeing hints that people who tried that are finding Google's duplicate content filters might be a little better than they thought: another thread, same idea as this one, complains about search engines catching slightly modified Wikipedia near-clones.
I like what I'm seeing Google/Big Daddy do. The new systems seem to be running pretty well, and by the sound of some recent threads, this one included, they are starting to filter the bottom feeders out, slowly but surely.
As someone who, like Broadway, tries to create as much unique content as I can, I'm very happy to see this step, or these steps. There are now far too many sites on the web, and they are more and more just repeating each other. There is a finite pool of creativity, and it's getting obvious that a lot of sites don't have any access to that pool, or don't want to exert the effort it takes to contribute, whatever.
I hope google gets even better at this part of things, it's almost enough to give me hope.
It would be difficult to ban sites using wikipedia, because wikipedia's content changes every once in a while doesn't it?
You sound a little nervous, no?
Because of the large number of sites using these feeds, I suspect that Google also looks for specific indicators of content from these sources in particular: checking against multiple generations of content, looking for the copyright notice, flagging sites manually, etc. These are just for-instances, of course; I don't have any direct knowledge of their procedures.
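Setting aside those source-specific tells, the sort of near-duplicate filtering being guessed at in this thread can be illustrated with w-shingling and Jaccard similarity. This is a textbook technique, not a claim about Google's real pipeline, and the sample texts and the 3-word shingle size below are made-up examples.

```python
# A minimal sketch of near-duplicate detection via w-shingling and
# Jaccard similarity. Illustrative only; not how any search engine
# necessarily does it.

def shingles(text, w=3):
    """Return the set of overlapping w-word shingles of a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(max(len(words) - w + 1, 0))}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

original   = "Open Directory Project data is freely available for use by any web site"
near_clone = "Open Directory Project data is freely available for use by any site on the web"
unrelated  = "a completely different page about local weather and shopping guides"

print(jaccard(shingles(original), shingles(near_clone)))  # 0.6 -- reads as a near-duplicate
print(jaccard(shingles(original), shingles(unrelated)))   # 0.0 -- unrelated
```

A filter built this way catches "slightly modified" clones precisely because reshuffling a few words leaves most shingles intact, which fits what posters above report seeing.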
I'm sure a lot of people out there are trying hard to work out ways to duplicate content creatively, it is after all very hard to create content.
You're probably spot on about some of the things Google is currently working on, but the overall point doesn't change: don't make duplicate content key to your site's success. I'm sure I can jump over to my favorite black hat SEO forums and find lots of fun, creative ways to clone major content sources; in fact I think I'll do just that, it's been a while.
Then I will know what to avoid on my own sites.
I liked the example of simply adding a niche directory but blocking bots from it, that's a nice one. The same idea would apply to any other duplicate content. If it's a nice resource for your users, just block access to it, noindex nofollow, robots.txt, etc.
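As a concrete sketch of that suggestion (the /niche-directory/ path is a hypothetical example), the robots.txt entry would look like:

```
# robots.txt at the site root: keep all crawlers out of the duplicated section
User-agent: *
Disallow: /niche-directory/
```

As belt and braces, each page in that section can also carry `<meta name="robots" content="noindex, nofollow">` in its head, since robots.txt only blocks crawling; URLs discovered via external links can still end up indexed without it.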
www.answers.com/topic/[city]-[state]
__Column 1___
Encyclopedia Definition - lifted from [source]
Ads
Weather - weather info based on the USGS feed
Ads
Wikipedia - dup
Ads
Link to map site - ...
Ads
Mentioned in - Just links to other "answers.com" dup pages
Ads
__Column 2___
Ads - disguised as useful local shopping guide
Ads - shopping.com ad disguised as local shopping search
Ads - "Get useful Advice" it says "Need advice on [city name]? <- umm...
Ads - Google related links, without the "ads by google"
----------------------
The real problem with sites like this is that places like Google reward them, not only with top spots in the SERPs but by allowing them to make money from Google's own advertising program. All this does is send a message to everybody that creating original content is not the way to go; you'll get rewarded much quicker by just scraping content from other sites and putting ads all around it. This is why there were so many DMOZ clones; hopefully Google will end the trend now by removing them. Now they just need to start doing the same with all websites that don't have original content. Imagine searching for something on the net and every website you click on having something different from the last.
Whilst the removal doesn't bother me, I am concerned that I will not be able to use the domain for anything else.
Does anyone know how to get a domain reincluded?
Now they just need to start doing the same with all websites that don't have original content. Imagine searching for something on the net and every website you click on having something different from the last.
You are right on the money Twist!
Does anyone know how to get a domain reincluded?
Put a website with some useful content on it. If you build it they will come ;)
Whilst the removal doesn't bother me, I am concerned that I will not be able to use the domain for anything else. Does anyone know how to get a domain reincluded?
I have the exact same setup for a domain that is just sitting there; however, the DMOZ clone is in a directory on that domain. The domain itself is still in the Google index, but the subdirectory is not.
So I'm sure if you put other content on there the site will pop up again.
there is a finite pool of creativity and it's getting obvious that a lot of sites don't have any access to that pool, or don't want to exert the effort it takes to contribute...
That's it exactly. Over the last 5 years or so there have been countless people raking it in by knocking up crap sites with unoriginal material. If we are seeing the end of this then it can't come quick enough.
Like a few other people on here, I find duplicate content filters not only don't worry me, they will actually help. It takes time, effort and skill to create a good site with original content, and I feel it's time the web catered a little more for people like me, and a little less for scraper sites and copy-and-paste merchants. I guess all gravy trains come to an end at some point.
As for DMOZ, it is a redundant resource, long since past its sell-by date. I can't see it lasting on Google for much longer either. I'm sure they could knock up their own anyway.
It's having a very serious effect on the currency of results. DMOZ doesn't review its featured sites anything like often enough.
I have a set of keywords that bring up a certain site at #1 - despite the site being low content and virtually abandoned for three years. More modern and much more vital sites (multiple updates per week, a forum with three dozen messages a day) don't show in the top thirty.
It's making Google a laughing stock. With Google's supposed insistence on programmatic solutions to indexing and ranking, using a manual resource that is DEFINITELY corrupt in some areas seems an anachronism.
DMOZ is no longer a quality resource, due to the corruption problems, and DMOZ should not have let webmasters clone its data for as long as it has. The penny has dropped at long last at Google: its results can be manipulated when poor sites secure DMOZ listings through corrupt editors and then gain thousands of false backlinks from all these clone sites.
Also, I can't see any need for any site to carry clone directory data. I work on two sites that each have a directory-type resource section for their niche, but it's all original content. In both cases they carry a lot of quality resource details that are not even listed in DMOZ; the information is more detailed, correct, and kept up to date, and both sites have had no problems with this Google adjustment to its algo.
All Google is doing is reducing duplicate pages on the net. Why do we need more than one directory on the internet with the exact same content?
Why do any of you who copy this junk off DMOZ think that cloning web pages onto the internet is a good idea? You are also copying pages that contain outdated links, broken links, dead sites, etc. It's like copying someone's homework when they've done it wrong!
You must all see that it's duplicating and filling the internet with the same stuff.
If you are a webmaster who has been hit by this, all I can say is you will need to go back to the drawing board and put the spadework in to create original content like the rest of us. Don't go for shortcuts by copying someone else's work; do it yourself. You will also find it far more rewarding in the long term.
With Google's supposed insistence on programmatic solutions to indexing and ranking, using a manual resource that is DEFINITELY corrupt in some areas seems an anachronism.
Right on the money.
While I understand and support their preference for open source organizations/communities, it is unbelievable that Google still places significant value on DMOZ in its search results today.
Aside from $$/interests/corruption issues, directories are de facto dead.
They were nice and noble in the days before search engines caught up, but who is using them today?
Everybody cheers when included in DMOZ, for the well-known reasons, but the bottom line should be how useful the directory would be without Google and 2,000 clones/links.
I don't know of anyone getting significant traffic from DMOZ or even Yahoo directory alone.