Forum Moderators: Robert Charlton & goodroi
Google not only runs its own copy of the entire Open Directory, it also indexes that copy in Google Search.
Neither Google nor DMOZ advises webmasters that running DMOZ data on your site is very likely to get your entire site banned. DMOZ actively encourages sites to use its data. It even encourages webmasters to use free software for producing grossly duplicative "clones" of the entire 620,000-page Open Directory. I can understand how that could be intensely irritating to search engines.
The whole idea and promise of the “Open” Directory Project was that the data was to be freely available for use by any web site. This is effectively a fraud if only Google and their friends can use “Open” Directory data without risking being banned by Google.
The 71,816 DMOZ editors are also being victimized. They were told that they were contributing to an "Open" Directory, not acting as unpaid editors for a $100 billion company.
I think Google and DMOZ both need to be considerably more "open" about this issue and develop standards for allowable use of DMOZ data, if any. If there is no acceptable use by folks who are not Google friends or partners, that should be made clear.
I thought one of the principal concepts of the web was the hyperlink. If you think there is valuable information elsewhere on the web, the idea is that you link to it for your visitors. I would think that 90% of the people whose sites utilize "free and legal" information found elsewhere on the web do so from a profit motive, not to help their site visitors. I would also bet that 90% of the people who use this content would be too lazy to use it if it couldn't just be "cut and pasted" into their HTML code.
I would think that 90% of the people whose sites utilize "free and legal" information found elsewhere on the web do so from a profit motive, not to help their site visitors.
I would think that 90% is a very conservative estimate ;)
profit motive
But then again is that not why the vast majority of us are here? ;)
If you want a highly profitable business, sell cocaine. If profit is the ONLY motivation, then selling drugs is perfectly acceptable. The reality is that third-party profit cannot be the motivation Google uses to select sites for listing (and ranking) in the SERPs.
Duplicate content is duplicate content; it doesn't matter if you duplicate yourself 100 times or duplicate DMOZ 100 times. It's all dupes and adds little or nothing to the surfer experience, and THAT is Google's main motivation: surfer experience.
Alex
But then again, who said life is fair?
That doesn't explain or justify why Google itself would have three clones of the ODP at three different URLs, all indexed by Google. Also, even if you run your ODP clone as directory.yoursite.com, the entire yoursite.com domain can get banned, while other big names run their clones with no worries. But then again, who said life is fair?
If Yahoo and MSN were paying attention they would start banning dmoz clones, including Google.
It would be difficult to ban sites using wikipedia, because wikipedia's content changes every once in a while doesn't it?
You sound a little nervous, no? The content doesn't change enough to count on that. I'm already seeing hints that people who tried that are finding Google's duplicate content filters might be a little better than they thought: another thread, same idea as this one, complains about search engines catching slightly modified Wikipedia near-clones.
I like what I'm seeing Google/Big Daddy do. The new systems seem to be running pretty well, and by the sound of some recent threads, this one included, they are starting to filter the bottom feeders out, slowly but surely.
As someone who, like Broadway, tries to create as much unique content as I can, I'm very happy to see this step, or these steps. There are now far too many sites on the web, and they are more and more just repeating each other. There is a finite pool of creativity, and it's getting obvious that a lot of sites don't have any access to that pool, or don't want to exert the effort it takes to contribute, whatever.
I hope google gets even better at this part of things, it's almost enough to give me hope.
It would be difficult to ban sites using wikipedia, because wikipedia's content changes every once in a while doesn't it?
You sound a little nervous, no?
Because of the large number of sites using these feeds, I suspect that Google also looks for specific indicators of content from these sources in particular: checking against multiple generations of content, looking for the copyright notice, flagging sites manually, etc. These are just for-instances, of course; I don't have any direct knowledge of their procedures.
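Setting aside those source-specific tells, the sort of near-duplicate filtering being guessed at in this thread can be illustrated with w-shingling and Jaccard similarity. This is a textbook technique, not a claim about Google's real pipeline, and the sample texts and the 3-word shingle size below are made-up examples.

```python
# A minimal sketch of near-duplicate detection via w-shingling and
# Jaccard similarity. Illustrative only; not how any search engine
# necessarily does it.

def shingles(text, w=3):
    """Return the set of overlapping w-word shingles of a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(max(len(words) - w + 1, 0))}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

original   = "Open Directory Project data is freely available for use by any web site"
near_clone = "Open Directory Project data is freely available for use by any site on the web"
unrelated  = "a completely different page about local weather and shopping guides"

print(jaccard(shingles(original), shingles(near_clone)))  # 0.6 -- reads as a near-duplicate
print(jaccard(shingles(original), shingles(unrelated)))   # 0.0 -- unrelated
```

A filter built this way catches "slightly modified" clones precisely because reshuffling a few words leaves most shingles intact, which fits what posters above report seeing.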
I'm sure a lot of people out there are trying hard to work out ways to duplicate content creatively, it is after all very hard to create content.
You're probably spot on about some of the things Google is currently working on, but the overall point doesn't change: don't make duplicate content key to your site's success. I'm sure I can jump over to my favorite black hat SEO forums and find lots of fun, creative ways to clone major content sources; in fact I think I'll do just that, it's been a while.
Then I will know what to avoid on my own sites.
I liked the example of simply adding a niche directory but blocking bots from it, that's a nice one. The same idea would apply to any other duplicate content. If it's a nice resource for your users, just block access to it, noindex nofollow, robots.txt, etc.
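As a concrete sketch of that suggestion (the /niche-directory/ path is a hypothetical example), the robots.txt entry would look like:

```
# robots.txt at the site root: keep all crawlers out of the duplicated section
User-agent: *
Disallow: /niche-directory/
```

As belt and braces, each page in that section can also carry `<meta name="robots" content="noindex, nofollow">` in its head, since robots.txt only blocks crawling; URLs discovered via external links can still end up indexed without it.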
www.answers.com/topic/[city]-[state]
__Column 1___
Encyclopedia Definition - lifted from [source]
Ads
Weather - weather info based on the USGS feed
Ads
Wikipedia - dup
Ads
Link to map site - ...
Ads
Mentioned in - Just links to other "answers.com" dup pages
Ads
__Column 2___
Ads - disguised as useful local shopping guide
Ads - shopping.com ad disguised as local shopping search
Ads - "Get useful Advice" it says "Need advice on [city name]? <- umm...
Ads - Google related links, without the "ads by google"
----------------------
The real problem with sites like this is that places like Google reward them, not only with top spots in the SERPs but by allowing them to make money from Google's own advertising program. All this does is send a message to everybody that creating original content is not the way to go; you'll get rewarded much quicker by just scraping content from other sites and putting ads all around it. This is why there were so many DMOZ clones; hopefully Google will end the trend now by removing them. Now they just need to start doing the same with all websites that don't have original content. Imagine searching for something on the net and every website you click on having something different from the last.
Whilst the removal doesn't bother me, I am concerned that I will not be able to use the domain for anything else.
Does anyone know how to get a domain reincluded?
Now they just need to start doing the same with all websites that don't have original content. Imagine searching for something on the net and every website you click on having something different from the last.
You are right on the money Twist!
Does anyone know how to get a domain reincluded?
Put a website with some useful content on it. If you build it they will come ;)
Whilst the removal doesn't bother me, I am concerned that I will not be able to use the domain for anything else. Does anyone know how to get a domain reincluded?
I have the exact same setup for a domain that is just sitting there; however, the DMOZ clone is in a directory on that domain. The domain itself is still in the Google index, but the subdirectory is not.
So I'm sure if you put other content on there the site will pop up again.
there is a finite pool of creativity and it's getting obvious that a lot of sites don't have any access to that pool, or don't want to exert the effort it takes to contribute...
That's it exactly. Over the last 5 years or so there have been countless people raking it in by knocking up crap sites with unoriginal material. If we are seeing the end of this then it can't come quick enough.
Like a few other people on here, I find duplicate content filters not only don't worry me, they will actually help. It takes time, effort and skill to create a good site with original content, and I feel it's time the web catered a little more for people like me, and a little less for scraper sites and copy-and-paste merchants. I guess all gravy trains come to an end at some point.
As for DMOZ, it is a redundant resource, long since past its sell-by date. I can't see it lasting on Google for much longer either. I'm sure they could knock up their own anyway.
It's having a very serious effect on the currency of results. DMOZ doesn't review its featured sites anything like often enough.
I have a set of keywords that bring up a certain site at #1 - despite the site being low content and virtually abandoned for three years. More modern and much more vital sites (multiple updates per week, a forum with three dozen messages a day) don't show in the top thirty.
It's making Google a laughing stock. With Google's supposed insistence on programmatic solutions to indexing and ranking, using a manual resource that is DEFINITELY corrupt in some areas seems an anachronism.
DMOZ is no longer a quality resource, due to the corruption problems, and DMOZ should not have let webmasters clone its data for as long as it has. The penny has dropped at long last at Google: its results can be manipulated when poor sites secure DMOZ listings through corrupt editors and then gain thousands of false backlinks from all these clone sites.
Also, I can't see any need for any site to carry clone directory data. I work on two sites that each have a directory-type resource section for their niche, but it's all original content. In both cases they carry a lot of quality resource details that are not even listed in DMOZ; the information is more detailed, correct, and kept up to date, and both sites have had no problems with this Google adjustment to its algo.
All Google is doing is reducing duplicate pages on the net. Why do we need more than one directory on the internet with the exact same content?
Why do any of you who copy this junk off DMOZ think that cloning web pages onto the internet is a good idea? You are also copying pages that contain outdated links, broken links, dead sites, etc. It's like copying someone's homework when they've done it wrong!
You must all see that it's duplicating and filling the internet with the same stuff.
If you are a webmaster who has been hit by this, all I can say is you will need to go back to the drawing board and put the spadework in to create original content like the rest of us. Don't go for shortcuts by copying someone else's work; do it yourself. You will also find it far more rewarding in the long term.
With Google's supposed insistence on programmatic solutions to indexing and ranking, using a manual resource that is DEFINITELY corrupt in some areas seems an anachronism.
Right on the money.
While I understand and support their preference for open source organizations/communities, it is unbelievable that Google still places significant value on DMOZ in its search results today.
Aside from $$/interests/corruption issues, directories are de facto dead.
They were nice and noble in the days before search engines caught up, but who is using them today?
Everybody cheers when included in DMOZ, for the well-known reasons, but the bottom line should be how useful the directory would be without Google and 2,000 clones/links.
I don't know of anyone getting significant traffic from DMOZ or even Yahoo directory alone.