Forum Moderators: Robert Charlton & goodroi
Google not only runs their own copy of the entire Open Directory but they index their own copy in Google Search.
Neither Google nor DMOZ advise webmasters that running any DMOZ data on your site is very likely to get your entire site banned. DMOZ actively encourages sites to use DMOZ data. They even encourage webmasters to use free software for producing grossly duplicative and redundant “clones” of the entire 620,000 page Open Directory. I can understand how that could be intensely irritating to search engines.
The whole idea and promise of the “Open” Directory Project was that the data was to be freely available for use by any web site. This is effectively a fraud if only Google and their friends can use “Open” Directory data without risking being banned by Google.
The 71,816 DMOZ editors are also being victimized. They were told that they were contributing to an “Open” Directory, not acting as unpaid editors for a $100 billion company.
I think Google and DMOZ both need to be considerably more “open” about this issue to develop standards for allowable use of DMOZ data, if any. If there is no acceptable use by folks that are not Google friends or partners, that should be made clear.
I was a DMOZ editor for several years, but I saw the writing on the wall... people are too lazy (or stupid) for directories, they want to type in a couple magic words and have just what they were looking for pop to the top of the results.
I fully agree with Google's intent to drop all DMOZ clones... after all it is "duplicate content".
And you're saying that's a problem? Give me a break. Since it has not really ever struck me as a good or worthwhile strategy to simply copy someone else's website contents, I really can't see this as an issue that will affect any real site. Real sites are the ones that I use every day, they present me with their own content, in various forms, even sites like slashdot have long boring comments and discussion on their listings which are themselves actual content.
Now if you creatively repackage the content, or whatever, maybe an argument can be made. Too bad for all those people who tried to get something for nothing, for some reason I'm not feeling any particular sense of pity for them, although I'm sure they will be able to rationalize to themselves whatever they want. I'm glad I'm not playing that game anymore.
I'd love to see google get even more aggressive in this area, affiliates come to mind.
Alternatively, the original poster is actually a Google search engineer executing preventative maintenance. Looks like it's succeeding:
I just noticed, in my niche, that two of my competitors removed the Dmoz clone from their sites.....
While he imagines that dmoz created the rule to boost their pagerank, I'll bet his third study will confirm that it predates Google's pagerank algorithm. Karma says he got banned for refusing proper attribution.
Size estimations of Google's index vary, but it is probably somewhere between 8 billion and 25 billion pages.
Therefore removing DMOZ clones from the index whenever they catch one can free considerable amounts of Google index space. Or do you think that 1 billion of duplicated pages consisting of nothing more than links is useful content for a search engine?
treeline: In fact we have DMOZ attribution on every page that contains any DMOZ data and links to DMOZ on the site, we just don't have THREE links to DMOZ on every page. If Google calls and says "We will reinstate your site but you have to put three links to DMOZ on each page" we certainly will do it. Continuing the requirement for THREE links on EACH page and continuing to encourage deployment of 600,000 page clones certainly seems to me to be a blatant effort on the part of DMOZ to exploit search engines.
Is it really plausible that 50 percent of sites using DMOZ data and listed in DMOZ ("we...pride ourselves on being highly selective") are doing something else wrong?
I say now: I am not a Google employee. Will you say you are not a DMOZ employee or editor?
lammert: Yes, all those clones could be a significant resource drain on search engines. However, search engines generally don't index all the pages on a site, especially if it is a large site. Banning only 37 percent of the clones doesn't save much.
I think you may have diagnosed your own problem here. You're acting greedy and jealous, and having a hard time seeing how this impacts negatively on your karma.
we have DMOZ attribution on every page that contains any DMOZ data and links to DMOZ on the site, we just don't have THREE links to DMOZ on every page.
You've taken an incredible resource, the ability to have 600,000+- instant webpages that are actually about something, because you see them as having some value to your project. Now, it happens that there is a non-financial price tag connected to those. Dmoz requests that they be given credit in a certain format. This seems reasonable on the face of it, and they are up front about it, so if it is too high a price you can just pass.
Saying you paid one part of the price so are entitled to everything just doesn't fly. If I try to explain to the phone company that I paid the long distance charges for calling my mum, so they shouldn't disconnect my service just because I won't pay for any of the other calls, I still end up banned from making further calls.
If I were a SE engineer looking for a quick way to eliminate questionable dmoz clones, what better place to start than those not playing by the rules? (like yours)
I say now: I am not a Google employee. Will you say you are not a DMOZ employee or editor?
Is it really plausible that 50 percent of sites using DMOZ data and listed in DMOZ ... are doing something else wrong?
I find that easy to believe.
requirement for THREE links on EACH page ... seems to me to be a blatant effort on the part of DMOZ to exploit search engines.
Hey, in case you missed it the first time, they've been requiring this since BEFORE search engines hit on using link popularity for ranking purposes. They just asked for credit, and maybe a few human visitors to check out their project. This is where it really shows that you're so caught up in your own interests that you come across as greedy and jealous. You want to hoard links, and can't appreciate the value they're giving you, for free. Dmoz is willing to share with others who are willing to share. Those not willing to do this should follow a different path.
Just because there is no cash charge doesn't mean you are free to do anything with someone else's data. They are entitled to put restrictions on its use, including charging $$$ for it. If it is really so important to you to offer fewer links, start thinking about how much $$$ you're willing to pay for the privilege.
If Google calls and says "We will reinstate your site but you have to put three links to DMOZ on each page" we certainly will do it.
With so many clones out there, why would it be worth Google's time to call you? Again, a real attitude of entitlement on your part. Just because you create a website doesn't entitle you to search engine traffic. Build one good enough that it spreads on referral traffic and you're entitled to that traffic. If any search engine wants to ban sites using others' intellectual property inappropriately, it seems like a good idea.
I'm sorry that your site has been damaged, I know it's frustrating, but try and take a very open view to what has happened and what role your approach played in it. Perhaps, in its own way, Google has called you and left a message.
I DO think it IS a problem that DMOZ continues to entice new, unsophisticated webmasters into hosting such clones.
Altair, I was a new, unsophisticated webmaster, and it never once crossed my mind that it would be a smart, great, intelligent, cool, etc thing to do to copy an entire website. In fact, if someone had suggested that to me I would have thought that they were a complete idiot, since it is obviously just copying something.
Just because you can do something doesn't mean it's a good idea.
Copying an entire website clearly has no purpose except to avoid the work of creating an entire website. If you want to point users to a valuable resource like dmoz, a simple link to it would have done the job perfectly well. Obviously that's not the goal of people creating dmoz, or any other site, clones.
I can think of many reasons someone might think this type of activity is a good idea, desire to make money without doing the work springs immediately to mind. In some circles this is known as 'greed'.
As for the excuses you're making about why you didn't follow the dmoz rules for cloning, watching people make excuses for things that they know are wrong is getting to be an increasingly familiar sight.
But each to his own, long term success = good content, good products, original stuff of value to the end user, that's the secret, it's not rocket science. Everything else is just a trick and a shortcut, a get rich quick scheme. But it's hard to make a site that's good, so tricks will keep getting play. Personally, if you're going to do tricks, why not stop pretending and jump all the way into blackhat, at least those guys don't bother trying to rationalize their stuff, they just admit that they are trying to game the system to make some money. I'll take that approach anytime.
Is DMOZ exploiting webmasters to create vast linkfarms with billions of links to DMOZ? Or should every webmaster have known that adding any DMOZ data to their site would likely get their entire site banned as a spammer. Maybe that is why DMOZ publishes links to the clone scripts but no warnings. They don't need to warn anybody since everybody should have known. Perhaps DMOZ feels they need to provide a public service for spammers. Spammers are people too.
Not putting all those links on every page might offend DMOZ. Putting them all on will probably offend Google and other search engines. What to do? What to do?
Does DMOZ think all those exact duplicate clones are a bad thing? Or do they think they are a good thing (at least for DMOZ). Maybe that is why they list them in their "selective" directory.
Like Fox says: "We report, you decide."
Not putting all those links on every page might offend DMOZ. Putting them all on will probably offend Google and other search engines. What to do? What to do?
Probably the best bet is to build your own directory, with your own data and figure out a way to do something different so it is of value to your visitors.
I can prove that this is true. I had an authority site that was in Google for 4 years.
I was banned over a year ago and after checking through the site I decided to take down our DMOZ directory and then notified Google.
After a month, my site was back in the index and today it is a PR9 with top listing results.
So, if you have a DMOZ directory, remove it asap and ask for reinclusion, telling them you have done this.
I don't blame them for banning sites because of this. I guess the DMOZ directories are just killing their index bots but they should have notified site owners about this!
Is DMOZ exploiting webmasters to create vast linkfarms with billions of links to DMOZ?
No...
Or should every webmaster have known that adding any DMOZ data to their site would likely get their entire site banned as a spammer.
Well... Google defined this and "if" taking Google's stance "dup content" - ya I guess you should know.
Maybe that is why DMOZ publishes links to the clone scripts but no warnings. They don't need to warn anybody since everybody should have known. Perhaps DMOZ feels they need to provide a public service for spammers. Spammers are people too.
Well IMHO - since DMOZ predates Google, and DMOZ isn't concerned about "optimization to rank better" - I highly suspect the thought process wasn't "We should warn people that some future search engine might consider dups - bad".
Not putting all those links on every page might offend DMOZ. Putting them all on will probably offend Google and other search engines. What to do? What to do?
Do your own work, maybe?
Does DMOZ think all those exact duplicate clones are a bad thing? Or do they think they are a good thing (at least for DMOZ). Maybe that is why they list them in their "selective" directory.
Well "if Google didn't exist" comes to mind.
I have a relatively "young" site that was listed in a local DMOZ directory last month. The site has only a handful (+10) of backlinks acquired naturally. Yet, the link: command yields some 400 BLs in Yahoo. More than 95% of these BLs are from DMOZ clones. But none is listed in Google. And it seems that MSN is using a partial filter (100 BLs, as compared to 400).
Another niche site of mine has some +250 "real" BLs. But a Yahoo link: command yields +3000 BLs, again most of which are from DMOZ clones.
They neither drive traffic to my site, nor add anything to its popularity. And I do not think that they add any value to the user experience. They simply annoy and distract the users, and give the impression of a blatant abuse.
2) Google didn't have to ban those websites; it could have just lowered their PR, because they did nothing that violates GNU/open-source rules.
It should at least weigh (maybe it does) a website's overall content against its DMOZ-syndicated content and ban only websites that consist almost entirely of DMOZ content.
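The weighting idea above could look roughly like the sketch below. It is only an illustration, not how Google or any other search engine is known to work: the function names, the `(total_words, syndicated_words)` page layout, and the 0.9 cutoff are all assumptions made up for the example.

```python
from typing import Iterable, Tuple

# Hypothetical cutoff: treat a site as a clone only if at least 90% of its
# text is syndicated. This value is an assumption, not a known search-engine rule.
CLONE_THRESHOLD = 0.9


def syndicated_share(pages: Iterable[Tuple[int, int]]) -> float:
    """Return the fraction of a site's words that come from syndicated data.

    Each page is a (total_words, syndicated_words) pair.
    """
    total = 0
    syndicated = 0
    for total_words, syndicated_words in pages:
        total += total_words
        syndicated += syndicated_words
    # An empty site has no syndicated content by definition.
    return syndicated / total if total else 0.0


def looks_like_clone(pages: Iterable[Tuple[int, int]],
                     threshold: float = CLONE_THRESHOLD) -> bool:
    """Flag a site only when syndicated text dominates its overall content."""
    return syndicated_share(pages) >= threshold
```

Under these assumptions, a site with mostly original pages and one imported category stays well below the cutoff, while a site that is nearly all imported listings is flagged.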
they did nothing that violates GNU/open-source rules
That is too much to assume. Even the original poster here admits he did not follow the basic rules of crediting the source, some version of which is common to most open source projects. They may be open and free for all to use, but not in any way you want. You can use them only by following their intellectual property rules, which often specify how to credit the project and how to share additions.
Is DMOZ exploiting webmasters to create vast linkfarms with billions of links to DMOZ?
IMO – NO
I am very much with the view of treeline
Not putting all those links on every page might offend DMOZ. Putting them all on will probably offend Google and other search engines. What to do? What to do?
In both cases you are free to make your own decision, and it isn't really a part of your business strategy unless your business strategy is built around DMOZ data.
Or do they think they are a good thing (at least for DMOZ). Maybe that is why they list them in their "selective" directory.
Probably the best bet is to build your own directory, with your own data and figure out a way to do something different so it is of value to your visitors.
Exactly!
Almost all DMOZ clones are built to generate advertising revenue. If we are honest about this they are not built to provide any useful service. If we got rid of every last one of them from the index and only DMOZ itself remained would the Internet be any poorer for it? I think not.
I would have no problem with people who are willing to do a little work to create useful content of their own on their directory sites and perhaps use some DMOZ data but there are many thousands of these directories all churning out the same stuff. The search engines are right to remove it. It's their job to provide quality results and removing DMOZ clones is an obvious way to help with this effort.
By "reasonable" use, I mean, for example, importing a niche directory into one's site as a quick way of providing some relevant links to the user. Let us say that I have a site on blue widgets with original content, and imported some relevant links from DMOZ just for the sake of convenience to site visitors. This would be a reasonable use, and would add to the user experience. I do not think that Google, or any other search engine for that matter, would or should ban or punish such limited uses.
And ... I am sure Google knows this.
I think the theory about roomfuls of Googletechs manually whacking bigtime scrapers indicates more about people wearing tinfoil hats so tight their hatstraps cut off oxygen to their brain, than it does about Google. Nor do I think the new process targeted ODP clones. I think Google is just getting better at spotting "partial" duplicate content -- serial plagiarism, if you will.
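For anyone wondering what spotting "partial" duplicate content could mean in practice, here is a minimal sketch of one common textbook approach: comparing overlapping word n-grams ("shingles") with Jaccard similarity. Nothing in this thread says Google does it this way; the function names and the shingle size of 3 are assumptions for illustration only.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams ("shingles") in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        # Texts shorter than one shingle are treated as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity of two texts' shingle sets (0.0 disjoint, 1.0 identical)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

Two pages that share most of their listings but differ in a header or a few ads would score high here without being byte-for-byte identical, which is the sense of "partial" duplication.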
If I were to put on MY tinfoil hat, I'd be thinking how Google could spot related sites that targeted collections of naturally mutually exclusive popular keywords. For instance: how many sites in the world could possibly have natural content on both "Miami Hotels" and "Las Vegas Hotels"? A couple of dozen hotel chains, two or three reservation systems, and ten million doorway spammers, that's who! Now throw in "Auto rentals", WHACK ANY SITE THAT TARGETS ALL THREE KEYWORD SETS, then sell ads to the reservation systems and the hoteliers AND NOBODY ELSE!
Presto, twenty million spam doorways gone from the web.
Wow, the air is clearer already. Wanna think about "Fruit baskets", "Toronto", and "San Francisco"? How about "mortgages", "Idaho", and "Delaware"? "real estate", "Phoenix", and "Boston"? Every site that mentions at least three trademarked fad diet plans AND includes links or form input?
That was fun. (loosening strap on tinfoil hat....)
Yes, duplicate content is a real issue. But one should question the original intention: why do we add the DMOZ niche directory? If it is for the sake of convenience, then the duplication problem is easy to evade. A simple "noindex nofollow" would do the trick. We can even hide the directory completely from the search engines through a robots exclusion.
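As a rough sketch of both options, assuming the imported directory lives under a hypothetical `/directory/` path: the robots.txt rules below hide that path from all crawlers, the meta tag is the per-page alternative, and Python's standard `urllib.robotparser` is used only to check that the rules behave as intended. The path, the constant names, and the `is_crawlable` helper are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that hides an imported directory under /directory/
# from all crawlers while leaving the rest of the site crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /directory/
"""

# Per-page alternative: a robots meta tag placed in the <head> of each
# directory page.
NOINDEX_META = '<meta name="robots" content="noindex, nofollow">'


def is_crawlable(url: str, user_agent: str = "*") -> bool:
    """Check a URL against ROBOTS_TXT using the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, original articles elsewhere on the site remain crawlable, while anything under `/directory/` is excluded for well-behaved crawlers.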
selomelo, thanks for a refreshing whiff of common sense. Obviously, if you add any form of duplicate content as a service for your users, you don't need, and you shouldn't want, it to be indexed. Duplicate content has been a problem for years, why anyone is surprised when suddenly google takes a few solid whacks at some duplicate content sites is beyond me. Well, ok, it's not really beyond me, I understand that the idea of actually making a real site with real content is just too annoying for some people.
Your example was a good one though, as a SERVICE TO YOUR USERS - in other words, not as a way to put up a bunch of pages really fast to put ads on - you create a nice useful niche directory using something like a dmoz subdirectory. And you block it in robots.txt, use nofollow, since it's a service, not a way to scam some free adsense clicks.
The act of blocking solves all the problems, so at that point, anyone still wanting to whine about having to block this section of their site is clearly only using that method to create many pages for no purpose.
I'm very happy to report that one of my clients finally listened to my advice and dumped his generic directory garbage, which someone had conned him into buying and adding to his sites.
One listing for DMOZ information is enough for any SERP.
-Alex