
Duplicate Content Scenario

trying to avoid a penalty


paulk

2:52 am on May 17, 2003 (gmt 0)

10+ Year Member



Hey guys, I was wondering if anyone can give their thoughts on this scenario. I have a website with approximately 100 internal article links; every single page has a unique title, and so on. The site has recently been picked up by freshbot but has not yet been deepcrawled. I am also using a background image on every single article page. Now the question is this... half of my articles come from domain.com, a research website where only the domain is picked up by Google but the articles are not. About 25% of the articles are taken from another source that is listed in Google. The question is, is giving every article a unique title, background image, unique header, and unique URL name good enough to keep me from being penalized? Even after searching WW for related topics, I'm confused about how the spider detects duplicate content. Other than that, my entire site is unique and on my own domain. I do use text from other sources, but it is not a "full site duplicate" at all, just snippets of text from several sources with some editing of my own words in between. Can anyone give me any advice on this? Much appreciated.

Paul

oraqref

3:10 am on May 17, 2003 (gmt 0)



And how on earth is Google going to determine who is the originator of the content? Or are they going to ban the original work and let one of the dupers stay in?

They can't determine that with an algorithm, so I think this duplicate content story is bogus. Either that, or they'll get a lot of people really pissed. And I can't see them winning a lawsuit against someone whose site got banned for content he originated.

Oraqref

MHes

3:40 am on May 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



paulk - Welcome to the forums.

You are right to be concerned, because duplication must be an issue Google is looking into. The general rule from my experience is that 15% original content per page keeps you OK, though that bar may rise; the current limit is probably that you must have at least 10%.

BUT, IMHO, the following may be happening:

1) Google is detecting templates, so your navigation template etc. will not be included in the duplicate content equation. If 20% of your page is template, that part will be detected and ignored; the remaining 80% must then have a reasonable chunk of original content, and I would guess at least 20% to be safe (see the rough sketch below).

2) (oraqref) Google will, rightly or wrongly, make a guess about who had the content first. There is lots of speculation as to how, but PR probably plays a part, and perhaps the age of the pages. I have wondered how they do this as well: duplicate content within a site is easy to detect, but comparing across domains introduces lots of problems for them. However, many people seem to have had pages dropped for duplicating other sites' content, and there are loads of affiliate sites churning out content pinched from other sites, so Google must be addressing this problem big time.
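
Purely to illustrate point 1 (this is nothing like Google's actual method, and every number in it is my guess), a crude filter along these lines would do the trick:

    # Back-of-envelope sketch: how much of a page is "original" once
    # the site-wide template is subtracted? All thresholds are guesses.

    def words(text):
        return text.lower().split()

    def original_fraction(page_text, template_text, source_text):
        template = set(words(template_text))
        # Drop anything that belongs to the template (nav, footer...).
        body = [w for w in words(page_text) if w not in template]
        source = set(words(source_text))
        # Whatever remains and is absent from the source counts as original.
        original = [w for w in body if w not in source]
        return len(original) / max(len(body), 1)

    template = "home articles about contact copyright 2003"
    source = "widgets are best cleaned with a soft dry cloth"
    page = (template + " " + source +
            " plus a few closing thoughts written in my own words")

    print("original: %.0f%%" % (100 * original_fraction(page, template, source)))

The point is only that template text is easy to subtract before anyone starts counting what is original.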

oraqref

3:52 am on May 17, 2003 (gmt 0)




I understand that they should address this problem, but that doesn't change the fact that they can't determine who originally wrote the material. Should the site of a famous poet get penalized because his poems appeared everywhere on the web, even before he made his own site? Don't you think that would be really weird?

Oraqref

skipfactor

4:16 am on May 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Their filters must certainly have a handle on this point. Otherwise, a lot of the news sites [Google :)] would be penalized for their BBC, AP, etc. feeds.

Given how many times GG has mentioned a "SARS"-type freshness approach lately, I would make your site emulate a news site; it sounds like it already is one anyway. Credit your sources, keep it fresh and different, and I think you're fine.

MHes

4:17 am on May 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi

Yes, you are right, and I can't see how they get around that one either. I would love to agree that 'duplicate content penalties' are a myth, but it seems people have suffered from them. Within a single site it is fine for Google to ignore or drop duplicate pages, but I agree that once they start comparing content across two different sites they could get it badly wrong!

However, the poet could always sue for copyright, then get his poems on the web. Google is not responsible for the content of the sites it lists, and will choose on a 'first come' basis and/or by PR etc. If they worried about the legal validity of every page they index, they would not list anybody! So I suppose they take the view that it is up to the aggrieved party to pursue stolen content; it is not their problem. They just cannot list hundreds of identical pages, so they have to pick one and let any legal issues be sorted out elsewhere.

skipfactor

4:26 am on May 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



it is up to the aggrieved party to pursue stolen content

I agree, but it looks like they won't look the other way if you're proactive:

[google.com...]

paulk

4:42 am on May 17, 2003 (gmt 0)

10+ Year Member



Interesting points... but what if this is the case: a very popular, indexed resource domain offers thousands of articles for others to draw on. Yet when I took sections of an article's text and put them in quotes to search in Google, it looks like those articles aren't listed. The question is, when Google goes out to check for duplicate content and compares my site (site 1) to the other site (site B), would it compare against site B's non-indexed pages? Site B is pretty much a search engine for resources; I copied and pasted the articles from it and added a unique header and a unique title to each one.
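
For anyone who wants to repeat that quoted-phrase test on their own pages, something like this would do it ("article.txt" is just a placeholder file name):

    # Print a few quoted search strings to paste into Google by hand,
    # to check whether the source copy of an article is indexed.
    import re

    with open("article.txt") as f:  # placeholder: any article's plain text
        text = f.read()

    # Keep only sentences long enough to be distinctive when quoted.
    sentences = [s.strip() for s in re.split(r"[.!?]+", text)
                 if len(s.split()) >= 8]

    # Sample from the middle of the article, away from titles and headers.
    mid = len(sentences) // 2
    for s in sentences[mid:mid + 3]:
        print('"%s"' % " ".join(s.split()[:10]))  # ~10 words is plenty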

BigDave

5:19 am on May 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



From what I understand, there are two different duplicate content actions that Google can take.

Substantially duplicate pages - Google will filter the duplicate pages out to keep the SERPs clean. This will not affect the rest of your site; they will only remove the duplicate page. This is not a penalty.

Substantially duplicate sites - These sites will simply be removed as spam. This is a penalty if they think the site is trying to spam, and they might even remove *all* copies of the site if they are obviously all related.

It is not possible for Google to compare every page to every other page. They will compare pages that are most likely to be the same; I don't know how they come up with their criteria for which sites to compare.
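
Nobody outside Google knows the mechanics, but the textbook trick for scoring "substantially duplicate" text is shingling: chop each page into overlapping word runs and compare the sets. A toy version, emphatically not Google's real code:

    # Shingle two pages into overlapping 4-word runs and measure the
    # Jaccard overlap of the two sets. 1.00 means identical text.

    def shingles(text, k=4):
        w = text.lower().split()
        return {tuple(w[i:i + k]) for i in range(len(w) - k + 1)}

    def jaccard(a, b):
        return len(a & b) / max(len(a | b), 1)

    page_a = ("widgets are best cleaned with a soft dry cloth according "
              "to the maker and should never be rinsed under hot water")
    page_b = page_a.replace("dry", "damp")  # a single word changed

    print("%.2f" % jaccard(shingles(page_a), shingles(page_b)))
    # Prints 0.64: one changed word already dents the score, while a
    # wholesale copy would stay near 1.00 and get flagged.

In practice you would keep only a small hash sample of each page's shingles, which is what makes it cheap to find likely candidates without comparing everything against everything.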

paulk

5:22 am on May 17, 2003 (gmt 0)

10+ Year Member



So basically... if you have 100 articles, and Google's algorithm considers 20 of them duplicates, the worst penalty I could get is pretty much a loss of backlinks?

mil2k

5:44 am on May 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hello paulk & oraqref, welcome to WebmasterWorld [webmasterworld.com].

They can't determine that with an algorithm so I think this duplicate content story is bogus.

I suggest you read the forums, including some old threads. Do a site search for duplicate content and you will get some good info to digest.

I can confirm this much: Google's algos do pick up pure duplicate content.

I have seen many sites where duplicate content has been penalized.

Example one: a news website. It promoted itself under a different name. After two years they had some problems, so they switched to their original brand name and copied the whole site onto the new domain. All the duplicate pages got the grey-bar penalty. But this was a big brand site, so it gradually picked up good one-way incoming links; a site-wide ban was not applied, and the new site has some pages with PR. But the home page and many other important pages still show the grey bar, and they are not aware of the problem.
Ignorance is bliss...

Example two: a corporate website. For branding reasons they bought a good domain name and duplicated the old site on the new one. The whole new site has a grey-bar penalty. They don't care about this site's rankings, so again it will remain grey bar.

Now these were cases of 100% duplicate content, very easily picked up by Google, in the first case on a page-by-page basis.

If it is not 100% duplication (the templates are different, or there has been some tweaking), then it becomes a probability scenario where you may or may not be penalized. I've just shared my experiences; make your own decisions. HTH :)

oraqref

9:38 am on May 17, 2003 (gmt 0)



"However, the poet could always sue for copywrite"

Except if he doesn't have a problem with the fact that people tend to like his poems so much they put them on their websites. That's one of the liberties the poet has. And believe me, there are plenty of people who feel that way, I as a semi-professional poet am one of them. It has happened to me that poems of mine appear on other websites without permission but I just tend to feel flattered about that.

And what about, say, 18th century translations of the Illias? Public domain. Let's say there are 5 different sites who all have another translation with slight varieties. Penalized. Result: people who look for these must wade through tons of garbage first because Google wanted to play editor instead of search engine.

Oraqref

oraqref

9:52 am on May 17, 2003 (gmt 0)



Hi Mil2k,

Thanks for the welcome. I wasn't arguing that Google cannot spot duplicate content; I was arguing that they can't determine who created that content. A duplicate content penalty creates big problems, some of which the rather technical people at Google have probably not even considered. The news sites are a good example: it will be really hard to find good news sites if they all get penalized for using duplicate content. There are many more such examples. What about a collection of public domain poetry? Penalized, because the individual poems all appear elsewhere. And so on. I don't really feel that's the direction Google should go, even if there are people daft enough to put loads of duplicate sites on the web to catch visitors. The fact that an algorithm can't determine such nuanced cases spells disaster for loads of people.

Oraqref

percentages

9:55 am on May 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The dupe content filter is either very lame or non-existent... look at the PHP cat and tell me there isn't significant dupe content that still ranks very high!

If Google has a dupe content filter at all, it certainly only looks at "very" similar pages. "Very" in my estimation means 98%+.
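
To put a number like 98% in perspective, here's a quick toy check (difflib's ratio is obviously not what Google runs, just a handy stand-in):

    # How strict is a 98% similarity bar? Change one word and see.
    from difflib import SequenceMatcher

    a = "widgets should be wiped gently with a soft dry cloth after use"
    b = a.replace("dry", "damp")  # a single word edited

    print(round(SequenceMatcher(None, a, b).ratio(), 3))
    # Prints 0.96: one edited word already ducks under a 98% cutoff,
    # so a bar that high lets near-copies sail straight through.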

seobuddy

10:56 am on May 17, 2003 (gmt 0)

10+ Year Member



Hey guys,

Duplicate content plays a vital role for sites. Suppose you are using content from some other site and that site holds the copyright; they can surely sue you. As for getting banned in Google: if the site you picked the content up from has already been listed in Google for a long time, your site might have a problem getting listed. So try to avoid duplicate content.