
Google News Archive Forum

    
What is dupe content and how is it measured?
Looking for a quantitative approach.
mark1615

10+ Year Member



 
Msg#: 27138 posted 5:07 pm on Dec 17, 2004 (gmt 0)

We made a mistake a year ago and put up 3 copies of the same page *ouch* because we saw a competitor doing it successfully and thought we would give it a shot. *ouch* *ouch* Suffice it to say this brought home the reality of the dupe content filter. The first page ranked pretty well and then the other two got indexed and the, well, um... you know. But here is the real question (in parts):

1) What is dupe content? In other words, is it the same words in the same order? What if, for example, you have 5 paragraphs and rearrange them?
2) Is dupe content a function of text only or of the code on the page?
3) What is the percentage of identical content that is acceptable?

 

experienced

10+ Year Member



 
Msg#: 27138 posted 7:28 am on Dec 18, 2004 (gmt 0)

Hi,

As per my knowledge, your answers are as below -

1. A page is considered duplicate content when it is more than 50% similar to another page on the web. Rearranging the words a little helps only if about 50% of the content is fresh and different.

2. Text in the same, duplicated layout on the page is not acceptable, and neither is duplicated code. So when you change 50% of your content, your code also has to change (CSS, color scheme, image names and, to some extent, the page name), or vice versa.
3. 45% to 50% works.

HTH

Rgds
Sachin

victor

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 27138 posted 9:46 am on Dec 18, 2004 (gmt 0)

Check out the patent that Google has on detecting duplicate content.

A good starting point is:
[cs.umd.edu...]
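
For readers who want a quantitative handle on "how similar" two pages are, one widely described textbook approach is word shingling combined with Jaccard similarity. The sketch below is an illustration only: nothing in this thread confirms that Google measures duplication this way, and the function names (`shingles`, `jaccard`), the sample page texts, and the shingle size of 3 are all assumptions chosen for the example.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Text shorter than one window: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical page texts that differ in a single word.
page_a = "blue widgets are made in country one and ship worldwide"
page_b = "red widgets are made in country one and ship worldwide"

similarity = jaccard(shingles(page_a), shingles(page_b))
print(f"shingle similarity: {similarity:.2f}")
```

A score near 1.0 means the two texts share almost all of their word windows; where any real filter draws the line is not something this sketch can tell you.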

rich42

10+ Year Member



 
Msg#: 27138 posted 10:14 am on Dec 18, 2004 (gmt 0)

I hate to give a 'non-answer' answer - but I think if you're too worried about duplicate content - your SEO efforts might be misfocused.

A little duplicate content on your site probably won't hurt you - but it probably won't benefit you either.

I've seen a lot of near-duplicate pages show up in the top of the serps for various keywords - and then a few weeks later they're gone.

prairie

10+ Year Member



 
Msg#: 27138 posted 11:59 am on Dec 18, 2004 (gmt 0)

It's really something to watch. I'd be inclined to stay away from duplicate content altogether by mixing up/breaking up text and code/layout, and also ensuring you don't share "too much" duplicate content with another host.

It's also worth considering site/linking structure/file names, because these could be targets of duplicate content filtering as well.

Stay unique and you should be fine.

Rajeev

10+ Year Member



 
Msg#: 27138 posted 12:26 pm on Dec 18, 2004 (gmt 0)

Hello. I have a website which is more of an affiliate webpage <snip> and it's facing similar problems... I don't know why: I would say 50% or more of it is content that I wrote myself and that is not repetitive, but my rankings are now down. I also have problems with page ranking on one of my other sites. <snip> Any advice would be really appreciated.

Rajeev Sahadevan

[edited by: lawman at 1:21 pm (utc) on Dec. 18, 2004]
[edit reason] No Links To Your Site Allowed [/edit]

OptiRex



 
Msg#: 27138 posted 1:52 pm on Dec 18, 2004 (gmt 0)

This is an area where our sites have extremely similar pages except for title bar, tags, a little text and one image change.

Quite simply, using a duplicate content checker, the 2,000+ pages we have are all 99% similar. I have just checked one and the two pages are 99.839138079383% similar.

The urls read thus:

domain/country1/product1/widget1.html
domain/country1/product1/widget2.html

For that country alone we have 60 widgets and all the pages are this similar. However, Google correctly recognises that they are all very different, and nearly all 2,000+ pages rank at #1 or at least in the top 3 and have done so for several years.

The same template was used to construct every product and every widget page.

I have no idea how they do it other than to say I am very pleased that Google can since I can assure you that in MSN and Yahoo! we have a lot of unintended listings since they do not seem to be able to recognise the authority pages versus some of the duplicates we have on test-bed urls.

mark1615

10+ Year Member



 
Msg#: 27138 posted 3:53 pm on Dec 18, 2004 (gmt 0)

I started this thread for a few reasons, some of which have been addressed:

1) Will using a template contribute to a dupe content problem? My take is no as they seem to be the norm.
2) How much text is considered duplicate? The reason being, on one site that sells widgets we generally include info on the mfg on every specific widget page along with the info on the specific widget. We really can't come up with 2,000 different descriptions of the mfg just to avoid the dupe content filter but if we know where the line is, we can address that perhaps by making the individual product descriptions longer...
3) What is dupe content? (Maybe this should be #1) A specific definition here is what we would love. In other words, is it the same words on the page, the same words in the same order, the same sentences in the same order, etc., etc.

After our experience of a year ago with real dupe content we don't want to do it inadvertently.
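
To make the "same words" versus "same words in the same order" distinction concrete, here is a small sketch comparing two possible measures on a pair of texts whose sentences have been rearranged. It is only one way to frame the question, not a description of any search engine's filter; the helper names (`word_set_similarity`, `ordered_similarity`), the sample texts, and the window size of 3 are assumptions made for illustration.

```python
import re

def words(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def word_set_similarity(t1, t2):
    """Ignores order entirely: compares only which words appear."""
    return jaccard(set(words(t1)), set(words(t2)))

def ordered_similarity(t1, t2, k=3):
    """Order-sensitive: compares overlapping windows of k consecutive words."""
    def grams(text):
        w = words(text)
        return {tuple(w[i:i + k]) for i in range(len(w) - k + 1)}
    return jaccard(grams(t1), grams(t2))

# Hypothetical texts: the same three sentences in a different order.
original = ("Our widgets are hand made. Shipping is free worldwide. "
            "Returns are accepted for thirty days.")
rearranged = ("Returns are accepted for thirty days. Our widgets are hand made. "
              "Shipping is free worldwide.")

print("word-set similarity:", word_set_similarity(original, rearranged))
print("ordered similarity: ", round(ordered_similarity(original, rearranged), 2))
```

Under the word-set measure the rearranged text is indistinguishable from the original, while the order-sensitive measure drops only at the sentence boundaries that moved. Rearranging paragraphs therefore changes an order-sensitive score very little, which may be why simple reshuffling is not much of a defence.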

OptiRex



 
Msg#: 27138 posted 4:36 pm on Dec 18, 2004 (gmt 0)

These are my experiences only, others may have different:

>1) Will using a template contribute to a dupe content problem? My take is no as they seem to be the norm.

No penalty since most sites use their own templates to keep navigation consistent for the user.

>2) How much text is considered duplicate?

As I've already said, our sites are 99%+ identical apart from the url and specific widget name description. Others may disagree; however, as you say, it's just not feasible to write 2,000 descriptions about a product which only varies in colour/quality/origin.

>3) What is dupe content?

In my experience exactly the same page(s) served under a different url(s) e.g.

domain1/country1/product1/widget1.html

domain2/country1/product1/widget1.html

I have never tried a duplicate page on the same url since I have never felt the necessity but I shall try it on a test-bed to see what happens!

>3 again)>is it the same words on the page, the same words in the same order, the same sentences in the same order,

Never had a problem with that e.g.

description description widget1 description
description description widget2 description
description description widget3 description

Anyone else?

annej

WebmasterWorld Senior Member annej is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 27138 posted 5:31 pm on Dec 18, 2004 (gmt 0)

A little duplicate content on your site probably won't hurt you - but it probably won't benefit you either.

I've given permission to other sites to reprint a few of my articles over the years. They never seem to show up in the serps. Is this because Google can tell my content is the original? Could sharing articles like this be hurting me in terms of PR or serps?

Back in the old days we could share content like this just to spread the information around. A few articles have been translated into other languages as well. I wouldn't think that would count against me though.

rich42

10+ Year Member



 
Msg#: 27138 posted 8:29 pm on Dec 18, 2004 (gmt 0)

I've given permission to other sites to reprint a few of my articles over the years. They never seem to show up in the serps. Is this because Google can tell my content is the original? Could sharing articles like this be hurting me in terms of PR or serps?

I've published content from other sites on my site and had it outrank the original a number of times. It mainly seems to come down to PR.

awebguy

10+ Year Member



 
Msg#: 27138 posted 10:29 am on Dec 19, 2004 (gmt 0)

"Checkout the patent that google has on detecting duplicate content.
A good starting point is:
[cs.umd.edu...] "

This reference is totally useless for web spammers and SEO.
It is almost impossible to understand methods described in the patent without appropriate scientific background.
There is no information about whether or how Google is currently applying the techniques described in the patent.

mark1615

10+ Year Member



 
Msg#: 27138 posted 7:58 pm on Dec 19, 2004 (gmt 0)

I thought the patent was useful. Of course we don't know how they are applying it right now but it gives a good look at the technology being used or at the very least how they think about it. And yes, someone without the appropriate background won't get past the first 2 paragraphs.

GodLikeLotus

10+ Year Member



 
Msg#: 27138 posted 8:16 pm on Dec 19, 2004 (gmt 0)

In my experience Google cannot tell which is the original copy.

I have had a site completely copied and now we are nowhere to be seen in the Google SERPS; the STOLEN COPY now enjoys the ranking we used to have.

My site went live 5 months before the copied version's domain was even purchased.

How on earth do you get around 80% duplication on 90% of your pages?

annej

WebmasterWorld Senior Member annej is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 27138 posted 12:23 am on Dec 20, 2004 (gmt 0)

I have had a site completely copied and now we are nowhere to be seen in the Google SERPS; the STOLEN COPY now enjoys the ranking we used to have.

How depressing!

Do you have any idea why they got the rankings? Do they have higher PR or more backlinks?

I've had problems with eBay sellers copying a few articles. Fortunately the sale doesn't last that long and the page doesn't have much rank, as I've never seen them in serps.

GodLikeLotus

10+ Year Member



 
Msg#: 27138 posted 1:10 am on Dec 20, 2004 (gmt 0)

annej>

How it all happened.

1. Launched the site on May 4th 2003

2. Acquired 1 (repeat 1) decent PR5 link

3. Site began to get good traffic from Google in late August 2003.

4. December 2003, found an exact copy of our directory when searching on Google for subject related sites. Done by a customer.

5. Confronted the guy on the phone. His reply. "I will be number 1 in this industry, Whatever it takes, but I still want the traffic you can supply"

6. Late December 2003 - The 1 link in from the PR5 subject related site had a major problem with their host and subsequently lost all their pages in Google. The site was quickly moved to another hosting company and returned very quickly into the SERPS, back to where it was before the problems by Feb 2004.

7. Jan 2004 - Began to see the STOLEN duplicate site appearing in the Google SERPS all around our site.

8. FELL OUT BIG TIME with the customer.

9. March 2004 - Our site took a massive drop in the SERPS.

10. Have spent the last 8 months working on improving the site and gaining new links from other subject related sites but up to now, nothing.

11. December 19th 2004 - Today our site sits in the depths of the Google SERPS; we don't even rank top for our own domain name.

12. "All I want For Christmas Is My Rankings In The Google SERPS To Be Restored To Where They Should Be".

------------------------------------------------------

GodLikeLotus

10+ Year Member



 
Msg#: 27138 posted 3:30 pm on Jan 2, 2005 (gmt 0)

Does Google have any way of determining which site came first?

prairie

10+ Year Member



 
Msg#: 27138 posted 3:57 pm on Jan 2, 2005 (gmt 0)

Does Google have any way of determining which site came first?

I think the best they can do here is to re-crawl sites very frequently, and I presume that they crawl a lot more than they used to. They certainly crawl my "sandboxed" site ad nauseam; sometimes I think Googlebot is its only traffic (!).

To protect your copy you need to have a high enough PR to warrant frequent revisits by the Googlebot, and link new content from your home page if it is buried deeper down to ensure you get it indexed ASAP.

GodLikeLotus

10+ Year Member



 
Msg#: 27138 posted 4:17 pm on Jan 2, 2005 (gmt 0)

prairie- so are you saying that Google has NO WAY of determining which site is the original?

I have been working for the last 7 months on improving PR and adding more unique topic related content to the site.

Nothing seems to work.

Iguana

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 27138 posted 11:43 pm on Jan 2, 2005 (gmt 0)

Try searching for song lyrics. You will find the same lyrics (mostly incorrect) for most songs. I think most lyrics sites start off by copying from other sites. So you end up with 30 sites all showing the same lyrics in the text of the page - but no duplicate filter applied that I can see.

Sorry that's not an answer - just an observation.

GodLikeLotus

10+ Year Member



 
Msg#: 27138 posted 12:06 am on Jan 3, 2005 (gmt 0)

I guess the same can also be said about DMOZ. Seems like a "mixed bag of clones" and nothing to do with PR.

tml89

10+ Year Member



 
Msg#: 27138 posted 7:31 pm on Jan 3, 2005 (gmt 0)

2) How much text is considered duplicate? The reason being, on one site that sells widgets we generally include info on the mfg on every specific widget page along with the info on the specific widget. We really can't come up with 2,000 different descriptions of the mfg just to avoid the dupe content filter but if we know where the line is, we can address that perhaps by making the individual product descriptions longer...

I'm in this same position. Would adding, say, 5-10 lines describing the merchant at the bottom of the page hurt (assuming all other areas are different)?

Lorel

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 27138 posted 8:00 pm on Jan 3, 2005 (gmt 0)

6. Late December 2003 - The 1 link in from the PR5 subject related site had a major problem with their host and subsequently lost all their pages in Google. The site was quickly moved to another hosting company and returned very quickly into the SERPS, back to where it was before the problems by Feb 2004.

Have you tried contacting the new host?

I would check their TOS (terms of service) first and quote them to the administrative contact for the new host and tell them this person has copied your web site (i.e., copyright infringement) which will be listed in any TOS. Also provide them with any communication you've had with their client and any other proof, i.e., evidence from Google search and also google cache, etc. I would also include information about their being "expelled" from their previous host. You may have to do this again if they change hosts again so keep all current proofs.

GodLikeLotus

10+ Year Member



 
Msg#: 27138 posted 8:52 pm on Jan 3, 2005 (gmt 0)

Lorel

Are you referring to the copied site's host?

With regards to proof, the Wayback Machine clearly shows our pages existed before the copied site's domain was even bought. It also shows the links we actually had to the customer's own site before he stole all our work.

GLL

Philosopher

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 27138 posted 9:12 pm on Jan 3, 2005 (gmt 0)

Have you submitted a DMCA complaint to Google? If you can show your site was live first, which it sounds like you can, many people have had good luck getting Google to pull the offending site with a DMCA complaint.

A similar complaint to the offending site's host and possibly even registrar could also have good results.

GodLikeLotus

10+ Year Member



 
Msg#: 27138 posted 9:32 pm on Jan 3, 2005 (gmt 0)

Philosopher-

A couple of questions about DMCA:

1. Can I file a DMCA complaint if I do not own any copyright of the material?

2. I am also based in the UK, although the site in question is hosted and aimed at the US market, does this make any difference?

You also mentioned contacting the offending site's hosting company. Is there a procedure for this?

GLL

Philosopher

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 27138 posted 4:09 am on Jan 4, 2005 (gmt 0)

Well I'm definitely no expert as I haven't had to deal with any DMCA issues yet (knocks on wood), but I've seen a few posts in here dealing with it.

I'd check out

[google.com...]

For any DMCA questions you have.

As for the hosting company, I'd just send them either an email or a letter or both and outline everything, citing examples from the Wayback Machine etc. to prove the validity of what you are saying.

I'm no lawyer so take my advice with a grain of salt. ;)


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved