
Google News Archive Forum

Is there really a penalty for dup content?
Or do they just remove 1 of them and leave the other with higher PR?

 6:38 pm on May 13, 2003 (gmt 0)

Is there really a penalty for dup content? I was under the impression that Google just removes the page with lower PR. I have about 350 pages on my site, and although none of them are the same, some do have similar content (it is a review portal for several hundred sites in our industry).



 6:46 pm on May 13, 2003 (gmt 0)

I have several articles of mine that have been reprinted elsewhere. Most have been up for some time with no problems. They all link back to my site so could be easily found.

Now a duplicated site might be quite another matter.


 7:00 pm on May 13, 2003 (gmt 0)

I have also heard that it is not necessarily the one with the lower PR, but the one that has been in the index longer, that Google will keep.

That said, I also have other articles reprinted that also appear on my own site, and haven't had any problems with them being removed for duplicate content, unless some of the reprints have been removed for that reason (but quite a few are indexed).


 7:03 pm on May 13, 2003 (gmt 0)

True duplicate content is not a problem. Google sorts it out just fine. It is near duplicate content that can end up getting penalized.


 7:07 pm on May 13, 2003 (gmt 0)

Hi WebGuerrilla:

Could you explain to me what is considered "near duplicate content"? One of my client's editors stupidly copied over 4-5 pages of content from an American website, and it seems the site has been filtered out in this update. Is this fatal? Or is there something I can tell them to do to get back into Google? Thanks.


 7:11 pm on May 13, 2003 (gmt 0)

Here is a really good one.

While working on a domain, I put up an affiliate program's full page ad, with a very slight amount of customization.

When I do a cache view on that page, I get ANOTHER site's cache of the page, not mine. My page is greybarred now.

Apparently it isn't just 100% duplication on your own sites, but near duplicates of anything else out there.

Anyone else see this?



 7:23 pm on May 13, 2003 (gmt 0)

Hi Alex:

Can you tell me whether your entire site is now grey-barred, or just the particular page that had the nearly duplicated content?

I suspect Google has been tightening its dup content filter over the last few updates...


 8:01 pm on May 13, 2003 (gmt 0)

But what is the penalty? Will one of the pages get booted, or will both suffer?


 8:04 pm on May 13, 2003 (gmt 0)

My understanding is that one of them will suffer: the later one (or the lower PR one). The original will be safe. What I don't know is what happens to the duplicating site, whether the entire site gets a penalty or just the page.


 8:10 pm on May 13, 2003 (gmt 0)

Anyone else see this?

I wouldn't chalk that up to duplicate content filters just yet. During update periods, it is quite common for Google to show a different page for the cache. It is also quite common for the toolbar to not function properly during an update.

Could you explain to me what is considered as "near duplicate content"?

Near duplicate content is duplicate content that has been altered somewhat in order to hide the fact that it is duplicate.

True duplicate content is a naturally occurring phenomenon: multiple domains set up for one site, mirror sites for bandwidth issues, etc.

With these types of situations, Google's goal is not to penalize people, it is simply to make sure that two exact copies of the same page don't show up next to each other in a SERP. Typically, in the past, this has been handled by "the highest PR wins."
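A minimal sketch of the "highest PR wins" idea described above. It is only an illustration, not Google's actual method: the `fingerprint` helper, the page dictionaries, and the `url`, `text`, and `pr` keys are all hypothetical names invented for this example.

```python
import hashlib


def fingerprint(text):
    """Hash of case- and whitespace-normalized text (an assumed normalization)."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def collapse_exact_duplicates(pages):
    """Keep one page per distinct fingerprint: the one with the highest 'pr'.

    pages: list of dicts with hypothetical keys 'url', 'text', and 'pr'.
    On a PR tie, the page seen first is kept.
    """
    best = {}
    for page in pages:
        key = fingerprint(page["text"])
        if key not in best or page["pr"] > best[key]["pr"]:
            best[key] = page
    return list(best.values())


pages = [
    {"url": "example.com/article", "text": "Widgets  explained in full.", "pr": 4},
    {"url": "mirror.example.net/article", "text": "widgets explained in full.", "pr": 6},
    {"url": "example.org/other", "text": "Something else entirely.", "pr": 3},
]
kept = collapse_exact_duplicates(pages)
```

Here the two copies of the same article collapse into one entry, the mirror with the higher PR, while the unrelated page is left alone. Nothing is penalized; one copy is simply filtered out of the results.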

Penalties (as in an entire site getting a PR0) are a different story. These have usually involved sites that intentionally duplicated content across multiple domains in order to try to gain an advantage in the SERPs. These pages are usually slightly altered in order to convince Google they are different.

[edited by: WebGuerrilla at 8:27 pm (utc) on May 13, 2003]


 8:11 pm on May 13, 2003 (gmt 0)

A competitor of mine uses 2 pages of my site in an invisible frame. He ranks above me in the engines using my content. Google must see these as 2 different sites because my site ranks well also. If I owned both domains it would be spam for sure.


 8:18 pm on May 13, 2003 (gmt 0)

It is my possibly incorrect understanding that duplicate pages are simply filtered out, and that it is only substantially duplicate sites (possibly combined with other factors) that will be penalized.

It is also impossible for Google to compare every page directly to every other page. They must have an algorithm set up to compare pages on similar sites.
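One common textbook way to avoid comparing every page with every other page is to break each page into word "shingles", index pages by shingle, and score only the pairs that share at least one shingle. The sketch below assumes that approach; it is not a description of what Google actually runs, and the function names, the shingle size `k`, and the `threshold` value are all arbitrary choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Set of k-word sequences in the text (lowercased, split on whitespace)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Share of shingles two pages have in common, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicate_pairs(docs, k=3, threshold=0.5):
    """docs: dict of name -> text. Returns sorted (name_a, name_b, similarity)."""
    sets = {name: shingles(text, k) for name, text in docs.items()}
    # Inverted index: shingle -> names of the pages that contain it.
    index = defaultdict(set)
    for name, page_shingles in sets.items():
        for shingle in page_shingles:
            index[shingle].add(name)
    # Only pages that share at least one shingle become candidate pairs.
    candidates = set()
    for names in index.values():
        candidates.update(combinations(sorted(names), 2))
    results = []
    for a, b in sorted(candidates):
        similarity = jaccard(sets[a], sets[b])
        if similarity >= threshold:
            results.append((a, b, similarity))
    return results


docs = {
    "orig": "the quick brown fox jumps over the lazy dog near the river bank",
    "copy": "the quick brown fox leaps over the lazy dog near the river bank",
    "other": "completely unrelated text about gardening and tomato plants in summer",
}
pairs = near_duplicate_pairs(docs)
```

With one word changed, "copy" still shares 8 of the 14 distinct shingles with "orig", so the pair is flagged. "other" shares no shingle with either page, so it is never compared at all.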

What I don't know is if the PR from those sites is passed on before they get filtered.


 8:18 pm on May 13, 2003 (gmt 0)

I believe it is the duplicate content that Google finds after they index the original that would be removed. This helps protect those who get their work stolen by copyright infringers, to ensure they are not penalized for the theft by others.


 8:18 pm on May 13, 2003 (gmt 0)

pigsfeet -

That's terrible... I had a competitor steal 197 pages of content from my site and change the words around a bit, and in this last update they shot up in the rankings. I called them about 10 times, and every time I call and ask to speak with the domain name owner I get "oh he's not here right now, sorry".


 8:22 pm on May 13, 2003 (gmt 0)

Bummer. Yeah, my case is a little different. At least this guy is making me money by sending me traffic. The downside is that people bookmark his URL. The competitor who rips my Flash movies and makes money off of them is the one whose tires I want to slash.

About the guy rewording your content... Umm, I'm sure no one here has ever done that before <grin>


 8:23 pm on May 13, 2003 (gmt 0)

pigsfeet -

yeah...but 197 pages of content...and he even left the title and <h1> the same


 8:41 pm on May 13, 2003 (gmt 0)

ariff44 - you should be sending their legal dept, or domain owner, a nicely worded cease and desist letter to get them to remove the content. I can't believe they thought they could get away with stealing 197 pages worth. With a cease and desist letter, you should hopefully be able to get them to remove your content promptly - and if not, they could face legal action.

Also, if I had stolen 197 pages from someone, I'd be certain to say the domain owner wasn't there either, so chances are you were talking to him/her, or the person answering the phone was instructed to screen calls.


 3:50 am on May 14, 2003 (gmt 0)

Actually, send a C&D to the domain owner and to the hosting company providing service. Make it a notice under the DMCA, and they are obliged to act. If the domain owner won't take it down, the host will have to or face being part of the copyright violation.

It's not a lot of fun to do, but putting it in writing is a good way to get people's attention.



 5:24 am on May 14, 2003 (gmt 0)

Yes, Google catches pure duplicate content easily and grey-bars the newer one. I have seen it personally on a client site. I have also seen a case where only certain duplicate pages were grey-barred, but the rest of the site (non-duplicate content) was left unscathed.


 3:34 pm on May 14, 2003 (gmt 0)

This topic is becoming popular:

The big issue here is that the penalty is not a PR0, but a gray bar (removed from the Google index)


 3:39 pm on May 14, 2003 (gmt 0)

Here is another:


 5:46 pm on May 14, 2003 (gmt 0)

thanks for your help...but how do I find what host the offending site is on?

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved