Articles re-used - penalized for duplication - (deprecated) Google News Archive forum at WebmasterWorld

Articles re-used - penalized for duplication

An affiliate granted me copies of many of their articles, Google punishment

adfree

11:47 pm on Oct 4, 2003 (gmt 0)

I run an affiliate site whose content is mostly legal and legislative material. The partner offered to let me re-use their articles, on the condition that I not change anything in their copy.

Glad for the extra copy, I added the articles, only to find out now that Google penalizes me with a PageRank of 0.

How would you react? What's your take on re-used copy that you are not allowed to change?

Thanks, Jens

Jenstar

3:06 pm on Oct 5, 2003 (gmt 0)

Is the PR0 directly related to the duplicate content? Or is it just a PR0 for a new site? Are you sure those articles have been fully indexed?

If you think it is the duplicate content filter at work, you can see if it has been flagged for this by searching for a specific phrase on the article, with quotes around it. Choose a phrase that is likely to be in this article only, and use most of the ten words Google will search for at once.

In the initial results, is your site's article there? If yes, it isn't the duplicate content filter working against you.

If not, click the link at the bottom of the results that says:

In order to show you the most relevant results, we have omitted some entries very similar to the ## already displayed.
If you like, you can repeat the search with the omitted results included.

If your site then shows up when you click omitted results, that would be the duplicate content filter kicking in. If your site still does not show, it likely hasn't been indexed fully yet, and it isn't the duplicate content causing your problems at this point.
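The check described above can be sketched in Python. This is only an illustration under stated assumptions: the function names are made up, the ten-word limit is taken from the post, and the two boolean inputs stand for what you observe by hand in the normal results and in the omitted results. Nothing here queries Google.

```python
def build_phrase_query(sentence, max_words=10):
    """Return a quoted exact-phrase query using at most max_words words."""
    words = sentence.split()
    return '"' + " ".join(words[:max_words]) + '"'


def interpret_results(in_initial, in_omitted):
    """Map the two manual observations to the conclusions from the post.

    in_initial: your page appears in the normal results for the phrase.
    in_omitted: your page appears only after repeating the search with
                the omitted results included.
    """
    if in_initial:
        return "not filtered"
    if in_omitted:
        return "duplicate content filter"
    return "probably not fully indexed"
```

Pick a sentence that is likely to appear in that article only, pass it to `build_phrase_query`, paste the result into the search box, and feed what you see into `interpret_results`.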

bether2

3:28 pm on Oct 5, 2003 (gmt 0)

Are you getting a PR0 only on the pages with duplicate content, on the home page, or on all pages on the site?

Beth

dirkz

4:10 pm on Oct 5, 2003 (gmt 0)

When did you put the new articles online, and what PR does your main page (or the page that links to the articles) have? This is crucial because chances are that the content is just too fresh to get anything other than PR0.

Have your new pages been crawled?

It's clear that you can't change anything in the copy, but in most cases you wrap the article in your templated navigation. Adding some comments before the actual article and a summary could help to prevent duplicate content.

adfree

7:19 pm on Oct 5, 2003 (gmt 0)

Thanks All, Jenstar - I will follow your advice and post results here tomorrow.

The site was built in July, the content grew over a period of two months, and I haven't made any additions since early September. The site was indexed (with 10% of the pages available back then) at the end of July.

All pages show up in index now, all have PR0.

Many thanks again, Jens

buckworks

8:01 pm on Oct 5, 2003 (gmt 0)

I had a similar situation, adding a bunch of licensed articles to a new site. Many pages still have PR 0, but I don't think there's any penalty, I think it's a "natural PR 0" just because they're so deep in the link structure. I'm confident that as I build PR to that section of the site, the PR 0 pages will become PR 1, etc.
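To illustrate why pages deep in the link structure can end up with little PR without any penalty, here is a minimal sketch of the published PageRank power iteration on a made-up five-page site. The site layout, the damping factor of 0.85 and the iteration count are assumptions for illustration; the toolbar figure is a coarse display whose mapping is not published, so only the ordering of the scores is meaningful here.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Plain power iteration over a dict {page: [pages it links to]}.

    Assumes every link target also appears as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if targets:
                # A page splits its damped score evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without links spreads its score over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical site: home -> section -> three articles, each linking home.
site = {
    "home": ["section"],
    "section": ["article-1", "article-2", "article-3"],
    "article-1": ["home"],
    "article-2": ["home"],
    "article-3": ["home"],
}

ranks = pagerank(site)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph the home page scores highest, the section page next, and each article lowest, because the section's score is split three ways before it reaches the articles.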

I didn't change any content, but I made changes in other ways, including introductory comments as mentioned by Dirkz. I also converted most formatting to CSS to reduce clutter in the source code. It speeds up the pages, and it's one more way to increase the distinctiveness of my pages.

[edited by: buckworks at 8:07 pm (utc) on Oct. 5, 2003]

birdstuff

8:05 pm on Oct 5, 2003 (gmt 0)

IMO, the best way to handle affiliate content is to simply write the page yourself using the one provided by the affiliate program as a guide. Read it, then completely rewrite it in your own words.

There are 2 advantages to doing it this way:

1 - You avoid any chance of getting a duplicate content penalty.

2 - Your affiliate commissions will likely be higher than your competitors' because you have unique content on your page, not the same boring stuff that your visitors have already seen on 50 other sites.

buckworks

8:14 pm on Oct 5, 2003 (gmt 0)

Birdstuff, your advice would work well for regular sales pages, but for something like an authoritative 2000 word article on a technical topic (multiplied by a whole library) it's not so suitable.

It's time-consuming, for one thing, and plagiarism questions arise when you try to rewrite someone else's content. Also, the credentials of the original author are part of what makes the material valuable to have on your site.

It's probably more efficient to make reasonable changes to differentiate your pages from possible duplicates, then work on building PR so that even if there are partial duplicates floating around in cyberspace, your pages will be weighted better than theirs.

PatrickDeese

8:22 pm on Oct 5, 2003 (gmt 0)

Have you thought about breaking the article into two pages?

G usually doesn't mind dup content, as long as the pages don't "taste" the same. CSS, layout, and page architecture can help with this.
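One way to picture pages "tasting" the same is a textbook near-duplicate measure: compare the sets of overlapping word sequences (shingles) on two pages. The sketch below is an assumption made for illustration, not a description of what Google actually does; the function names and the shingle size of 4 are arbitrary.

```python
def shingles(text, size=4):
    """Return the set of overlapping word sequences of the given size."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(text_a, text_b, size=4):
    """Shared shingles divided by all shingles: 1.0 means identical sets."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Comparing an article with itself gives 1.0. Wrapping the same unchanged article in your own introduction or summary adds shingles that the other copies do not have, so the score drops below 1.0 even though the licensed text is untouched.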

birdstuff

8:31 pm on Oct 5, 2003 (gmt 0)

buckworks:

You're right. I was referring to the cookie-cutter pages used to sell ebooks, software, things like that. These things are ubiquitous on the web and pretty much ignored.

In reference to plagiarism, I believe that completely rewriting a page and not using any passages from the original page would help prevent being accused of plagiarism more so than changing bits and pieces here and there.

If no part of the content is copied (or simply altered and used) it can't be plagiarism. Knowledge and facts can't be copyrighted, only specific written text. For example, let's say I read a book about a topic, say on using a software package. I then write my own book on that topic. Assuming that I write the book in my own words and don't use altered versions of passages from the other book, this isn't plagiarism. Am I right or am I completely confused on this issue?

buckworks

9:14 pm on Oct 5, 2003 (gmt 0)

Assuming that I write the book in my own words and don't use altered versions of passages from the other book, this isn't plagiarism.

You're right about that, because the work would be your own. But I stand by my point about the time it would consume! ;)

Maybe the best of both worlds would be to get copyright permission to use content from other sources on your site, and also do some original writing of your own as time and knowledge permitted.

Side note: When I'm working on an article, I try to leave some downtime between the research phase and the writing phase, to reduce the chance of echoing someone else's words too closely because they're too fresh in my mind. Giving credit where it's due can become difficult, because over a lifetime of reading we accumulate ideas and information but often don't remember where things came from.

[edited by: buckworks at 9:21 pm (utc) on Oct. 5, 2003]

birdstuff

9:19 pm on Oct 5, 2003 (gmt 0)

Yes, you're right about the time issue. I just wanted to make sure I understood the plagiarism issue correctly. Thanks for the clarification.

DroffatsX3

12:51 am on Oct 6, 2003 (gmt 0)

>>>If no part of the content is copied (or simply altered and used) it can't be plagiarism.

That's not exactly true. It is much more complicated than that. Here is a decent source showing examples.

[indiana.edu...]

cherrytron

4:34 am on Oct 6, 2003 (gmt 0)

I agree that Google does not penalize for duplicate "content"..

Unless you mean that the entire page (duplicate "page") is exactly the same (HTML source) as some other site's page (maybe by file size, modification date, etc?) then it might penalize you (but I'm not even sure about this)

"G usually doesn't mind dup content, as long as the pages don't "taste" the same."

The taste thing might be hard to prove also .. even with different headers and footers etc, if the main article provides the "taste" of the page then this can't be true ..

Otherwise .. goodbye Yahoo and any news or press site that publishes news feeds and press releases word for word.

I've also read on this forum that the big G "takes the first version of the content it finds and does not list the other sites" ..

Again .. if this was the case .. why should the first site to publish (or first to be crawled) a news article / press release be the one that Google indexes?

dirkz

7:18 am on Oct 6, 2003 (gmt 0)

I agree that Google does not penalize for duplicate "content"..

Maybe we shouldn't say this too loud :)
This could be the return of SEO "traditional style" ;-D

adfree

8:14 am on Oct 6, 2003 (gmt 0)

Many thanks for the ongoing discussion; it provides quite a few important pointers on how to avoid issues and build a better site.

I will try to break up the articles into pages and sections, use h1-h3 headers in between, and pull out quoted passages within the article as well. An introduction might help too.

I noticed that GBot only shows up occasionally and does not go deep; it just scratches the surface of the site. I need to do some checking here as well.
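A rough way to check how deep the crawler goes is to count its requests by URL depth in the server's access log. The sketch below assumes a log in the common or combined format, where the request appears as a quoted `"GET /path HTTP/1.0"` field and the user agent contains the string `Googlebot`; the function name is made up, and the user agent is trusted without any verification.

```python
import re
from collections import Counter

# Matches the quoted request field, e.g. "GET /articles/page.html HTTP/1.0"
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')


def googlebot_depths(log_lines):
    """Count Googlebot requests per URL depth (0 = the home page)."""
    depths = Counter()
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        match = REQUEST_RE.search(line)
        if not match:
            continue
        # Drop any query string, then count non-empty path segments.
        path = match.group(1).split("?", 1)[0]
        depth = len([part for part in path.split("/") if part])
        depths[depth] += 1
    return depths
```

Passing an open log file to `googlebot_depths` returns a counter keyed by depth. If almost all hits sit at depth 0 or 1, the deeper article pages are rarely being fetched.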

Thanks all, nice week, Jens

adfree

8:19 am on Oct 6, 2003 (gmt 0)

Jenstar - it shows in both result sets: the regular one when searching for the quoted article passage, and the omitted one... what could that mean now?

Thanks, Jens

dirkz

8:39 am on Oct 6, 2003 (gmt 0)

This means that the filter wasn't active here :)

Nevertheless you should change the articles like you mentioned, just to be ready for future things.

Maybe you should also concentrate on getting many more incoming links, so that more of your pages get indexed and you get rid of that PR0 fast.