Duplicate Content - My Specific Case - (deprecated) Google News Archive forum at WebmasterWorld - WebmasterWorld

Duplicate Content - My Specific Case

I was providing a service not thinking of SEO

alice

6:57 pm on Jun 20, 2003 (gmt 0)

10+ Year Member

I have gone back and read all the posts about duplicate content, but I still need a little help deciding what to do with my situation.

When I started creating my websites, the goal was to provide useful information to my visitors. I wasn't thinking of SEO. I have an information site where people can go and read informative articles on my topic.

I also have a writing site where I freelance my services and I also offer free content articles. I know all of you are going to GASP when I tell you that I have taken a selection of the articles from the information site and put them on my writing site. My writing site visitors can reproduce them for their websites and newsletters.

My information site has a higher ranking than my writing site and many of the articles are in the top 5 on Google for the keywords I've targeted. The articles on both sites have been indexed by Google. They don't seem to have penalized me that way. I've been doing this for nearly a year now.

Now, I need to make a decision. I get a decent amount of traffic to my writing site just for the free articles and I really don't want to remove them. I just want to provide a service (and, of course, get my byline out there)...I'm not trying to load Google with duplicate content.

From reading the previous posts on duplicate content, it looks like people are saying that if Google finds the duplicate content...it will drop the content from the lower ranking site. Is that my biggest risk, having individual article pages dropped...or are there bigger ramifications for my sites as a whole? I still have plenty of content outside those articles that appear on both sites.

Jenstar

4:03 am on Jun 21, 2003 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

With this most recent update, there is clearly a new duplicate content filter in place, but we are still waiting for it to be finalized. GoogleGuy hinted there were still changes being made to the algo for the dup content filter. I have seen cases where the higher PR site is the one penalized, and also where the older site of the two is penalized, and the new page with duplicate content is the one showing in the serps.

You also need to take into account how much differing content is on each page - do you have a significant difference overall with the non-article portion of the page? Or are they both pretty similar? If they are similar, I would be more concerned.

What can you do? If you are concerned about duplicate content between your own two sites, you should disallow googlebot from indexing the specific articles on one site or the other through either your robots.txt file or through the noindex meta tag. You mention the articles rank high - is this on one site alone, or both? You should obviously allow googlebot to index the one that ranks well. And take into account if the articles on site A have a PR5 but the identical articles on site B have a PR0 (which could indicate that Google is aware it is a duplicate).

You could also rewrite the articles for reprint, so it is actually a different, although similar, article on each of your sites. Don't just switch up the sentences, though. No one is yet sure exactly how this duplicate content filter is going to play out in regards to what is and isn't considered a duplicate.

Or you could do nothing and hope for the best, since you have not been penalized for it yet.

As for an entire site being penalized for only a portion of the site being duplicate content? I think we will all have to wait for this update to settle down first, then we will be able to take a clearer look at how this new duplicate filter works.

alice

4:25 am on Jun 21, 2003 (gmt 0)

10+ Year Member

Hey Jenstar,

You are everywhere...always being so helpful.

I was considering the robots.txt on the writing site as the articles rank lower. However, whenever someone uses the article, I run a risk. Still, most people use it in email newsletters. Although my articles get used quite a bit, it's not as though one particular article is duplicated a ton of times.

This is food for thought. Thanks!

Jenstar

4:44 am on Jun 21, 2003 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

If you plan on using the robots.txt on the free article site, yet want to have the same article appear on your other site too, rewrite the articles you offer for free. That way you don't have to worry about a possible duplicate content penalty, because Google won't index the ones on your site that could appear elsewhere.

This seems to be the simplest solution (if you are up for rewriting, that is.) If not, you would still run the risk of a possible duplicate content penalty whenever those articles get republished, if they appear in their original form on your non-writing site too.

For now, I would wait and see what happens when this update finishes, and then take action. If you haven't tripped the duplicate content filter, you might be fine in the future without making the changes. But on the other hand, I have seen this filter catch established sites where there are only one or two duplicates, so it is an area of concern for people right now.

chrisnrae

5:01 pm on Jun 21, 2003 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

"From reading the previous posts on duplicate content, it looks like people are saying that if Google finds the duplicate content...it will drop the content from the lower ranking site."

People may be saying that, but it is certainly not what is happening. We are not sure what method Google is using to decide which is the duplicate and which is the original. I had an extremely high ranked site that was over a year old get copied by a six month old domain as a doorway page, yet my site was the one that received a semi-penalty.

We won't have any facts about any of the new additions to the algo until after the update. All we have now is speculation and personal experiences, which vary widely.

alice

7:25 pm on Jun 21, 2003 (gmt 0)

10+ Year Member

Thanks. I'm going to start with the robots.txt first and then move from there. If you have just a second, is this correct?

User-agent: *
Disallow: /articles/

Will this exclude all robots from my article directory? Or should I just exclude googlebot?

Okay, it's Saturday. I'm going outside! :)

Jenstar

10:40 pm on Jun 21, 2003 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

You should use the robots.txt validator [searchengineworld.com] to check your file, to be sure it is validating correctly.

You can disallow Googlebot from your entire site or just from the specific directories you do not want Googlebot to index.
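If the online validator is not handy, the two variants discussed above can also be checked locally. This is only a rough sketch using Python's standard `urllib.robotparser`; the `example.com` URLs and the `SomeOtherBot` name are made up for illustration, and the parser only tells you how a well-behaved robot should read the file, not what Googlebot will actually do.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Variant 1: the file from the post above, which applies to every robot.
ALL_ROBOTS = """\
User-agent: *
Disallow: /articles/
"""

# Variant 2: only Googlebot is kept out of the articles directory.
GOOGLEBOT_ONLY = """\
User-agent: Googlebot
Disallow: /articles/
"""

# Hypothetical URLs, used only to exercise the rules.
ARTICLE = "http://www.example.com/articles/sample.html"
HOME = "http://www.example.com/index.html"

if __name__ == "__main__":
    variants = (("all robots", ALL_ROBOTS), ("googlebot only", GOOGLEBOT_ONLY))
    for label, rules in variants:
        for agent in ("Googlebot", "SomeOtherBot"):
            print(label, agent,
                  "article:", can_fetch(rules, agent, ARTICLE),
                  "home:", can_fetch(rules, agent, HOME))
```

With `User-agent: *` every robot is kept out of `/articles/`; with `User-agent: Googlebot` only Googlebot is, and other engines can still index the articles.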

anallawalla

1:33 pm on Jun 22, 2003 (gmt 0)

WebmasterWorld Administrator

10+ Year Member

Alice,

I'm the honorary editor of my user group's magazine and much of the print edition is placed online. Other nonprofit user groups can freely reprint the articles (with proper attribution) and I expect that the same article appears on other sites from time to time. We do not place reprinted material online, but that policy predates Google. It is simply a matter of displaying material written by our members who have given us a signed release to reprint online.

Just on that point alone, I can say that I neither worry about reprints (because we have swapped content with other UGs for over 12 years), nor PR. The magazine's subdirectory is PR6 and many articles are PR3 or higher.

The robots.txt approach is fine. Just cater to your audience at both sites.

- Ash

john22

3:22 pm on Jun 22, 2003 (gmt 0)

10+ Year Member

Hi

I have a single site with pages that have a similar layout and content, as the site necessitates this...

So that I know whether I will receive a 'penalty' (PR ranking?)... How does Google determine whether these pages are duplicates? i.e. percentage of similarity?

Regards,

-Martin
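Nobody in this thread knows how Google measures similarity, so the following is not Google's method. It is only a rough way to estimate for yourself how much text two pages share, using word shingles and Jaccard similarity, a common near-duplicate measure. The shingle size of 4 and the function names are arbitrary choices for illustration.

```python
import re


def shingles(text: str, size: int = 4) -> set:
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str, size: int = 4) -> float:
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    page_a = "Our widgets come in three sizes and ship within two days."
    page_b = "Our widgets come in three sizes and ship within one week."
    print(round(similarity(page_a, page_b), 2))
```

Feed it the visible text of two pages (with the shared navigation stripped or left in, depending on what you want to measure). A value near 1.0 means the pages are nearly identical by this measure; what threshold, if any, a search engine uses is unknown.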

DerekH

4:53 pm on Jun 22, 2003 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

I've seen an interesting side effect which is not really duplicate content, but it ties in strongly with it...

One of my sites uses a free ISP, with a URL forwarding service on top so that I can have my own domain name.

The forwarding service provides a frameset which conceals the URL of the free ISP, and fetches the content.
For a no-frames browser (or a spider), there is a meta refresh command to the free ISP.

In the past, the "chosen" domain name appeared first out of 600,000 results in the SERPs for my two-word look-up (and the snippet contains the meta-refreshed page, which is rather neat). The free ISP site appeared second, with exactly the same snippet.
Entirely expected, since the chosen site has more backlinks from outside.

Now that Esmeralda has swept past, the free ISP site doesn't appear anywhere in the top 50 listings, but the domain name site is still top. Which is rather nice.
Better still, the other site is still in the index - it just lurks a *lot* further down the SERPS.

Since this is entirely what I'd like enquirers to see, I have to say that the new weighting scheme is rather good.

DerekH

g1smd

5:23 pm on Jun 22, 2003 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

With a similar setup (but different in several ways) I had results that were #1 for the real site, and #2 for the free ISP site which just pointed back to the real site. I wanted the real site to be top of the listings.

However there were times when the free ISP dropped to #3 or #4 and occasionally as low as #14, but the real problem was that the real site dropped to #60 or #64 or so at the same time. That really was not what was intended! The drop has happened 4 or 5 times in the last two months, but each time, the drop was for only 2 or 3 days; except for the last time it happened, when it was for nearly a week on some servers.

The last time it happened was at the very beginning of the latest update (2003-06-15), and it instantly dropped position on all 9 datacentres; but then, surprisingly, it came straight back up on just one datacentre (on -fi I think) within a matter of hours. It then proceeded to come back up to the top on roughly one more datacentre each day, since that time. Now that the update is over it is back up at the top for all 9 datacentres, and -in was the last to go.

Monkscuba

6:13 am on Jun 23, 2003 (gmt 0)

10+ Year Member

The duplicate content filter certainly seems effective, and also seems to detect changes quickly.

We had a mirror site up with hardly any links to it. The mirror was done before I started working here, and being an amateur anyway, I never thought much of it. Unfortunately, at the last update Google decided to pick up the mirror and dropped the index page and some other pages of the real site. So we took the mirror offline, onto a new server and have just made a couple of new pages. Today I see www2 Google has picked them up and the main site is coming up in the results again. Not all results, but some. I can hope this moves to www. It only took a few days for these changes to be picked up.

I would advise people with mirrors to beware. Google may pick up the one you don't want! It is just a robot and doesn't know which one you want shown.

g1smd

6:48 pm on Jun 23, 2003 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

I agree with that, though you can fix some things by putting <meta name="robots" content="noindex,follow"> on all of the pages that you do not want to be indexed.

I just helped someone move their site to a new domain. They haven't got many incoming links, but were already listed in Google, AltaVista and a few others. The new site was put up. Google wasn't at all interested in it. Freshbot made one visit and never returned. The old site was then modified to put the <meta name="robots" content="noindex,follow"> on all of the pages that we did not want to be indexed. The site navigation on the old site was then changed so that it all pointed to the new site. The first click that any visitor made anywhere on the old site would take them to the required page, but that page would be the new version, on the new site. All further navigation would keep them on the new site. It took a few weeks, but on the latest update Google dropped all of the pages of the old site, and listed all of the pages of the new site in their place, and ranked them a lot higher too.
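Before waiting weeks for an update, it may be worth confirming that the tag really is present on every old page. Here is a minimal sketch using Python's standard `html.parser`; the sample pages are invented for illustration, and the check only confirms the tag is in the markup, not that any search engine has acted on it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when written without "=value".
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for part in attr.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html: str) -> set:
    """Return the robots directives found in the given HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Invented sample pages: the old one carries the tag, the new one does not.
OLD_PAGE = ('<html><head><title>Old page</title>'
            '<meta name="robots" content="noindex,follow">'
            '</head><body>Moved.</body></html>')
NEW_PAGE = '<html><head><title>New page</title></head><body>Hi.</body></html>'

if __name__ == "__main__":
    print("old page:", sorted(robots_directives(OLD_PAGE)))
    print("new page:", sorted(robots_directives(NEW_PAGE)))
```

Run it over the saved HTML of each old page; any page where `noindex` is missing from the result still invites indexing.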

philipp

7:10 pm on Jun 23, 2003 (gmt 0)

10+ Year Member

I have a public domain full text book site with pretty much only duplicate content; however, I still get indexed fine and get visitors coming from Google. Apparently it matters what chunks the content is broken into, how good the structure is (HTML headings etc.), how often the book title appears, and so on.
So if you've got content that's a duplicate, make sure it's the best version out there!

I also fall into the duplicate content trap for some books, but in general, if searching for a specific phrase contained within the book, people can find me. And since I think my site is of actual use to them (better layout, easy to read, no clutter) I'm happy, and believe it's fair, that it's not in any way penalized by Google. The first thought for you should be: does it help your visitors? If so, you are somewhat more on the safe side.

As an additional suggestion, you might want to create additional content on every page which has one of those free articles; that way, you're not just replicating what's on the web. Now I wonder whether or not Google takes PageRank into account as well for duplicate content ranking? My site is so new it has a PageRank of 0 and still people find me via Google.

john22

1:39 pm on Jun 25, 2003 (gmt 0)

10+ Year Member

Hi

I have 2 pages on my site that appear one indented below the other on Google.
What does this mean? Duplication or just similar pages?

-Martin

Jenstar

3:14 pm on Jun 25, 2003 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

"I have 2 pages on my site that appear one indented below the other on Google.
What does this mean? Duplication or just similar pages?"

This is a good thing :) If you have two results showing on the same search page (say, a #2 listing and a #7), Google will take the one that would appear as #7 and bump it up to appear indented underneath the #2 listing.