
Removing duplicated content using only NOINDEX for large sites

LukasTheCurious

1:39 pm on Oct 13, 2015 (gmt 0)

10+ Year Member



Hi everyone,

I am taking care of a large "news" website (500k pages) that took a massive hit from Panda because of duplicate content (70% of it was syndicated). I recommended that all syndicated content be removed and that the website focus on original, high-quality content.

However, this was only partially implemented. All syndicated content is set to NOINDEX instead (they think it is good for users to see standard news alongside the original high-quality content). Of course, it didn't help at all; no change after months. If I were Google, I would definitely penalize a website that has 80% of its content set to NOINDEX because it is duplicated. I would consider such a site to be "cheating" and not worthwhile for users.
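For clarity, by "set to NOINDEX" I mean the standard robots meta tag in the head of each syndicated page (the exact directive used is typical, not guaranteed; the same thing can also be sent as an X-Robots-Tag HTTP header):

    <meta name="robots" content="noindex, follow">

The "follow" value still lets crawlers follow the links on the page even though the page itself is kept out of the index.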

What do you think about this "theory"? What would you do?

Thank you for your help!

yerbaMat

5:36 pm on Oct 13, 2015 (gmt 0)

10+ Year Member



Noindex should work fine. How long does it take Google to recrawl 90% of your pages? That might be the delay. I run a much bigger site than yours (millions of pages) and can tell you that noindex works fine even when it is applied to more than 50% of the pages.

In your case, I agree with noindexing 80% of the URLs. Additionally, I'd investigate where else Panda could be hitting you. What I've found is that while you're dealing with a significant duplicate-content issue, other Panda-related factors can artificially hold down rankings even on the good content. I've noindexed sites at the scale you're describing and seen partial recoveries, but getting to a full Panda recovery took additional analysis after removing millions of pages from the index.

Also, you should see your index status count dropping in Search Console. I'd confirm that first.
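One gotcha to rule out if that count is not dropping: Googlebot can only see a noindex if it is allowed to crawl the page, so the syndicated URLs must not also be blocked in robots.txt. A hypothetical example of the pattern to avoid (the /syndicated/ path is just illustrative):

    # Don't combine this with noindex - if Googlebot can't
    # crawl the pages, it never sees the noindex directive
    User-agent: *
    Disallow: /syndicated/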

LukasTheCurious

6:41 pm on Oct 13, 2015 (gmt 0)

10+ Year Member



Hi yerbaMat, thank you for your response.
About the delay - yeah, you are probably right, but we got the first major Panda hit almost a year ago. Traffic dropped to 3% of its previous level.

Maybe I should describe the situation in more detail: I agree that setting duplicated content to NOINDEX could work well - if we stopped promoting that content and stopped adding more of it. Right now, our editorial team is still adding content, 80% of which is duplicated. Links to these pages appear in "Latest articles", menus, "Other articles", category pages, and so on. What do you think, could this be a problem?

In Search Console, the index count shows a drop from 500k to 70k pages.

Also, there is no canonical link pointing to the original source of the content, so Google may actually be treating our site as a scraper of that content.
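From what I've read, Google's suggested handling for syndicated content is a cross-domain canonical on each copy pointing back at the original publisher, something like this (the URL is just a placeholder):

    <link rel="canonical" href="https://original-publisher.example/original-article">

We have nothing like that in place, so Google gets no signal about which version is the original.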

Thank you for your opinion!