Forum Moderators: Robert Charlton & goodroi

Similar content being recognized as duplicate - penalization and SERP

Rina

8:23 am on Feb 4, 2016 (gmt 0)

10+ Year Member



Hello!

I'm having a problem getting a site with around a million pages indexed (it is stuck at 300K).
One thing I suspect is that I have been penalized for duplicate content:
I found two product pages with very similar content that have been crawled by Google
(Googlebot has visited the indexed page around 19 times), but the pages
do not appear in the SERPs. Am I on the right track with this suspicion?

Thank you

aakk9999

11:34 am on Feb 4, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hello Rina and welcome to WebmasterWorld!

I am presuming that technically all is fine (robots.txt, meta robots and canonical are not preventing the page from being indexed).
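
One way to run that technical check on a single page is sketched below in Python. It is a minimal sketch that assumes you already have the page's HTML and the site's robots.txt as strings (fetching them is left out), and the names IndexSignalParser and indexing_blockers are made up for illustration. It uses the standard library's urllib.robotparser for the robots.txt rules and html.parser to spot a meta robots noindex or a rel="canonical" pointing somewhere other than the page itself.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class IndexSignalParser(HTMLParser):
    """Collects meta robots directives and the rel="canonical" href from a page."""

    def __init__(self):
        super().__init__()
        self.robots_directives = []
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attrs.get("name", "").lower() in ("robots", "googlebot"):
            directives = attrs.get("content", "").lower().split(",")
            self.robots_directives += [d.strip() for d in directives]
        elif tag == "link" and "canonical" in attrs.get("rel", "").lower().split():
            self.canonical = attrs.get("href") or None


def indexing_blockers(url, html, robots_txt, user_agent="Googlebot"):
    """Return reasons (if any) why this page might be kept out of the index."""
    problems = []

    # Note: urllib.robotparser does not implement Google's * and $ path wildcards,
    # so a "not blocked" result here is indicative only.
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(user_agent, url):
        problems.append("blocked by robots.txt")

    parser = IndexSignalParser()
    parser.feed(html)
    if {"noindex", "none"} & set(parser.robots_directives):
        problems.append("meta robots noindex")

    if parser.canonical:
        # Resolve relative hrefs; ignoring trailing slashes is a simplification.
        target = urljoin(url, parser.canonical)
        if target.rstrip("/") != url.rstrip("/"):
            problems.append(f"canonical points elsewhere: {target}")
    return problems
```

For example, a page at https://example.com/product/123 whose HTML contains <link rel="canonical" href="/product/999"> would be reported with "canonical points elsewhere: https://example.com/product/999". An empty list only means none of these three common blockers was found.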

There is a difference between a page not being indexed and a page not being shown in the SERPs. How are you checking whether the page has been indexed? Have you tried searching for a unique piece of text from that page, in quotes, to see if it shows up? If it does, then the page is indexed, but it may not show in the SERPs for a given query for various reasons: competition, being filtered out as a duplicate, etc.
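
For the quoted-text check, a small helper can build the search URL to open in a browser. This is a hypothetical sketch (the function name is made up): it wraps the phrase in double quotes, which is Google's exact-match syntax, and can optionally add a site: restriction. The phrase should be a sentence that appears only on the page you are checking.

```python
from urllib.parse import urlencode


def exact_phrase_search_url(phrase, site=None):
    """Build a Google search URL for an exact-match (quoted) phrase.

    `site` optionally restricts results to one domain via the site: operator.
    """
    # Embedded double quotes would break the exact-match query, so drop them.
    query = '"' + phrase.replace('"', "") + '"'
    if site:
        query += " site:" + site
    # urlencode percent-encodes the quotes and colon and turns spaces into "+".
    return "https://www.google.com/search?" + urlencode({"q": query})
```

For instance, exact_phrase_search_url("hand-stitched leather strap", site="example.com") produces https://www.google.com/search?q=%22hand-stitched+leather+strap%22+site%3Aexample.com.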

A website with one million pages is a very large site. You mention product pages so I presume this is an eCommerce site. Are you taking a manufacturer feed for product pages? I.e. could there be other websites that have exactly the same content?

You mention "very similar content" between your own two product pages - how much content is different? If the product is the same but either just the size or the colour is different, then I would not be surprised if these pages are filtered out.
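
To put a rough number on "how much content is different", one common approach is to compare word shingles (overlapping runs of consecutive words) using Jaccard similarity. This is only an illustrative measure, assuming you compare the visible product text of two pages. Google does not publish how its duplicate filtering works, so this sketch says nothing about its thresholds.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        # Texts shorter than one shingle are treated as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a, text_b, size=3):
    """Jaccard similarity of the two shingle sets: 1.0 means identical wording."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

With "Classic leather wallet in black with six card slots" and the same sentence with "brown" instead of "black", similarity returns 0.4 at size=3, because one changed word alters every shingle that contains it. On full pages, shared template text would push the score much higher.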

Storiale

4:35 pm on Feb 4, 2016 (gmt 0)

10+ Year Member



site:domain.com will tell you how many pages are indexed.

The Indexed Pages tab in Webmaster Tools/Search Console is totally incorrect for us. It shows 3 million pages, growing every day, which is wrong. site:domain.com is much more accurate.

Does each product page have a canonical to itself? How do you handle variants? As aakk9999 mentioned, if only the size and color differ, then that information needs to be worked into the title tag, as Amazon does: Product Name + SKU/Model Number + Color/Size. That gives you unique title tags even for variants of very similar pages.

Do the same for H1 tags and meta descriptions. Adding those unique details after the product names will help get those pages indexed if Google is seeing them as duplicates.
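
The Product Name + SKU/Model Number + Color/Size pattern can be generated from product data rather than written by hand. A minimal Python sketch follows. The Product fields and function names are hypothetical, and it assumes each variant has its own URL that should carry a canonical pointing to itself.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Product:
    # Hypothetical product record; the field names are assumptions.
    name: str
    model: str = ""    # SKU / model number
    color: str = ""
    size: str = ""
    summary: str = ""
    url: str = ""      # this variant's own URL, used as a self-referencing canonical


def page_tags(product):
    """Build title, H1 and meta description: Product Name + SKU/Model + Color/Size."""
    variant = " ".join(part for part in (product.model, product.color, product.size) if part)
    title = f"{product.name} {variant}".strip()
    description = f"{title}. {product.summary}" if product.summary else title
    return {"title": title, "h1": title, "meta_description": description}


def render_head(product):
    """Render the <head> tags, including a canonical that points to the page itself."""
    tags = page_tags(product)
    lines = [
        f"<title>{escape(tags['title'])}</title>",
        f'<meta name="description" content="{escape(tags["meta_description"])}">',
    ]
    if product.url:
        lines.append(f'<link rel="canonical" href="{escape(product.url)}">')
    return "\n".join(lines)
```

For example, Product("Leather Wallet", "LW-200", "Black") gets the title "Leather Wallet LW-200 Black", while the brown variant of the same model gets "Leather Wallet LW-200 Brown", so the two pages no longer share a title, H1 or meta description.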

There is much more, but I hope you're doing these or can implement them soon. Let us know! :-)

Rina

10:09 pm on Mar 2, 2016 (gmt 0)

10+ Year Member



Hi! Thank you for your replies and the welcome :)
You are right, it is an eCommerce site, and we are not taking a manufacturer feed.
We are mainly checking through site:domain.com and using serpbook to track the changes.

As for title tags, thanks for the advice. Some product pages already have title tags like that, but
we haven't set them up for all pages, and the model numbers in the title tags are complex, but that might be a separate issue.
Can the title tag, H1 tag and meta description include the same keyword?
I will let you know as soon as we try it out, thanks! :)