Forum Moderators: Robert Charlton & goodroi

What to do with this duplicate content issue?

DSEOConsultant

9:32 pm on Feb 29, 2020 (gmt 0)

5+ Year Member



Hello all,

I have an issue with my website and I am not sure what to do.

I have a website, let's say it is www.domain.com, with sub-pages
www.domain.com/service-one.html;
www.domain.com/category-one.html;
and products in www.domain.com/category-one/product-xyz.html.

This structure suits my needs best.

At first the site was meant to serve multiple languages on .com, but I moved all other languages to country-specific domains:
www.domain.it
www.domain.de
www.domain.ru

The real issue now is that the system the site is built on automatically created a copy of each URL under multiple language subdirectories:
www.domain.com/de/service-one.html;
www.domain.com/it/service-one.html;
www.domain.com/ru/service-one.html and 2 other languages

The same goes for categories:
www.domain.com/de/category-one.html;
www.domain.com/it/category-one.html,
www.domain.com/ru/category-one.html and 2 other languages
...
The same goes for products...

The content on all /de/, /it/ and /ru/ pages is in English and is generated automatically from a random combination of elements from the homepage and the original English content page (.com/service-one.html, .com/category-one.html or .com/category-one/product-xyz.html). These pages use the same URL with /de/... in front of it, including the keywords, and usually the same H1 tag.

At first all /it/, /ru/ and /de/ content was blocked in robots.txt, but Google somehow indexed it, so now I need to make some additional changes.

I cannot delete these /it/, /de/ and /ru/ pages, so I was thinking of 301-redirecting each /it/, /de/ and /ru/ URL to the main URL, but I am afraid Google would think I am trying to manipulate the rankings, since the situation would be:

www.domain.com/de/service-one.html;
www.domain.com/it/service-one.html;
www.domain.com/ru/service-one.html
...
and 2 more would all be redirected to
www.domain.com/service-one.html;

The same goes for categories and products.

And because some products can go off sale, I would have some product URLs with 10 redirects pointing to each of them.

My service, category and product pages all rank.

The second option I am considering is to redirect all 3000 URLs from /de/, /it/, /ru/ and the 2 others to a single useful but non-ranking subpage on the .com domain, so that even if Google does not like this redirect, only that one subpage gets deindexed.

I do not like the idea of adding noindex tags instead of redirecting these pages, since the site does not need 3000 useless subpages.

What are your thoughts and what do you suggest?

Thank you for your answers.

tangor

1:25 am on Mar 1, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@DSEOConsultant ... Welcome to Webmasterworld!

A Leap Year Posting!

An observation (not an answer):

The content on all /de/ /it/ /ru/ is in English and is generated automatically from a random combination of elements from homepage and the original content page in English


... that certainly looks like duplicate content. Is there a reason the content for the country specific URLs is not in that language?

phranque

1:53 am on Mar 1, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



welcome to WebmasterWorld, DSEOConsultant!

the best solution would be to link internally only to the canonical urls (i.e. without the /de/, /it/, etc) and 301 redirect any requests for the country-specific subdirectories to the canonical url.
by linking internally only to canonical urls you are providing an unambiguous signal vs relying on redirects to fix your internal linking.
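A minimal sketch of that redirect, assuming Apache with mod_rewrite enabled, an .htaccess file in the document root of www.domain.com, and https as the canonical scheme (none of this is confirmed in the thread). Only de, it and ru are named above. The codes of the two other languages are unknown and would have to be added to the group.

```apache
# Hypothetical .htaccess sketch for www.domain.com (assumes mod_rewrite
# and an https canonical host; the two missing language codes must be added).
RewriteEngine On

# 301 every /de/..., /it/... or /ru/... request to the same path without
# the language prefix, e.g. /de/service-one.html -> /service-one.html.
# A bare /de/ goes to the homepage because $1 is then empty.
# ^/? lets the same pattern work in .htaccess and in a <VirtualHost>.
RewriteRule ^/?(?:de|it|ru)/(.*)$ https://www.domain.com/$1 [R=301,L]
```

Because the substitution carries no query string of its own, any query string on the original request is passed through unchanged.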

another alternative is using link rel canonical elements, but note that these are used as hints while redirects are absolute.

you also should use <link rel="alternate" hreflang="x"> elements appropriately.

The site was meant for multiple languages on .com at first but I put all other languages on country specific domains

note that ccTLDs target a country or region rather than a language per se.

DSEOConsultant

1:07 pm on Mar 1, 2020 (gmt 0)

5+ Year Member



Hello @tangor and @phranque

First of all, thank you for the welcome, and thanks for your help.

@phranque

I am definitely planning on 301 redirects. Would you be so kind as to explain in a little more detail what you meant by:

the best solution would be to link internally only to the canonical urls (i.e. without the /de/, /it/, etc) and 301 redirect any requests for the country-specific subdirectories to the canonical url.

The content I want to rank is on www.domain.com/service-one.html ... but the system the site is built on is a little bit "f*** up" and also generates a useless subpage on .com/it/service-one.html, .com/de/... and .com/ru/.... Likewise, if a new URL is .com/category-one.html, the system also adds .com/it/category-one.html, .com/de/... and .com/ru/... URLs to the site. These /it/, /de/ and /ru/ URLs do not have the same content as the main URL on .com, but they use the same path after /it/, /ru/ or /de/ and the same H1 tags. These pages look like a broken homepage combined with the H1 tag from www.domain.com/service-one.html or .com/category-one.html, and I do not want Google or users to land on them, because these subpages are useless.

If I understand you correctly, you suggest that I first mark these /it/, /ru/, /de/... URLs with canonical tags pointing to the main one and then redirect them to the main one?

Thanks

DSEOConsultant

1:19 pm on Mar 1, 2020 (gmt 0)

5+ Year Member



My main concern is that if I 301 redirect all the content on /it/, /de/, /ru/... to the main URL, I will have:

www.domain.com/de/keyword-one.html;
www.domain.com/it/keyword-one.html;
www.domain.com/ru/keyword-one.html
...
and in some cases also

www.domain.com/de/keyword-one-random-word.html;
www.domain.com/ru/keyword-one-random-word.html;
www.domain.com/it/keyword-one-random-word.html;

all redirected to www.domain.com/keyword-one.html. So the URL I want to rank will receive 5-15 301 redirects from 5-15 URLs containing the same keyword, and I worry that Google will not like each URL having 5-15 internal pages (the same URL prefixed with /it/, /ru/, /de/) redirected to it, and will see it as manipulation.

This is why I am thinking of creating a new subpage, let's say www.domain.com/our-story.html, redirecting all these /it/, /de/ and /ru/ URLs to it, and then linking internally from www.domain.com/our-story.html to the main URLs (www.domain.com/service-one.html...). That way I pass any possible link juice on to the category and service pages without risking Google deindexing my main service and category pages. With this option, the main pages will not have any redirects pointing directly at them.

tangor

3:14 pm on Mar 1, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What "system" are you using that generates these urls?

One more sub page might simply add more confusion.

G will find all the pages, if for no other reason than someone outside your control actually links to one.

If all the pages are in one language, regardless of the geolocation urls, is there any reason to do this?

Clearly I must be missing something or not exactly sure what you really want to do.

DSEOConsultant

3:59 pm on Mar 1, 2020 (gmt 0)

5+ Year Member



Let's call it a system where I have no control over what it generates in the background... and it is a problematic one :). I am considering a new one in the next 2 years, but for now I have to find a creative solution ASAP.

G has found these pages already (they were blocked in robots.txt, but since 1.9.2019 that rule is no longer valid and Google indexes everything).

Yes, I see you probably cannot picture the whole situation; I had a hard time understanding it at first too. To make a long story short: for every page created on .com, the system also generates the same URLs on .com/it/, .com/ru/ and .com/de/, with the same H1 but different site elements and no content. So these pages just take link juice and create duplicate content issues; they have no value, are useless and need to be redirected permanently.

My concern is where to redirect them, since if I redirect them all to the respective .com URL, I would have 10-15 redirects pointing to each .com URL, and I am quite sure Google will not like it.

phranque

12:39 am on Mar 2, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



the system the site is built on ... generates a useless subpage

please describe what you mean here?

explain in a little more detail what you meant by:

the best solution would be to link internally only to the canonical urls

does "the system" actually link to these (country-specific subdirectory) urls anywhere in the content on your site?
if it does you should try to fix this problem.

my main concern is that if I 301 redirect all the content
...
so the URL I want to rank will receive 5-15 301 redirects from 5-15 URLs containing the same keyword, and I worry that Google will not like each URL having 5-15 internal pages (the same URL prefixed with /it/, /ru/, /de/) redirected to it, and will see it as manipulation.

this should not be a concern.
this is exactly what google would prefer.
Avoid creating duplicate content [support.google.com]:
There are some steps you can take to proactively address duplicate content issues...
... use 301 redirects ("RedirectPermanent") in your .htaccess file to smartly redirect users, Googlebot, and other spiders.

Google does not recommend blocking crawler access to duplicate content on your website, whether with a robots.txt file or other methods. If search engines can't crawl pages with duplicate content, they can't automatically detect that these URLs point to the same content and will therefore effectively have to treat them as separate, unique pages. A better solution is to allow search engines to crawl these URLs, but mark them as duplicates by using the rel="canonical" link element, the URL parameter handling tool, or 301 redirects.
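The quoted advice names RedirectPermanent, a mod_alias directive. Here is a minimal sketch of that form, using the same assumptions as before (www.domain.com over https, only the three known language codes listed). Redirect-style directives match a URL-path prefix and append the rest of the requested path to the target. If the site already relies on mod_rewrite, mixing the two modules can produce surprising results, so a RewriteRule equivalent is usually the safer choice there.

```apache
# Hypothetical mod_alias sketch: one line per language prefix.
# Prefix match, remainder appended:
#   /de/service-one.html -> https://www.domain.com/service-one.html
RedirectPermanent /de/ https://www.domain.com/
RedirectPermanent /it/ https://www.domain.com/
RedirectPermanent /ru/ https://www.domain.com/
```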

DSEOConsultant

1:15 am on Mar 2, 2020 (gmt 0)

5+ Year Member



Hello @phranque

thank you for your answer. Very useful.

1st: none of the internal links point to any /de/, /it/ or /ru/ URL; all of them link only to subpages on .com without /it/, /ru/ or /de/. In fact, I did not even know the system was generating these duplicate URLs until I found them in Search Console. And from my experience, 301 redirects are the only way anyhow.

BUT! My concern is that Google will not like the following situation:

The situation is exactly this: I have my website content on:

www.domain.com/keyword-one.html;

When each subpage was created, the system also generated 5 additional subpages:

www.domain.com/de/keyword-one.html;
www.domain.com/it/keyword-one.html;
www.domain.com/ru/keyword-one.html
... and so forth for 2 more languages.

A year ago I already restructured the website URLs with 301 redirects:

from www.domain.com/broadly-related-words-not-exacly-the-keywords-I-want-to-rank-for.html to www.domain.com/keyword-one.html

These are already 301-redirected to .com/keyword-one.html, and this is all done the right way.

But last month I found out that all new and old subpages also exist under /it/, /ru/ and /de/.

So now, for each URL I want to rank (.com/keyword-one.html), I would have 10 or more redirects pointing to it:

www.domain.com/de/keyword-one.html;
www.domain.com/it/keyword-one.html;
www.domain.com/ru/keyword-one.html
... and so forth for 2 more languages.

+

www.domain.com/it/broadly-related-words-not-exacly-the-keywords-I-want-to-rank-for.html
www.domain.com/de/broadly-related-words-not-exacly-the-keywords-I-want-to-rank-for.html
www.domain.com/ru/broadly-related-words-not-exacly-the-keywords-I-want-to-rank-for.html

... and so forth for 2 more languages.

MY CONCERN IS THAT GOOGLE WILL NOT LIKE EACH PAGE HAVING 10 REDIRECTS POINTING TO IT, 5 OF WHICH USE THE MAIN KEYWORD AND 5 A BROADER ONE, AND WILL DEEM MY REDIRECTS AN ATTEMPT TO MANIPULATE KEYWORDS.

What do you think?

phranque

3:16 am on Mar 2, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



MY CONCERN IS THAT GOOGLE WILL NOT LIKE EACH PAGE HAVING 10 REDIRECTS POINTING TO IT, 5 OF WHICH USE THE MAIN KEYWORD AND 5 A BROADER ONE, AND WILL DEEM MY REDIRECTS AN ATTEMPT TO MANIPULATE KEYWORDS.

What do you think?

i think that google will in fact not like the 10 duplicate pages but will love the 9 redirects to the 1 canonical url.
mostly google will love that there is a (i.e. 1) canonical url for that content.

i also think that you should not be concerned if google discovered these noncanonical urls on a site that is not under your control.
as long as you know where to redirect noncanonical requests, you are good to go.

since you are using apache to redirect these requests, "the system" will never see them and it is therefore irrelevant that it is capable of serving responses to noncanonical requests.
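The following sketch shows why the CMS never sees these requests. Suppose the CMS is driven by a front-controller rewrite; the index.php name below is a hypothetical stand-in, since the thread never says what "the system" is. The language-prefix redirect is placed before the catch-all rule, so Apache answers with the 301 before the request can reach the CMS.

```apache
# Hypothetical .htaccess ordering sketch (assumes mod_rewrite).
RewriteEngine On

# 1. Language-prefix duplicates are redirected first; [L] together with
#    the external redirect ends processing, so the CMS never runs.
RewriteRule ^/?(?:de|it|ru)/(.*)$ https://www.domain.com/$1 [R=301,L]

# 2. Hypothetical CMS front controller: any request that is not a real
#    file or directory is handed to index.php.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ index.php [L]
```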

DSEOConsultant

6:13 pm on Mar 2, 2020 (gmt 0)

5+ Year Member



Thank you @phranque, very useful.

Following your points, and after reading everything on Google support about 301 redirects and canonical tags, I have come to the conclusion that the best and also the easiest thing to do is the following (the text is copied from a webpage I found online):

Move Category to a Single Web Page

Unlike the previous lesson, let's say that I have a folder on my site: "example.new/cars/," and I'm getting rid of that entire folder and all of the pages within it, such as "/cars/ford.html," "/cars/toyota.html," "/cars/acura.html" and several dozen others. Since I'm no longer using those webpages, I'll want that entire folder and its contents to move to "example.new/new-page"

RedirectMatch 301 ^/cars/(.*)$ http://www.example.new/new-page


Move Category to a Single Web Page (with extension)

The redirect would be exactly the same as the example above, except you'd simply type the page extension at the end of it. This is mostly for anyone not using a CMS like Wordpress, where the pages all have extensions, like ".html" or ".php":

RedirectMatch 301 ^/cars/(.*)$ http://www.example.new/new-page.html



ANY OBJECTIONS?

[edited by: not2easy at 7:19 pm (utc) on Mar 2, 2020]
[edit reason] example.new for readability [/edit]

not2easy

7:16 pm on Mar 2, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Sorry for the edit, but it helps if others can read the rules without having a link hide part of it. Using "example" rather than "new-site" keeps it readable.

We are wandering into an htaccess discussion, which belongs in the Apache Forum: [webmasterworld.com...]. That might be the place to look for objections. There are a number of discussions there about getting this right, because RedirectMatch is not the same (nor the same Apache mod; it belongs to mod_alias) as might have been assumed here.

When mixing mods in your htaccess file you may need some guidance to make sure it does not break something else. To see what I mean, take a little time to see this thread: [webmasterworld.com...]

going back to edit for code tags...

DSEOConsultant

8:26 pm on Mar 2, 2020 (gmt 0)

5+ Year Member



Hello @not2easy

thanks for your editing and help Sir. I am reading the thread you mentioned, good stuff.

I am new here, so I am still learning how to use this forum. Anyhow, it is of great help. Do you have any thread on how to format forum posts?

Regards and thank you all!

not2easy

8:46 pm on Mar 2, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Sure, when you want to format a post, use the "Preview" button where you will find all the EZ tools for that. There should be a Preview button just below the textarea where you would compose a post. ;)

DSEOConsultant

8:55 pm on Mar 2, 2020 (gmt 0)

5+ Year Member



Great, thank you :)

Robert Charlton

9:15 pm on Mar 2, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Note... also in "Preview" mode, note that to the left of the preview window, there is a link to "Style codes"... for those who prefer to code them manually...

Style Codes
https://www.webmasterworld.com/help-v6.cgi?cat=ubbcodes

Note that there are some inconsistencies, and that some of these links got buried when the forums were revamped to be mobile-responsive.

Each sub-forum has its own Charter... guidelines for posting... hidden in the drop-down under "Forum Options". For the Google SEO News forum, the Charter link is...

https://www.webmasterworld.com/google/charter.htm

phranque

4:37 am on Mar 3, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



RedirectMatch ...

if you are using mod_rewrite anywhere in your configuration, you should use RewriteRule directives here instead of RedirectMatch.

assuming you have a hostname canonicalization redirect ruleset in place, that typically implies mod_rewrite.
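Here is a sketch that keeps everything in mod_rewrite, as suggested. The language-prefix redirect sits ahead of a typical hostname canonicalization ruleset, so each duplicate URL reaches its canonical target in a single 301 hop rather than a chain. The canonical host www.domain.com and the https scheme are assumptions. An http-to-https rule, if the site needs one, is not shown.

```apache
# Hypothetical .htaccess sketch (assumes mod_rewrite, canonical host
# www.domain.com over https).
RewriteEngine On

# Language-prefix duplicates first: strip /de/, /it/ or /ru/ and send the
# request straight to the canonical host, e.g.
#   domain.com/de/service-one.html -> https://www.domain.com/service-one.html
RewriteRule ^/?(?:de|it|ru)/(.*)$ https://www.domain.com/$1 [R=301,L]

# Hostname canonicalization: any other host name (e.g. domain.com without
# www) is 301-redirected to the canonical host. An empty Host header is
# left alone to avoid a redirect loop.
RewriteCond %{HTTP_HOST} !^(?:www\.domain\.com)?$ [NC]
RewriteRule ^/?(.*)$ https://www.domain.com/$1 [R=301,L]
```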

Unlike the previous lesson, let's say that I have a folder on my site: "example.new/cars/," and I'm getting rid of that entire folder and all of the pages within it, such as "/cars/ford.html," "/cars/toyota.html," "/cars/acura.html" and several dozen others. Since I'm no longer using those webpages, I'll want that entire folder and its contents to move to "example.new/new-page"

the situation you are describing should get a 404 or 410 response.
unless /new-page actually contains sufficient ford/toyota/acura/etc content to replace all of the several dozen pages being redirected...
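For the retired-folder case in the quoted tutorial, here is a minimal mod_rewrite sketch of the 410 alternative. The [G] flag makes Apache answer 410 Gone, and "-" means the URL is not rewritten. The /cars/ path is the tutorial's example and is not a path on the OP's site.

```apache
RewriteEngine On

# Everything under the retired /cars/ folder answers "410 Gone"
# instead of being redirected to an unrelated page.
# [G] returns 410 and implies [L]; "-" leaves the URL unchanged.
RewriteRule ^/?cars/ - [G]
```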