
Forum Moderators: Robert Charlton & goodroi

How to handle canonicalization of duplicate pages

8:20 am on Jul 11, 2019 (gmt 0)

New User

joined:June 19, 2019
posts: 3
votes: 0

Hi, I want to describe my experience with canonicalizing duplicate pages on the e-commerce website I manage. It has around 200,000 pages.

I discovered that the site has two versions of each page: one with the .html extension and one without. Both were indexed in Google's SERPs, each with a self-referencing canonical tag. So, following basic SEO practice for avoiding duplicate content, my idea was to canonicalize one version to the other, in order to rank higher and stop the two versions from cannibalizing each other.
So, in the end, we did it.
Both versions are still online (my boss doesn't want to remove either version, even though that affects our crawl rate), but we pointed the canonical tag of the non-.html pages to the .html versions (even though the non-.html pages ranked better in the SERPs).

After 6 months, the pages without the .html extension are no longer indexed, but the .html pages didn't gain any positions or traffic. So basically we just lost the traffic that came from the non-.html pages. I find this behavior very strange.

So, I have 2 questions:
1. Did I act the way an SEO expert would? In short, was my decision in line with SEO principles? (I think so, but Google didn't reward it.)
2. Can we go back and make both versions self-canonical again to try to recover the lost traffic? Would this reversal further affect the indexing of the version we previously chose?

Thank you in advance for all your answers.

1:39 pm on July 11, 2019 (gmt 0)

New User

joined:July 10, 2019
posts: 1
votes: 0

Hey Turixx,

Personally, I would say the .html pages should have been redirected to the URL without .html using .htaccess, so that you avoid the duplicate content altogether. Incoming links to either version then pass their value to the same page, visitors who have one version saved in their bookmarks simply get redirected, ...
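On Apache this is normally a mod_rewrite rule in .htaccess. The Python sketch below only illustrates the mapping such a rule performs: any URL whose path ends in .html gets a 301 to the same URL without the extension, keeping the query string. The function name and the example.com URLs are hypothetical, and edge cases such as /index.html are deliberately not handled.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def redirect_for_html(url: str) -> Optional[Tuple[int, str]]:
    """Return (301, target) for a .html URL, or None if no redirect is needed."""
    parts = urlsplit(url)
    if not parts.path.endswith(".html"):
        return None
    clean_path = parts.path[: -len(".html")]
    # Keep scheme, host and query string; the fragment is never sent to the server.
    target = urlunsplit((parts.scheme, parts.netloc, clean_path, parts.query, ""))
    return 301, target


print(redirect_for_html("https://example.com/redshoe.html?color=red"))
# (301, 'https://example.com/redshoe?color=red')
print(redirect_for_html("https://example.com/redshoe"))
# None
```

Because the target never ends in .html, applying the rule a second time returns None, so the redirect cannot loop.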

It also brings benefits beyond SEO. In Analytics, when both the .html and the non-.html versions exist and you use e-commerce tracking, you can't immediately see your best-selling products. For example: /redshoe.html with 10 sales plus /redshoe with 10 sales (20 in total) beats /blueshoe.html with 13 sales plus /blueshoe with 2 sales (15 in total), even though /blueshoe.html looks like the top seller in a per-URL report.
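The Analytics example can be made concrete with a small Python sketch that merges the two URL variants before ranking products. It assumes the report is available as (path, sales) pairs; the function name and data layout are hypothetical, and the figures are the ones from the example above.

```python
from collections import Counter


def sales_by_product(rows):
    """Sum sales per product, treating /x and /x.html as the same page."""
    totals = Counter()
    for path, sales in rows:
        key = path[: -len(".html")] if path.endswith(".html") else path
        totals[key] += sales
    return totals


report = [("/redshoe.html", 10), ("/redshoe", 10), ("/blueshoe.html", 13), ("/blueshoe", 2)]

print(max(report, key=lambda row: row[1]))     # per-URL view: ('/blueshoe.html', 13)
print(sales_by_product(report).most_common())  # [('/redshoe', 20), ('/blueshoe', 15)]
```

With a single URL per product (after a redirect), this merging step becomes unnecessary because the report already has one row per product.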

Whether to keep the .html version or the other one is a matter of preference, but I think URLs look cleaner (and shorter) without it, so readability and the chance of people sharing them go up.

I would definitely not try to regain that traffic by reintroducing duplicate content. But to make that decision we would need more information: how large is the traffic loss?

Some basic questions:
- Where there were incoming links to products (no idea whether there are any), did they point to the .html version or not? If they mostly point to one version, I might keep that one and redirect the other version to it. If it's roughly even, I would keep the version without .html for the reasons mentioned above.
- Are you sure the loss of traffic isn't related to seasonality?
- Is there any difference in page speed between the .html and non-.html pages?

3:05 pm on July 11, 2019 (gmt 0)

New User

joined:June 19, 2019
posts: 3
votes: 0

Hi BartWarrot,
Thanks for your reply.

The choice of one version over the other was dictated by my boss. I know that non-.html URLs are more user-friendly, but in the end he pushed for doing it this way. I expected the benefit of having a single canonical version to raise traffic all the same. The two versions are identical apart from the URL extension.

The traffic loss is about 1,000 sessions/day, with a revenue loss of around 1,000/day. That is about a 25% loss of organic traffic. Half of the traffic loss comes from the non-.html pages that are no longer indexed.
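Reading these figures literally, and assuming "25%" refers to daily organic sessions before the canonical change, the implied numbers work out as follows:

```latex
\begin{aligned}
\text{organic sessions/day before the change} &\approx \frac{1000}{0.25} = 4000 \\
\text{sessions/day lost from non-.html pages} &\approx 0.5 \times 1000 = 500
  \;\Rightarrow\; \frac{500}{4000} = 12.5\% \text{ of prior organic traffic}
\end{aligned}
```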

By the way, I think the solution you suggested at the beginning of your comment (a redirect via .htaccess, or the equivalent, since we are on Nginx, which doesn't use .htaccess files) would have been better.

Thank you.
4:39 pm on July 11, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Nov 13, 2016
votes: 244

It might not help, but to me the best approach is to have a single page and redirect the other one with a 301, AND to verify that your sitemap references only one version of each URL, AND that all your internal links also point to that single version.
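The sitemap check can be automated. Below is a minimal Python sketch, using only the standard library, that lists pages whose URL appears in a sitemap both with and without .html. The function name and sample URLs are hypothetical; the namespace is the standard sitemaps.org one.

```python
import xml.etree.ElementTree as ET
from typing import List

# Standard namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def duplicated_pages(sitemap_xml: str) -> List[str]:
    """Return URLs (in their non-.html form) that are listed in both forms."""
    root = ET.fromstring(sitemap_xml)
    locs = {loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)}
    return sorted(u for u in locs if not u.endswith(".html") and u + ".html" in locs)


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/redshoe.html</loc></url>
  <url><loc>https://example.com/redshoe</loc></url>
  <url><loc>https://example.com/blueshoe.html</loc></url>
</urlset>"""

print(duplicated_pages(sample))  # ['https://example.com/redshoe']
```

A site with around 200,000 pages will be split across several sitemap files (the protocol caps one file at 50,000 URLs), so the check would need to run over every file listed in the sitemap index.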

From my (by now rather old) memory of Nginx, you have both rewrite and redirect directives: [nginx.com...]

If you have the skills and the option, you can also check the URL in PHP and redirect it to the proper URL.

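The same application-level idea, sketched in Python as a WSGI middleware rather than PHP (a swapped-in stand-in, not the poster's code): before the shop application runs, the requested path is checked and, if it is not the canonical form, a 301 is sent instead. `strip_html`, `canonical_redirect` and `shop` are hypothetical names, and the canonicalizer here assumes the non-.html form is the preferred one.

```python
from typing import Callable, Optional


def strip_html(path: str) -> Optional[str]:
    """Hypothetical canonicalizer: /x.html -> /x; None means 'already canonical'."""
    return path[: -len(".html")] if path.endswith(".html") else None


def canonical_redirect(app, canonical_path: Callable[[str], Optional[str]]):
    """Wrap a WSGI app so non-canonical paths get a 301 before the app runs."""
    def wrapper(environ, start_response):
        path = environ.get("PATH_INFO", "/")
        target = canonical_path(path)
        if target is not None and target != path:
            query = environ.get("QUERY_STRING", "")
            location = target + ("?" + query if query else "")
            start_response("301 Moved Permanently", [("Location", location)])
            return [b""]
        return app(environ, start_response)
    return wrapper


def shop(environ, start_response):
    """Stand-in for the real shop application."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"product page"]


application = canonical_redirect(shop, strip_html)
```

For a quick local try, `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` serves it; in production a web-server rule (as suggested above for Nginx) avoids starting the application at all for redirected requests.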
1:17 am on July 12, 2019 (gmt 0)


WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
votes: 211

welcome to WebmasterWorld [webmasterworld.com], BartWarrot!
1:19 am on July 12, 2019 (gmt 0)


WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
votes: 211

I would have suggested redirecting the .html version to the non-.html version, as BartWarrot did.

Since you already lost that battle, you should now redirect the non-.html version to the .html version of the URL instead.

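For the direction recommended here (non-.html to .html), the main care point is not to add .html to URLs that have no .html twin. The Python sketch below assumes that every extensionless, non-directory path is a page with a .html version; on a real site the rule would likely be limited to the product URL pattern. The function name is hypothetical.

```python
import posixpath
from typing import Optional


def html_target(path: str) -> Optional[str]:
    """Return the .html path a clean URL should 301 to, or None to leave it alone."""
    if path == "/" or path.endswith("/"):
        return None  # site root and directory-style URLs have no .html twin
    _, ext = posixpath.splitext(path)
    if ext:
        return None  # already .html, or an asset such as .css or .jpg
    return path + ".html"


for p in ["/redshoe", "/redshoe.html", "/", "/static/app.css"]:
    print(p, "->", html_target(p))
```

Since a path that already ends in .html returns None, the redirected URL is never redirected again, which rules out a loop.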
1:25 pm on July 12, 2019 (gmt 0)

New User

joined:June 19, 2019
posts: 3
votes: 0

Thank you Dimitri and phranque. OK, I'll try to get the redirect from the non-.html to the .html pages done.

6:21 pm on July 12, 2019 (gmt 0)

Full Member

Top Contributors Of The Month

joined:June 28, 2018
posts: 291
votes: 132

I agree with what others have said: you need to stick to one version and redirect the other (I prefer the non-.html one, but it doesn't matter SEO-wise).

Personally, I think the ideal is to have web-server-level redirects in place that ensure you have just one URL structure. By that I mean sending everything to a [domain.com...] type structure: redirecting all http to https, ensuring there is only www.domain and not domain.com (or the other way around, again a personal choice) and, as discussed, choosing just one URL ending, .html or none.
Of course, this requires that you use SSL/TLS certificates and serve https, but every site should be doing that now, as not doing so can have an impact on SEO.
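The "one URL structure" idea amounts to a single function that maps any variant to its canonical form. The Python sketch below assumes https, a www host and .html-free paths; `example.com`, `CANONICAL_HOST` and `canonical_url` are hypothetical, and the choices would be flipped to match whatever the site actually uses.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical site choices: https, www host, no .html extension.
CANONICAL_HOST = "www.example.com"
HOST_VARIANTS = {"example.com", "www.example.com"}


def canonical_url(url: str) -> str:
    """Map any scheme/host/extension variant to the single canonical URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # lowercased, port dropped
    if host in HOST_VARIANTS:
        host = CANONICAL_HOST
    path = parts.path or "/"
    if path.endswith(".html"):
        path = path[: -len(".html")]
    return urlunsplit(("https", host, path, parts.query, ""))


print(canonical_url("http://example.com/redshoe.html"))  # https://www.example.com/redshoe
```

A server would redirect only when `canonical_url(requested) != requested`. Computing the final URL in one step means a request like http://example.com/redshoe.html reaches its destination in a single 301 instead of a chain of http-to-https, then host, then extension redirects.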

Once everything is set up and stable, I would consider implementing HSTS and submitting the site to the HSTS preload list (it takes about a week to get accepted). I have found that it also gives the site a performance boost.
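HSTS is a single response header. The Python sketch below builds the Strict-Transport-Security value and checks the header-level conditions the preload list has documented (max-age of at least one year, includeSubDomains, preload). The helper names are hypothetical, and the thresholds should be re-checked against the preload list's current rules before submitting. The list's other requirements, such as a valid certificate and redirecting http to https, are not checked here.

```python
from typing import Tuple

ONE_YEAR = 31_536_000  # seconds; minimum max-age documented for the preload list


def hsts_header(max_age: int = ONE_YEAR, include_subdomains: bool = True,
                preload: bool = True) -> Tuple[str, str]:
    """Build the Strict-Transport-Security header as a (name, value) pair."""
    parts = [f"max-age={max_age}"]
    if include_subdomains:
        parts.append("includeSubDomains")
    if preload:
        parts.append("preload")
    return ("Strict-Transport-Security", "; ".join(parts))


def meets_preload_header_rules(value: str) -> bool:
    """Check only the header-level preload conditions described above."""
    directives = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, arg = item.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"')
    try:
        max_age = int(directives.get("max-age", ""))
    except ValueError:
        return False
    return max_age >= ONE_YEAR and "includesubdomains" in directives and "preload" in directives


print(hsts_header())  # ('Strict-Transport-Security', 'max-age=31536000; includeSubDomains; preload')
```

The speed gain likely comes from browsers upgrading http URLs to https internally, which skips the redirect round-trip. Preloading is hard to undo, though: browsers will refuse plain http for the domain and all its subdomains.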