Forum Moderators: Robert Charlton & goodroi

Some help with unwanted duplicate URLs

         

shaunm

9:24 am on Feb 19, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi All,

I often see two versions of the same URL, mostly because of capitalization differences.

For example, example.com/product.aspx and example.com/Product.aspx

But in most cases the latter has a rel=canonical pointing to the first one.

I'm not sure whether I need to worry about it at all, because Google is only going to index the first URL thanks to the canonicalization, so there should be no chance of duplicate content issues.

From my understanding, there will be link juice/authority issues if both versions are getting internal/external backlinks separately. But what other problems might show up?

Thanks,

weiskopf

10:20 am on Feb 19, 2014 (gmt 0)

10+ Year Member



Hey Shaun,

It goes deeper than that. Case sensitive URLs on .NET sites cause content duplication issues and many other problems.

I've been there before, and you can fix this rather easily (and you should).

What you need to do is to:

1. Install the URL Rewrite extension on your IIS server, then create the appropriate rules (change & redirect all URLs to lowercase)
2. Add a rel=canonical attribute on all the pages, basically pointing to themselves.
3. Re-create and re-submit your sitemap
4. Wait

Shortly thereafter you should be all good :)

Here's the link to the extension:
[iis.net...]

Good luck!
W

shaunm

10:38 am on Feb 19, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thanks @weiskopf :-)

"It goes deeper than that. Case sensitive URLs on .NET sites cause content duplication issues and many other problems"


Can you please tell me how there can be content duplication issues when proper canonicalization is used? I'm curious whether there are other possibilities that I missed.

"1. Install the URL Rewrite extension on your IIS server, then create the appropriate rules (change & redirect all URLs to lowercase)"


I think there is one installed already, so I just need to create the appropriate rule :-)

"2. Add a rel=canonical attribute on all the pages, basically pointing to themselves."


You mean on both the lowercase and uppercase pages?!? Won't that make each of them its own canonical version and create further confusion for the search engines when choosing the right canonical URL?

"3. Re-create and re-submit your sitemap"


I'm confused. Why do I even need it? I shouldn't need it if the uppercase URLs are not in Google's index, right?


Thanks for all your help. You've given me a good start though :-)

weiskopf

11:31 am on Feb 19, 2014 (gmt 0)

10+ Year Member



Sure, I'm glad I can help :) I've been "leeching" off this forum for YEARS without being a registered member and most of what I know comes from this community, so it's time I give back!

So:

0. Canonicalization is not bulletproof. I've got a couple of sites that had duplicate content issues, and the rel=canonical doesn't always seem to solve them. My usual approach is a belt-and-braces solution: redirects plus rel=canonical. rel=canonical is good but it's not good enough on its own :)

2. NO, sorry for not being clear on that one - pointing to the *lower case* URLs only. This is if you're switching to all-lower-case URLs (which is generally more reliable than the "mixed case"/"proper case" approach).

3. Yup, you'll need a new and correct sitemap, listing all and only the "new" (i.e. fixed) URLs. It's a way to signal to Google: look, these are the pages I want you to crawl & index. It's good to signal these changes any way you can.
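
As a rough illustration of point 3, here is a minimal Python sketch that builds a sitemap containing only the lowercase form of each URL, listed once. The function names `to_lowercase_url` and `build_sitemap` are made up for this example, and it assumes the query string should be left untouched because it may be case-sensitive on some sites.

```python
from urllib.parse import urlsplit, urlunsplit
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def to_lowercase_url(url: str) -> str:
    """Lowercase the host and path; leave query string and fragment as they are."""
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path.lower(),
         parts.query, parts.fragment)
    )


def build_sitemap(urls) -> str:
    """Return sitemap XML that lists each lowercase URL exactly once."""
    unique = sorted({to_lowercase_url(u) for u in urls})
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for u in unique:
        # One <url><loc>...</loc></url> entry per canonical URL.
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = u
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    crawled = [
        "http://example.com/product.aspx",
        "http://example.com/Product.aspx",  # case variant of the same page
    ]
    print(build_sitemap(crawled))
```

Feeding it both case variants of a page yields a single entry, which is the "all and only the fixed URLs" idea.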

Google will still crawl your old "pages" (URLs which it sees as pages), notice the canonicalization and re-create its index of your site.


It will probably take a couple of days though.

Let me know how it goes :)

aakk9999

11:39 am on Feb 19, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Excellent reply, weiskopf! And welcome to WebmasterWorld :)

@shaunm
Checking the HTML Improvements section in Google Webmaster Tools may give you an indication of what Google sees as duplicate pages, as they would normally be reported under "Duplicate Titles" and "Duplicate Descriptions". Clicking on each entry shows all the URLs Google sees as showing the same content.
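
If you also have a list of URLs from your own crawl or server logs, a small script can flag the case variants independently of Webmaster Tools. This is only a sketch: `find_case_duplicates` is a made-up name, and comparing the whole URL in lowercase is a heuristic that could give false positives where query strings are genuinely case-sensitive.

```python
from collections import defaultdict


def find_case_duplicates(urls):
    """Group URLs that differ only in letter case.

    Returns a dict mapping the lowercase URL to the sorted list of
    distinct variants seen, keeping only groups with more than one variant.
    """
    groups = defaultdict(set)
    for url in urls:
        groups[url.lower()].add(url)
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


if __name__ == "__main__":
    seen = [
        "http://example.com/product.aspx",
        "http://example.com/Product.aspx",
        "http://example.com/about.aspx",
    ]
    for canonical, variants in find_case_duplicates(seen).items():
        print(canonical, "<-", variants)
```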

shaunm

12:18 pm on Feb 19, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thanks again @weiskopf and @aakk9999 :-)

"I've got a couple of sites that had duplicate content issues"
How do you know whether your site has been penalized for duplicate content?

Google WMT only reports duplicate titles, meta descriptions etc. under the 'HTML improvements' tab. I'm not sure how or where to find out whether my site has been hit for duplicate content. Am I missing something?

Thanks,

phranque

3:48 pm on Feb 19, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



welcome to WebmasterWorld, weiskopf!


shaunm, part of the solution is to change the filesystem settings so that the filesystem itself is case-sensitive.

then you link internally to just the lower case url path.

any requested url that contains upper case characters gets internally rewritten to a script which folds the path to lower case and then 301 redirects to the new path (specifying canonical path and hostname)
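
a rough python sketch of what such a script decides, not the actual IIS rule: the name `lowercase_redirect` and the `CANONICAL_HOST` value are assumptions for illustration, and the query string is passed through unchanged on the assumption that it may be case-sensitive.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption for this sketch: the one hostname you want indexed.
CANONICAL_HOST = "example.com"


def lowercase_redirect(url: str):
    """Return (301, target_url) if the path contains upper case characters.

    Returns None when the path is already lower case, meaning the
    request should be served normally.
    """
    parts = urlsplit(url)
    folded_path = parts.path.lower()
    if parts.path == folded_path:
        return None
    # Redirect target names the canonical hostname and the folded path.
    target = urlunsplit(
        (parts.scheme, CANONICAL_HOST, folded_path, parts.query, "")
    )
    return 301, target


if __name__ == "__main__":
    print(lowercase_redirect("http://www.example.com/Product.aspx?ID=7"))
    print(lowercase_redirect("http://example.com/product.aspx"))
```

because every mixed-case variant collapses to the same target in one hop, the crawler never receives a full document for a non-canonical casing.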


here's why you don't want to solve the infinite url space problem with the link rel canonical solution.
301 responses are cheap for google and your server - they use few server resources and little crawl budget.
if you serve up a full document for each combination of path character casing requested by googlebot you are also expecting google to digest and properly index all of them.
that document is a slow, heavy container for a link rel canonical element when a 301 response does the same job faster and with zero ambiguity.