|Are upper- and lower-case URLs the same page?|
or are we in for a duplicate content penalty?
Does Google think these URLs point to the same page?
I've searched for this answer but haven't found anything definitive.
Matt Cutts wrote, in answer to a question on how URLs can be canonicalized, "Search engines can do things like keeping or removing trailing slashes, trying to convert urls with upper case to lower case, or removing session IDs". (Source: [mattcutts.com...] )
But that answer seemed to be describing how Google internally canonicalizes URLs. My question is whether, in a pending CMS upgrade, we should force/standardize currently mixed-case URLs to lowercase, given that the mixed-case versions have been indexed and ranked for years.
In other words, if example.com/FOO is currently well-indexed and ranked, should we risk renaming the page to lower-case 'foo'?
|Does Google think these URLs point to the same page? |
Unless something's changed (and these days it seems to, quite a bit), they are in fact different URLs.
If both FOO & foo pages contain the same content, then it will be duplicate content. If FOO has different content than foo, then it will not be duplicate content. They are different pages. What you put on them is up to you.
Edited to add:
"In other words, if example.com/FOO is currently well-indexed and ranked, should we risk renaming the page to lower-case 'foo'?"
If it ain't broke, don't fix it. If you decide to fix it, redirect the old URL to the new one.
[edited by: WW_Watcher at 8:06 pm (utc) on Nov. 6, 2007]
The mess is created by Windows servers. In their default configuration they are not case sensitive, but most of the other operating systems are, including those used by Google and other search engines.
There is some spidering evidence of Google trying to discover which sites are on non-case-sensitive servers, but that's a crazy job and I would not depend on Google or any other search engine getting it accurately sorted.
Help them out - if you can make all URLs lower case, that is the best practice. If you can configure your server to be case sensitive, that's another best practice. If you have a URL that is already well ranked and it uses some uppercase, then know that changing those letters to lowercase creates a new, alternative URL.
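For anyone who wants to see the "make all URLs lower case" rule spelled out, here is a minimal Python sketch. The function name lowercase_redirect is made up for illustration, and the sketch assumes the site has already decided that lowercase is the canonical form. It only looks at the path, since query string values can legitimately be case sensitive.

```python
from typing import Optional, Tuple


def lowercase_redirect(path: str) -> Optional[Tuple[int, str]]:
    """Return (301, lowercase_path) when the requested path contains
    uppercase letters, or None when the path is already lowercase.

    Hypothetical helper: a real server would do this in its rewrite
    layer rather than in application code.
    """
    lowered = path.lower()
    if lowered != path:
        # The request used a non-canonical spelling: send a permanent redirect.
        return 301, lowered
    # Already canonical: serve the page normally.
    return None


print(lowercase_redirect("/FOO"))  # (301, '/foo')
print(lowercase_redirect("/foo"))  # None
```

The point of the 301 is that every mixed-case spelling collapses onto one URL instead of living on as a separate address.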
It is a rare thing to acquire a duplicate "penalty", but when the same content appears on technically different urls, then that kind of duplication has negative effects. Backlink influence gets split up, one or more of the url versions gets filtered out of search results and so on. This is not a true penalty, as in a black mark against your domain. However, the ranking and traffic problems that are generated can feel like one.
And yet another way for a competitor to screw you up in the SERPs. This is just as bad as the http vs https of the same content, and Google in its infinite wisdom should... should be able to filter it out.
> should be able to filter it out.
They could, by requesting every possible uppercase/lowercase variant of all of your pages, and then comparing the content of each of them. That is indeed possible, but the number of variants doubles with every letter in the URL, so your bandwidth costs would explode, and they'd be able to update their index at least once every five years...
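To put a number on "every possible uppercase/lowercase variant", here is a small Python sketch. The function names are made up for illustration. Each ASCII letter in a path can be upper or lower case, so a path with n letters has 2**n spellings.

```python
from itertools import product
from typing import List


def count_case_variants(path: str) -> int:
    """Number of upper/lower-case spellings of a path: 2 ** (ASCII letters)."""
    letters = sum(1 for ch in path if ch.isascii() and ch.isalpha())
    return 2 ** letters


def case_variants(path: str) -> List[str]:
    """List every spelling. Only sensible for very short paths."""
    choices = [
        (ch.lower(), ch.upper()) if ch.isascii() and ch.isalpha() else (ch,)
        for ch in path
    ]
    return ["".join(combo) for combo in product(*choices)]


print(case_variants("/foo"))
# ['/foo', '/foO', '/fOo', '/fOO', '/Foo', '/FoO', '/FOo', '/FOO']
print(count_case_variants("/widgets/blue"))  # 2048
```

Even an eleven-letter path already has 2048 spellings, which is why a crawler cannot realistically test them all.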
We as Webmasters have to accept some responsibility for the technical correctness of our domains, and cannot rely on Google to fix everything. Besides, who says they'll be the lead search provider forever?
Google did not set the rules. Those rules were set by the RFCs that defined HTTP and the other conventions used on the Web, long before Google's time.
Make sure that each piece of content has only one URL that it can be directly accessed with.
Make sure that alternative URLs serve a 301 redirect to the correct URL, or serve a 404 error.
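As a rough illustration of the "one URL per piece of content, 301 or 404 for everything else" rule, here is a Python sketch. The names build_lookup and respond are hypothetical, and the sketch assumes no two canonical paths differ only by case. Note that the canonical spelling does not have to be lowercase: an established /FOO can stay canonical while /foo redirects to it.

```python
from typing import Dict, Iterable, Optional, Tuple


def build_lookup(canonical_paths: Iterable[str]) -> Dict[str, str]:
    """Map the lowercased form of each canonical path to its exact spelling."""
    return {path.lower(): path for path in canonical_paths}


def respond(path: str, lookup: Dict[str, str]) -> Tuple[int, Optional[str]]:
    """Return (status, redirect_target) for a requested path.

    200 -> the request used the canonical spelling
    301 -> a case variant of a known page; redirect to the canonical spelling
    404 -> nothing matches, even ignoring case
    """
    canonical = lookup.get(path.lower())
    if canonical is None:
        return 404, None
    if canonical == path:
        return 200, None
    return 301, canonical


lookup = build_lookup(["/FOO", "/bar"])
print(respond("/FOO", lookup))  # (200, None)
print(respond("/foo", lookup))  # (301, '/FOO')
print(respond("/baz", lookup))  # (404, None)
```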
Thanks to everyone for confirming my suspicions.
I just realized another way to confirm this -- Google shows different TBPR values on our site for pages that don't enforce canonical capitalization, e.g.:
If the TBPR values differ, then clearly Google thinks these pages are distinct.
Duplicate Content also occurs on-site from:
- http vs. https
- multiple domains and/or multiple TLDs
- www vs. non-www (the most common problem)
- named index pages vs. "/"
- "with-/" vs. "no-/" (server should redirect to "with-/" version)
- multiple paths to the same content (e.g. virtual topic/directory structure on blogs)
- multiple parameters but with the parameters in a different order
- extra parameters (e.g. for "Print Friendly" pages)
- Capitalisation Issues (IIS only)
and so on. There are very many discussions of these points in the forum archive stretching back four, or more, years.
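Several items on that list can be folded into one canonicalization function. The Python sketch below is only an illustration under stated assumptions: the host names, the index file names, and the "print" parameter are hypothetical placeholders, and the choice of https, www, and lowercase paths as the canonical form is arbitrary. It drops any port number, and it does not handle the trailing-slash case or multiple paths to the same content, because those depend on knowing the site's actual structure.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# All of these values are assumptions made for the example.
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.example.com"
ALIAS_HOSTS = {"example.com", "www.example.com"}
INDEX_NAMES = {"index.html", "index.htm", "index.php", "default.aspx"}
DROP_PARAMS = {"print"}  # e.g. a "Print Friendly" flag


def canonicalize(url: str) -> str:
    """Return the single preferred spelling of a URL on this site."""
    parts = urlsplit(url)

    # http vs. https, www vs. non-www: force one scheme and one host.
    host = (parts.hostname or "").lower()
    if host in ALIAS_HOSTS:
        host = CANONICAL_HOST

    # Capitalisation: lowercase the path (assumes lowercase is canonical).
    path = parts.path.lower() or "/"

    # Named index pages vs. "/": strip the index file name.
    head, _, last = path.rpartition("/")
    if last in INDEX_NAMES:
        path = head + "/"

    # Parameter order and extra parameters: drop the unwanted ones, sort the rest.
    params = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in DROP_PARAMS
    )

    return urlunsplit((CANONICAL_SCHEME, host, path, urlencode(params), ""))


print(canonicalize("http://Example.com/Widgets/Index.html?b=2&a=1&print=1"))
# https://www.example.com/widgets/?a=1&b=2
```

If the requested URL differs from what canonicalize returns, that is the case where a 301 to the returned URL would be served.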
From personal experience, we have been trying to move mixed-case URLs over to lower-case ones for a while.
Although from a ranking perspective both can do equally well (on our site at least!), the thinking is that from a user perspective, having easy URLs all of one type is much more memorable. (We also have some hyphens and some underscores.)
With the mixed-case URLs which have backlinks to them we've been more reluctant. With the ones where there is no external influence we've just changed the URL and put a 301 on the old URL, and no problem. If there are backlinks, I'd think about trying to get those links updated first and then putting a 301 on the old page...
But as was mentioned - if it ain't broke... Are you really sure you want to be tampering with it?!
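If you do go ahead with a bulk move to lowercase, it may help to build the 301 map up front and check for collisions, meaning two existing URLs that differ only by case and would end up at the same lowercase address. The Python sketch below is a hypothetical helper, not part of any CMS, and assumes you can export the list of currently live paths.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_redirect_map(
    old_paths: Iterable[str],
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Return (redirects, collisions).

    redirects  maps each mixed-case path to its lowercase target.
    collisions maps a lowercase target to the distinct existing paths
               that would all land on it; these need a manual decision.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in old_paths:
        groups[path.lower()].append(path)

    redirects: Dict[str, str] = {}
    collisions: Dict[str, List[str]] = {}
    for target, sources in groups.items():
        distinct = sorted(set(sources))
        if len(distinct) > 1:
            collisions[target] = distinct
            continue
        source = distinct[0]
        if source != target:
            # Only mixed-case paths need a redirect entry.
            redirects[source] = target
    return redirects, collisions


redirects, collisions = build_redirect_map(["/FOO", "/bar", "/Baz", "/baz"])
print(redirects)   # {'/FOO': '/foo'}
print(collisions)  # {'/baz': ['/Baz', '/baz']}
```

Paths that show up under collisions are the ones where the two spellings might currently serve different content, so they are worth checking by hand before any redirect goes live.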