Forum Moderators: Robert Charlton & goodroi

Changing CMS and URLs. Good idea to lower-case all URLs while at it?

         

1script

4:22 am on Oct 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi all,

I'm changing the CMS on one of my sites, and that necessitates URL changes. For some 10 years the URLs on this site have been created using both lowercase and uppercase characters. There was no real system to it: whatever the author typed into the Title field became the URL. Sometimes authors would capitalize every word, sometimes they'd use all caps for abbreviations, that kind of thing.

I was pretty careful to make sure that lowercase versions of mixed-case URLs are not indexed (all URLs are 301-ed to their actual canonical version, whatever the letter case happens to be).

Now, since I have to bite the bullet and 301 tens of thousands of URLs to their new version regardless of the case, does this respected board think it's a good idea to use the opportunity to also lowercase all URLs?

I'm not terribly sure why I would lowercase them though, other than that all modern CMSes do it this way AFAIK. Is there a better reason to lowercase URLs? Does Google "like" lowercase URLs any better?

deadsea

11:05 am on Oct 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google doesn't like lowercase urls any better.

I had a site with mixed case urls. Here are the reasons that it sucks:
1) Some crawlers lower case all urls, so you will get requests for the lower case versions.
2) If you are serving on IIS, incorrect case requests will get served and you have canonicalization problems. On a unix server, the requests will get 404s.
3) If all urls are lower case, it is very easy to write a rewrite rule to redirect all mixed case urls to the lower case version. If you have mixed case urls, issuing redirects for mis-cased urls is a royal pain in the rear.
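
To illustrate point 3, here is a rough Python sketch of the difference. It is not anyone's production code: the function names and the example paths are made up for illustration, and the dictionary stands in for whatever lookup (database, CMS query) a real site would use. With lowercase canonicals the redirect target can be computed from the request alone; with mixed-case canonicals you need a table of the real URLs.

```python
from typing import Dict, Iterable, Optional


def redirect_when_canonical_is_lowercase(path: str) -> Optional[str]:
    """Return the lowercase redirect target, or None if no redirect is needed."""
    lowered = path.lower()
    return lowered if lowered != path else None


def build_case_index(canonical_paths: Iterable[str]) -> Dict[str, str]:
    """Map each lowercased path to its canonical mixed-case form (hypothetical helper)."""
    return {p.lower(): p for p in canonical_paths}


def redirect_when_canonical_is_mixed(path: str, index: Dict[str, str]) -> Optional[str]:
    """Look up the canonical form; None means 'already canonical' or 'unknown URL'."""
    canonical = index.get(path.lower())
    if canonical is None or canonical == path:
        return None
    return canonical


if __name__ == "__main__":
    index = build_case_index(["/About-Us/FAQ.html"])
    print(redirect_when_canonical_is_lowercase("/About-Us/FAQ.html"))
    print(redirect_when_canonical_is_mixed("/about-us/faq.html", index))
```

The first function needs no knowledge of the site at all, which is why it fits in a one-line rewrite rule. The second needs the full list of canonical URLs available wherever the redirect is issued.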

lucy24

11:17 am on Oct 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If all urls are lower case, it is very easy to write a rewrite rule to redirect all mixed case urls to the lower case version.

For a given definition of "very easy", unless you know something the rest of us don't.

deadsea

11:48 am on Oct 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here is a thread with the code:
[webmasterworld.com...]

I usually do it in my custom 404 handler written in perl. There it is very easy.


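# Redirect any URI containing an uppercase letter to its lowercase form.
if ($uri =~ /[A-Z]/) {
    $uri = lc($uri);
    # Double quotes so that $site and $uri are interpolated.
    redirect("http://$site$uri");
}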

pageoneresults

11:53 am on Oct 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



On Windows using .htaccess

# Force everything to be lower-case.
RewriteCond %{REQUEST_METHOD} (GET|HEAD) [NC]
RewriteCond %{REQUEST_URI} ([^?]+\u[^?]*)(?:\?.*)?
RewriteRule (.*) $1 [R=301,CL]

1script

3:19 pm on Oct 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thank you for your responses, guys! I'm getting the idea that ease of handling is probably the best reason to lowercase all those URLs.

In my case the wrong-case URLs were not "lost", so to speak, because the CMS would have automatically picked up on the wrong case and issued a 301 to the canonical version. However, this "URL cleaning" does require making SQL queries that would not be necessary if all the URLs were of one case. So, if nothing else, I may be saving a bit of database load if I lowercase all URLs via .htaccess before the request even hits the CMS.

So, the lowercasing thing becomes more and more attractive the more I think about it :)

g1smd

9:27 pm on Oct 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In Apache, very early in the .htaccess file (after the RewriteRule(s) which block bad requests, and before any other canonicalisation redirects) add this internal rewrite:

RewriteRule [A-Z] /special-script.php [L]


The special PHP file looks at the requested URL path, changes the path and filename to lower case using strtolower() or similar, and then uses header() to send the 301 redirect and the new location (including protocol and domain name).
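
For anyone who wants to see the logic of such a script without digging through old threads, here is a minimal sketch in Python rather than PHP. It is not the code posted previously on this forum. The function name, the example host and the choice to leave the query string untouched are all assumptions made for illustration.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def lowercase_redirect(requested_url: str, scheme: str, host: str) -> Optional[Tuple[int, str]]:
    """Return (301, absolute_location) if the path has uppercase letters, else None.

    requested_url is the path plus optional query string as received by the
    server, e.g. "/Some/Page.html?ID=5". Only the path is lowercased; the query
    string is passed through unchanged (an assumption, since parameter values
    may be case-sensitive).
    """
    parts = urlsplit(requested_url)
    lowered_path = parts.path.lower()
    if lowered_path == parts.path:
        return None  # already lowercase, nothing to redirect
    # Build the full target including protocol and domain name.
    location = urlunsplit((scheme, host, lowered_path, parts.query, ""))
    return 301, location


if __name__ == "__main__":
    # www.example.com is a placeholder host.
    print(lowercase_redirect("/Widgets/Blue-Widget.html?Ref=ABC", "http", "www.example.com"))
```

The caller would send the returned status and a Location header with the returned URL. Returning None for already-lowercase paths keeps the script from redirecting a URL to itself.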

The actual code has been posted several times in the last few months, and several times in previous years.