Case conversion in URL

Without RewriteMap, what would you do?

         

pinhead

12:42 am on Dec 29, 2008 (gmt 0)

10+ Year Member



I know I'm not the first to ask this question, but I've spent quite some time now looking for the answer and wasn't able to find one. Here's the problem: I want to ensure that all of my URLs are in lowercase, and if someone tries a URL with uppercase letters in it, I want them to be redirected. Nothing special so far; now to the possible solutions:

1) Using a RewriteMap in Apache's config files to make use of the internal function 'tolower'.
2) Using this nasty set of 27 RewriteRules to do the job in a .htaccess file.
3) Using a single RewriteRule to detect malformed URLs and have them redirected to a script (PHP, Perl, or something else) that does the conversion (a single statement should do the trick) and redirects to the all-lowercase URL.

Of course, option 1 is by far the best, but like many people I do not have access to the config files. So this leaves options 2 and 3. The downside of 2 is obvious: speed. Even if the URL contains only a few uppercase letters, the whole process takes a lot of time. But after a closer look, option 3 doesn't look perfect to me either: multiple uppercase letters may not be a problem (in fact, the number of malformed letters doesn't matter in this approach), but in any case the user gets redirected to a script, which does the actual case conversion, and then gets redirected once again to the lowercase URL.

What do I do now? I have to admit I don't have much experience with either Apache or web development, so I'd really appreciate any input I can get.
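For illustration, here is a minimal sketch of the conversion step that option 3 hands to a script, written in Python rather than PHP or Perl. The function name lowercase_redirect and the choice to leave the query string untouched are assumptions made for this example; nothing here comes from the posts in this thread.

```python
from typing import Optional, Tuple


def lowercase_redirect(path: str, query: str = "") -> Optional[Tuple[int, str]]:
    """Return (301, location) if the path needs lowercasing, else None.

    Hypothetical helper: only the path is lowercased; the query string
    is passed through unchanged (an assumption for this sketch).
    """
    lowered = path.lower()
    if lowered == path:
        return None  # already all lowercase, so no redirect is needed
    location = lowered + ("?" + query if query else "")
    return 301, location
```

A caller would send a Location header with the returned URL and the 301 status whenever the function returns a tuple, and serve the page normally when it returns None.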

g1smd

12:54 am on Dec 29, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This question has been asked, and answered (in detail - with code examples), at least a couple of times in the last few months.

I haven't got the thread numbers to hand, but the site search should find them. They mostly deal with using RewriteMap if I remember rightly.

The one time I needed this conversion, I used a simple internal rewrite to a PHP script that tested the URL, and the script redirected the user to the correct URL version using two HEADER instructions.

jdMorgan has provided a lot of detail in the earlier threads.

pinhead

1:04 am on Dec 29, 2008 (gmt 0)

10+ Year Member



Yes, I know. And believe me, I did use the site search, but I want to know which is better: using method 2 (27 RewriteRules) or method 3 (a script)?

But I just realized that my description of 3 was not ideal: instead of an external redirect, I could just use an internal rewrite, saving me one redirection. With that in mind, I think the script-based solution is better. Is this correct?
The only thing that confuses me is that jdMorgan uses method 2 in his "guide to fixing duplicate content & URL issues on Apache". Maybe 2 isn't so bad after all?

g1smd

1:22 am on Dec 29, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Having 26 rules in .htaccess is horribly inefficient. There's one rule for each letter, A to Z, and many might be run more than once. It's an easy-to-fit solution, which may perform OK on a low-traffic site.

Using RewriteMap means you need to have access to httpd.conf to be able to edit it; or else use a host that is prepared to set that file up how you want it. Many people can't do that.

My preference (the one time I needed it) has been the external PHP script - as it was a blindingly simple rule to select out which URL requests needed to be fed to it -- they were all requests that contained just one particular parameter.

pinhead

10:38 am on Dec 29, 2008 (gmt 0)

10+ Year Member



OK then. You're saying 'the one time I needed it'. Does this mean I do not have to worry about URLs with uppercase letters (probably coming from links on other websites)? Isn't that a problem, for instance with Google and 'duplicate content'?

jdMorgan

4:48 pm on Dec 29, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Note that in my code (cited above), the entire pile of rules is by-passed if there are no uppercase letters in the URL. And since uppercase-URL-errors are relatively rare, these rules won't be run very often.

Only in the case where a Webmaster mistakenly thinks that having that case-fixing code in place means that he/she does not have to correct uppercase-URL links on his/her own pages is this really a concern. Running that code once a day is one thing, but running it on each and every page request is another thing entirely!

Jim
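As a rough model of the point above, the Python sketch below mimics a per-letter rule set with an up-front guard. It is not the actual .htaccess code referred to in this thread; the function name fix_case and the "one letter fixed per pass" behaviour are assumptions chosen only to show why a guarded rule set costs nothing for URLs that are already lowercase.

```python
import re
import string

HAS_UPPER = re.compile(r"[A-Z]")


def fix_case(path):
    """Return (lowercased_path, passes) for a hypothetical per-letter rule set."""
    # Guard: if there is no uppercase letter, skip every per-letter rule.
    if not HAS_UPPER.search(path):
        return path, 0
    passes = 0
    # One "rule" per letter A to Z; each pass fixes a single occurrence,
    # so a letter that appears twice costs two passes.
    for letter in string.ascii_uppercase:
        while letter in path:
            path = path.replace(letter, letter.lower(), 1)
            passes += 1
    return path, passes
```

In this model an all-lowercase request returns immediately with zero passes, while a request with three uppercase letters takes three passes.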

g1smd

9:07 pm on Dec 29, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Non-canonical upper-case URL requests are a problem. The problem does need fixing.

What I normally have is all lower-case URLs, and if anything else is requested then it returns a 404 error. That is correct usage.

The "one time I needed it" was on a site where a mix of casing had been used in various URLs, and the site had been recently modified to only use lower-case URLs. The conversion was needed to keep all old incoming links working, and to force Google to reindex all 'incorrect' URLs.

Normally I use all lower-case URLs from Day One. If upper-case URLs are requested, they just fail with a 404 error. So, I rarely need any conversion at all.
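The strict policy described here can be sketched in a few lines of Python. The function name respond and the set of known paths are hypothetical, introduced only for this example; the point is that the lookup is case-sensitive, so an uppercase variant simply does not match.

```python
def respond(path, known_paths):
    """Return the HTTP status for a request under a lowercase-only policy."""
    # Exact, case-sensitive match: "/About.html" does not match "/about.html",
    # so any request with the wrong case gets a 404.
    return 200 if path in known_paths else 404
```

With known_paths containing only lowercase URLs, no conversion code is needed at all.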

pinhead

9:19 pm on Dec 29, 2008 (gmt 0)

10+ Year Member



Ah, that sounds reasonable. I didn't think of just returning a 404 error, but that makes perfect sense. Thanks for the help, great forum.