Making URLs case insensitive without correcting misspellings

TRob

3:09 pm on Feb 29, 2012 (gmt 0)

I need our URLs to be case-insensitive. For example, a promotional URL might be mysite.com/acme, which a RedirectMatch sends to /promo/acme.htm. Some people enter mysite.com/Acme or MYSITE.COM/ACME, and all of these must redirect to the proper file.

I set CheckSpelling On and CheckCaseOnly On in the root .htaccess, which takes care of the case problem. However, it also produces 300 Multiple Choices responses. Example:

mysite.com/land is a promo URL, while /landing is a directory of private pages excluded by robots.txt. A request for mysite.com/land returns the 300 Multiple Choices page with suggested URLs of /landing/index.htm and /landing/index.lck (a Dreamweaver lock file). I don't want people knowing about or having access to /landing, and I'm sure the same issue will occur with other pages and directories.

How do I keep case correction and turn off all other types of spelling correction? Would it help if mod_speling were turned on in the server config, rather than turning it on in /.htaccess?

I'm a new member and just got thrown into the deep end of the Apache pool--I look forward to participating in the forum. I've learned a lot about Apache and .htaccess over the last few weeks, mainly through lots of reading plus trial-and-error. Thanks!

lucy24

10:00 pm on Feb 29, 2012 (gmt 0)

You don't need mod_speling at all. What you need is the [NC] flag attached to each RewriteRule. It means "match this pattern, regardless of case".

Changing the case of an input without redirecting or rewriting is nasty and messy and is best done by routing to a php script. Same goes when you need to capture and reuse text. But changing the case as part of a rewrite or redirect when you're spelling out the entire target is trivial.

input >> output [NC]

will also cover

INPUT >> output
InPuT >> output
Input >> output

et cetera, so long as you don't try to capture and reuse the (input) side.

Caution! This is fine for a redirect, but be careful with rewrites. Not for any technical reason but because it exposes you to Duplicate Content. In the case of the single word "input", that's 2^5 = 32 possible versions of the same page if your search engine is case sensitive.
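As a rough sketch of the redirect case, assuming the rule lives in the root .htaccess, mod_rewrite is enabled, and the promo page from the original question really sits at /promo/acme.htm:

```apache
RewriteEngine On

# Match /acme, /Acme, /ACME and so on ([NC] = ignore case in the pattern)
# and send an external 301 redirect to the one real, lower-case file.
# The target is spelled out in full, so nothing is captured and reused.
RewriteRule ^acme/?$ /promo/acme.htm [NC,R=301,L]
```

Because this is an external redirect, every casing ends up at a single URL. A rule for /land anchored the same way (^land/?$) would not match /landing, since the pattern is closed off with $.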

g1smd

10:32 pm on Feb 29, 2012 (gmt 0)

With rewrites, you cannot use the [NC] flag when you have a captured backreference and it points to a filename:

RewriteRule ^(foobar) /folder/$1.php [NC,L]
With a file named foobar.php, this works for the request
example.com/foobar
but fails for the request
example.com/FOOBAR
because $1 carries the requested case through to the target, and no file named FOOBAR.php exists.

With a fixed filename in the target the rule can work:

RewriteRule ^foobar /folder/foobar.php [NC,L]
- will allow the request to have any case, but you'll have infinite Duplicate Content.

To avoid that, make sure your script detects the originally requested URL and, if it is incorrectly cased, sends a 301 redirect to the correctly cased URL.
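Where there is no script to do that check, a similar effect can be sketched in mod_rewrite alone. This is a swapped-in technique, not the script approach described above, and it assumes a root .htaccess and the same example names (foobar, /folder/foobar.php):

```apache
RewriteEngine On

# 1. Exact lower-case request: serve the file via an internal rewrite.
#    Without [NC] the pattern is case-sensitive.
RewriteRule ^foobar$ /folder/foobar.php [L]

# 2. Any other casing (FOOBAR, FooBar, ...): 301 to the canonical URL,
#    which rule 1 then handles on the follow-up request.
RewriteRule ^foobar$ /foobar [NC,R=301,L]
```

Both patterns are anchored with $ so that only this one URL is affected. The order matters here: the case-sensitive rule has to come first, otherwise the [NC] rule would also catch the correctly cased request and redirect it to itself.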

TRob

12:32 am on Mar 1, 2012 (gmt 0)

Thank you for the replies! I'm not working with PHP scripts, just plain HTML and .htaccess. Your RewriteRule example looks great for promotional URLs I create. But there are many mixed-case scenarios for which I cannot create a specific rule, such as someone typing with CAPS LOCK ON, finGer fumbLing, or an old URL on another site that has mixed case.

I also need to preserve query strings. e.g. mysite.com/Acme?utm_campaign=abc --> mysite.com/folder/acme.htm?utm_campaign=abc.

To make matters worse, PDF files on the site have mixed case, such as Terrific-White-Paper.pdf, to look more presentable when downloaded. A URL with different case must still work, e.g. mysite.com/terrific-white-paper.pdf

When I started this project, I had no idea how involved the server setup would be.

g1smd

1:05 am on Mar 1, 2012 (gmt 0)

A more complex RewriteRule can be used.

Either it should skip case fixing for certain extensions, or it should fix the case only for a specific list of extensions. Pick one method.

You should rewrite (that's rewrite, not redirect) requests that contain one or more upper-case characters in the path. Rewrite those requests to a special new PHP script that slices up the URL, performs strtolower() on the path part, and then uses the PHP header() function to send a 301 status and the new lower-case location.

It's likely that the RewriteRule for this will also have one or two preceding RewriteConds, not least to select or to deselect certain extensions.
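A rough sketch of that routing, assuming a root .htaccess and a hypothetical script name, /fix-case.php (the name is an illustration, not something from this thread). The .pdf exclusion is only an example of the "deselect" method:

```apache
RewriteEngine On

# Deselect by extension: leave mixed-case downloads such as
# Terrific-White-Paper.pdf alone.
RewriteCond %{REQUEST_URI} !\.pdf$ [NC]

# Any remaining path containing at least one upper-case letter is
# internally rewritten to the (hypothetical) script, which is expected
# to send the 301 to the lower-case URL.
RewriteRule [A-Z] /fix-case.php [L]
```

The rule pattern is case-sensitive on purpose, so all-lower-case requests never reach the script. Note that this does nothing for a lower-case request such as /terrific-white-paper.pdf; that direction would need its own handling.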

lucy24

1:15 am on Mar 1, 2012 (gmt 0)

Query strings are self-preserving ;) That is, they don't change unless you explicitly ask them to change. So that's one less thing to worry about.

g1smd

1:24 am on Mar 1, 2012 (gmt 0)

mod_rewrite adds it back on unless you explicitly state a new query string or specify that it be removed entirely.
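The query string cases can be sketched like this, assuming a root .htaccess. The /folder/acme.htm target comes from the example earlier in the thread; the promo-b, promo-c and promo-d names and the src=promo parameter are made up for illustration:

```apache
RewriteEngine On

# No "?" in the target: the original query string is carried over.
# /Acme?utm_campaign=abc -> /folder/acme.htm?utm_campaign=abc
RewriteRule ^acme$ /folder/acme.htm [NC,R=301,L]

# New query string in the target: it replaces the original one.
RewriteRule ^promo-b$ /folder/b.htm?src=promo [NC,R=301,L]

# Same, but [QSA] appends the original query string after the new one.
RewriteRule ^promo-c$ /folder/c.htm?src=promo [NC,QSA,R=301,L]

# Bare trailing "?": the query string is removed entirely.
RewriteRule ^promo-d$ /folder/d.htm? [NC,R=301,L]
```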

If you use a PHP script to send the 301 redirect, it will need to copy in the query string and add it back on the end of the target URL string.