Forum Moderators: phranque


Redirecting bad or old casing

apache rewrite

         

dstiles

11:37 am on Mar 7, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm having problems creating a test for old URI casing and redirecting it to the proper, newer URI. Example:
/Tension.php
should be
/tension.php

Obviously I could do one-URI at a time but this is for several.

I've tried variations on...
RewriteRule (.+)\.php /$1.php [NC,R=301,L]
but that results in the incorrectly-cased original.

Help, please?

w3dk

12:32 pm on Mar 7, 2021 (gmt 0)

10+ Year Member Top Contributors Of The Month



So, you always want to convert uppercase to lowercase? And the trailing ".php" is always correctly cased?

Where is this directive to be used? In what context? In a server (or VirtualHost) context? Or in a directory or .htaccess context?

The directive you posted would have presumably created a redirect loop, as it simply redirects to itself? (But in a server context you'd end up multiplying the slashes at the start of the URL-path.)

dstiles

1:41 pm on Mar 7, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, although actual cases in this instance resolve to the initial letter of the URI. I supposed that something like ^[A-Z].+\.php$ might do it but instead I got from it (and other variations) a redirect to /.php - the $1 did not signify.

I'm applying it in htaccess, where I have a similar one to redirect all .asp pages to .php.

In some of my tests I did indeed end up with a double / at the start of the pagename.

w3dk

3:15 pm on Mar 7, 2021 (gmt 0)

10+ Year Member Top Contributors Of The Month



Actually, if these requests map to physical files then you can use mod_speling (if enabled) to "correct" the case - regardless of whether it's upper or lowercase.

For example, this checks the first letter of the basename, rather than the URL per se and issues a redirect if the file is cased differently.


<FilesMatch "^[A-Z].+\.php$">
CheckSpelling on
CheckCaseOnly on
</FilesMatch>


Or, you can use an Apache expression (Apache 2.4) with the RewriteCond directive in .htaccess and call the tolower() function.

TBH, it's easier to just lowercase the whole URL-path, if there is an uppercase letter anywhere in the URL-path. For example:


RewriteCond expr "tolower(%{REQUEST_URI}) =~ /(.*)/"
RewriteRule [A-Z] %1 [R=301,L]


If you specifically only want to target just the first letter of the URL-path (or basename?) then it can be derived from the above.
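A minimal sketch of the "first letter of the basename" variant, assuming the rules live in .htaccess and that only .php requests matter (both are assumptions for illustration). Only the RewriteRule pattern changes. The condition still lowercases the whole URL-path, so the redirect target comes out fully lowercased:

```apache
RewriteEngine On

# Capture the lowercased URL-path as %1 (same condition as above)
RewriteCond expr "tolower(%{REQUEST_URI}) =~ /(.*)/"
# Trigger only when the last path segment starts with an uppercase letter
RewriteRule (?:^|/)[A-Z][^/]*\.php$ %1 [R=301,L]
```

mod_rewrite tests the RewriteRule pattern first and only then evaluates the condition, so requests that are already lowercase never reach the expression.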

lucy24

5:08 pm on Mar 7, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I kinda think mod_speling is going out of fashion, in part because it creates Duplicate Content: it works with a silent rewrite, not a visible redirect.

Do not say [NC]; that only applies to matching the request against the pattern. It doesn’t change the capture, and in fact will create false positives and an infinite loop. Instead say
[A-Z]\w+\.php
or even
[A-Z]\w+\.(php|asp)
if you might get requests that have both issues, casing and extension.

This is your own server, isn't it? You can then use one of the standard RewriteMaps to level the casing. RewriteMap gets a page all to itself:
[httpd.apache.org...]
The one you want is “tolower”, one of the four built-ins:
RewriteMap lc int:tolower
But, as usual, don't use their redirect example verbatim. Among other things: Are all the problem URLs at the root, or do you need to deal with directories as well?
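A rough sketch of how the two pieces could fit together, assuming the .htaccess file sits in the document root and only .php requests need fixing (assumptions, adjust to taste). The map name "lc" is arbitrary:

```apache
# Server config, inside the <VirtualHost> that serves the site.
# RewriteMap cannot be declared in .htaccess.
RewriteMap lc int:tolower

# .htaccess in the document root.
RewriteEngine On
# Match only when the path contains at least one uppercase letter
# and ends in a lowercase ".php", then redirect to the lowercased path.
RewriteRule ^(.*[A-Z].*\.php)$ /${lc:$1} [R=301,L]
```

Because the pattern requires an uppercase letter and there is no [NC] flag, the lowercased target no longer matches and the redirect does not loop.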

w3dk

7:04 pm on Mar 7, 2021 (gmt 0)

10+ Year Member Top Contributors Of The Month



mod_speling .... works with a silent rewrite, not a visible redirect.


Hhhmm, it should be the other way round... specifically, a 301 external redirect, not an internal rewrite? Or does "CheckCaseOnly on" change this behaviour?!

lucy24

10:40 pm on Mar 7, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Horse's mouth says that CheckCaseOnly “Limits the action of the speling module to case corrections”, meaning that it doesn’t do the more complicated (and server-intensive) task of looking for misspellings. Unfortunately, the default setting is Off.

It is a very long time since my host had mod_speling enabled--in fact I was surprised to see it still exists in 2.4--so I can’t speak from personal experience. And the docs say only
[if] only one document is found that "almost" matches the request, then it is returned in the form of a redirection response.
And that's not very helpful, because #1 it doesn't specify 301 vs. 302, and more seriously #2 Apache often says “redirect” when referring to internal activity, or what we try to call--with double markedness--an internal rewrite. I guess you can't really blame them, considering things like headers in REDIRECT_ which obviously refer to internal activity on the present request, or REDIRECT_STATUS in php. But it's confusing.

A more serious issue is that mod_speling seems awfully server-intensive, especially if you're going the whole hog look-for-misspellings route. (And why would you do something that can only be helpful to stupid robots?)

In any case, the tolower RewriteMap seems tailor-made for the situation in the present thread.

phranque

11:16 pm on Mar 7, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



you essentially have 3 choices:
  1. if you have access to the server config file(s) (virtual host context) you can use mod_rewrite's RewriteMap directive with the internal tolower function:
    https://httpd.apache.org/docs/current/rewrite/rewritemap.html#int
    https://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewritemap
    this thread may be helpful:
    Partial URL Rewrite Upper to Lower Case in Apache [webmasterworld.com]
  2. use mod_rewrite (RewriteRule directives) to internally rewrite requests for URLs with uppercase letters to a script that converts the letters to lowercase and then invokes a 301 external redirect.
  3. use a series of RewriteRule directives to internally rewrite the URL for each uppercase letter and then externally redirect to the resulting converted URL.
    see this thread for ideas:
    A guide to fixing duplicate content & URL issues on Apache [webmasterworld.com]
    this fact could simplify the ultimate solution:
    actual cases in this instance resolve to the initial letter of the URI

[edited by: phranque at 11:43 pm (utc) on Mar 7, 2021]

phranque

11:40 pm on Mar 7, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



RewriteCond expr "tolower(%{REQUEST_URI}) =~ /(.*)/"
RewriteRule [A-Z] %1 [R=301,L]

i hadn't considered using Expressions in Apache HTTP Server [httpd.apache.org].

if i were to use this method, i would recommend specifying the full canonical scheme and hostname in the substitution string:
RewriteCond expr "tolower(%{REQUEST_URI}) =~ /(.*)/"
RewriteRule [A-Z] https://www.example.com%1 [R=301,L]


actual cases in this instance resolve to the initial letter of the URI

and if this is the case i would use:
RewriteCond expr "tolower(%{REQUEST_URI}) =~ /(.*)/"
RewriteRule ^[A-Z] https://www.example.com%1 [R=301,L]

dstiles

11:23 am on Mar 8, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Lucy - your suggestion [A-Z]\w+\.php does not change the case. I've probably not got the whole thing right. Also, I forgot to say there is sometimes a hyphen connecting two words in the pagename and sometimes the initial of that is capitalized. My test below returned only the second word including the hyphen. If I omitted the second clause I got only the original uncorrected pagename.
RewriteRule ^[A-Z]\w+(-[A-Z]?\w+)?\.php /$1\.php [R=301,L]

I'm wary of tolower. It would probably do the job but it opens up for irrational casing from scrapers and bots. Also, would it not act upon EVERY pagename regardless of necessity? - Ok, I've found the answer to that in one of phranque's links. It may be the solution but it acts upon the whole URI, not just the initials.

Alright, one solution gleaned from phranque's link works (but not just on initials) BUT with the incorrectly cased URI being given in the browser bar. Yes, it loads the page but I suspect, despite the 301, it would perpetuate the incorrect casing? Oh, just looked at the logs and it shows the correct lower-cased URI, so that must be a browser problem (Falkon).
RewriteCond expr "tolower(%{REQUEST_URI}) =~ /(.*)/"
RewriteRule ^[A-Z] https://www.example.com%1 [R=301,L]

Also: since this is a test site using a spare domain I would like it to auto-select the correct domain name on going live rather than hard-code it, so I changed www.example.com to %{HTTP_HOST}. That part seems to work as well.
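For reference, the change described above would look something like this. It is a sketch only, and it assumes the site is served over https on whichever host name is in use:

```apache
RewriteCond expr "tolower(%{REQUEST_URI}) =~ /(.*)/"
# %{HTTP_HOST} is taken from the incoming request, so the same rule
# works on the test domain and on the live domain without edits.
RewriteRule ^[A-Z] https://%{HTTP_HOST}%1 [R=301,L]
```

The trade-off is that %{HTTP_HOST} echoes whatever host name the client sent, so this rule fixes the casing but does not canonicalise the host name the way a hard-coded one would.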

Many thanks to all for the help!

lucy24

4:40 pm on Mar 8, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



your suggestion [A-Z]\w+\.php does not change the case
It's not supposed to. That's just the pattern for the capture: anything that contains an upper-case letter will be subject to this rule.

But you wouldn't ever have URLs in the form
abc-Abc
would you? Then the extra capital letter doesn't matter, because you're already capturing the whole thing and making it lowercase.

That's why I double-checked that this is your own server. You can use a RewriteMap in htaccess, but you can only declare it in config.

Option B, which I think was one of phranque's suggestions, is to rewrite to a php script which performs the transformation and then issues a 301 redirect. Here, too, you can include requests for .asp and then it all gets leveled to php in the redirect. I think it's about three lines of php.

Matter of fact... do ALL page requests ultimately get handled by php? If so, you might not need to say anything in config at all. When the request arrives at the php page, do an either/or: IF the request contains an upper-case letter, transform and redirect. ELSE, proceed directly to whatever the php would normally do.

dstiles

10:02 am on Mar 9, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks for the update, Lucy. I understand what you're saying. Yes, it's my own server.

All page requests do get handled by php; for failed ones it's in errdoc.php. But phranque's solution, noted above, seems to work fine. I've decided to ignore non-original incorrect casing instances as being, from my logs, infrequent to non-existent.

w3dk

10:57 am on Mar 9, 2021 (gmt 0)

10+ Year Member Top Contributors Of The Month



if i were to use this method, i would recommend specifying the full canonical scheme and hostname in the substitution string


Why would this method be different to any other external redirect?

phranque

11:59 am on Mar 9, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



if i were to use this or any other method, i would recommend specifying the full canonical scheme and hostname in the substitution string, unless there was an articulated reason otherwise.

why would you omit those?

lucy24

5:15 pm on Mar 9, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Why would this method be different to any other external redirect?
If the redirect target doesn't specify exact hostname and protocol, the server will use whatever was in the request--which might well be wrong too. That would lead to a redirect chain, unnecessary work for the server and possibly delay for the end user if they've got a slow connection.