lucy24 - 8:22 am on Jul 26, 2013 (gmt 0)
I think I would be happier if....
Dog Bites Man ;)
OK, it's a choice between
(a) look very closely at Rule Eleven
(b) look very closely at smooth-reader's report on 459-page Notes volume of Joseph Hall's Selections from Early Middle English, and/or same person's report on Volume I of Gairdner's 1904 edition of the Paston Letters
(c) continue hammering away at Zupitza's edition of Aelfric
Rule Eleven it is.
#11 Redirect URLs containing valid characters to remove trailing punctuation
RewriteRule ^(.*)[^/0-9a-z]+$ http://www.example.com/$1 [NC,R=301,L]
The rule is easiest to fine-tune if you're looking at an existing site with known URL patterns. What characters, other than alphanumerics, lowlines and hyphens, will actually occur in the body of an URL? (You may not even use _ lowlines, but they generally count as \w so they are no extra trouble.)
A lot of nasties like commas and periods are technically legal, but if you don't use them, the whole thing becomes vastly easier. What follows assumes that you don't have one of those hypothetical servers that kick up a fuss at the \w locution.
RegEx merrily captures along until it meets something other than alphanumeric, lowline, hyphen or directory slash. If the something-else is a literal period, it can then pick up the period and any subsequent alphanumerics. (This is assuming the request doesn't contain two unrelated forms of garbage, such as a bogus extension or back-to-back directory slashes. Just how fumble-fingered are your human visitors?) Otherwise it's done. It is also done if the very first requested character is something unacceptable.
The special case of a request beginning with a literal period need not be considered, because the config file already has a rule blocking requests for any filename with leading period.
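For anyone who doesn't already have such a rule, a blocking rule of that kind might look roughly like the following. This is a guess at the general shape, not the rule from the actual config file, which isn't quoted here. It would need to sit above the redirect so that it is evaluated first.

```apache
# Hypothetical form of the "leading period" block; the real rule is not shown in this thread.
# Answers 403 Forbidden for any request with a path segment that begins with a period,
# such as .htaccess or dir/.hidden
RewriteRule (?:^|/)\. - [F]
```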
Everything up to this point is captured. If there is anything left over, it is ignored and a redirect is issued for the captured part.
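One way to write a rule that behaves as described is sketched below. Treat it as an illustration under stated assumptions rather than a definitive version: it assumes a PCRE-based Apache (2.2 or later), a per-directory (.htaccess) context where the path arrives without its leading slash, and a site whose real URLs never contain more than one period.

```apache
# Sketch only. Assumes PCRE (so \w and possessive quantifiers are available)
# and that legitimate URLs contain at most one period, namely the extension.
RewriteEngine On

# Capture group:
#   [\w/-]*+       alphanumerics, lowlines, hyphens and slashes, as many as there are
#   (?:\.\w++)?+   optionally one literal period plus the alphanumerics after it
# The possessive quantifiers keep the capture from backing off, so the final "."
# can only match when something really is left over after the captured part.
RewriteRule ^([\w/-]*+(?:\.\w++)?+). http://www.example.com/$1 [R=301,L]
```

With this pattern a request for dir/page.html, (trailing comma) or dir/page.html/extra is redirected to /dir/page.html, a request that starts with an unacceptable character is redirected to the root, and a clean request such as dir/page.html does not match at all, so there is no redirect loop. The possessive form is a design choice: without it the regex engine could shorten the capture to force a match and redirect perfectly good URLs.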
Come to think of it, this rule will also redirect requests with extraneous path info after the extension. This is probably desirable.