I am looking for a generic rule to apply, (Maybe with a regex?)...
One issue (at least):
- URLs which contain a '?' should not be converted to lower case (the data for the CGI call may contain upper case). I'm just looking to handle all static page / image file requests.
If you are stuck in .htaccess context, this function won't be available. You could try using Apache mod_speling, but there is a limit to how many incorrect characters it can correct, and it's not very efficient. Or you can use a scripted solution in Perl or PHP.
Jim
I've got some Perl code doing the better part of it.
The core of it is two regexes:
#
# convert all HREF name/value pairs to
# upper case 'HREF' and lower case 'value'
#
# i.e., convert
#
# href = "/IMAGES/This_IMAGE_x123.Jpg"
#
# to
#
# HREF="/images/this_image_x123.jpg"
#
# The regex accounts for any amount of
# white space between HREF, the equal sign
# and the opening quote mark. It matches
# case-insensitively and globally.
#
$text =~ s|href\s*=\s*\"([^\"]+)\"|HREF=\"\L$1\E\"|gi;
#
# do the same for the SRC name/value pair in IMG tags
#
$text =~ s|src\s*=\s*\"([^\"]+)\"|SRC=\"\L$1\E\"|gi;
#
Once I got it working I realized how many exceptions were needed. Do not match or modify when:
- $1 as HREF starts with mailto:
- $1 as HREF starts with [domain.tld...] (and "domain.tld" is an external link)
- $1 as SRC starts with the same (used for pulling images from ad servers and affiliate sites)
- $1 as HREF contains a question mark (?). CGI name/value pairs may be mixed case; this site uses no other methods, so I am not testing for .asp, .php, etc. They will be covered by the external link condition.
- $1 as SRC contains a question mark (CGI affiliate programs)
...and I'm still testing to see what else it finds (JS embedded links with CRCs that require the exact-case URL, since the ASCII values are used as a key).
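The exception list can be folded into the substitution itself. What follows is only a rough sketch, not the code actually running on the site: keep_case and lowercase_links are hypothetical names, and it assumes that any value carrying a scheme (something://) is an external link, which is broader than the "domain.tld is external" test described above.

```perl
use strict;
use warnings;

# Hypothetical helper: true when an attribute value must keep its case.
sub keep_case {
    my ($value) = @_;
    return 1 if $value =~ /^mailto:/i;                 # mail links
    return 1 if $value =~ m{^[a-z][a-z0-9+.-]*://}i;   # assumption: any absolute URL is external
    return 1 if index($value, '?') >= 0;               # CGI call with data
    return 0;
}

# Upper-case the attribute name and lower-case its value,
# unless one of the exceptions applies.
sub lowercase_links {
    my ($text) = @_;
    $text =~ s{\b(href|src)\s*=\s*"([^"]+)"}{
        my ($attr, $value) = (uc $1, $2);
        $attr . '="' . (keep_case($value) ? $value : lc $value) . '"';
    }gie;
    return $text;
}

print lowercase_links('<a href = "/IMAGES/This_IMAGE_x123.Jpg">'), "\n";
# prints: <a HREF="/images/this_image_x123.jpg">
```

Keeping the tests in one small sub means a new exception (the JS links with CRCs, for instance) is one more line rather than another regex.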
It would be nice to write rules that covered the exceptions and stuff them in one place like htaccess...
I don't give up.
I did a little googling for "apache htaccess error 404".
What I found that solved the problem was:
write a Perl script that grabs the values of the ENV variables needed (hint: they all start with REDIRECT_)
put a line in .htaccess like:
ErrorDocument 404 /cgi-bin/err404.pl
NOTE: you can't use this to redirect to a handler on another server; the ENV vars will not survive the trip. When the error document handler name starts with [domain.tld...], Apache sends the client an external redirect instead of handling the error internally, i.e.
ErrorDocument 404 [domain.tld...]
DOES NOT WORK
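To see which REDIRECT_ variables a local handler actually receives, something like the sketch below can help while testing. redirect_vars is a hypothetical helper, and the sample values are made up for illustration; in the real handler it would be called as redirect_vars(\%ENV).

```perl
use strict;
use warnings;

# Return only the environment entries whose names start with REDIRECT_.
# Takes a hash reference so it can be tried with sample data.
sub redirect_vars {
    my ($env) = @_;
    return map { ($_ => $env->{$_}) } grep { /^REDIRECT_/ } keys %$env;
}

# Made-up sample environment, standing in for %ENV.
my %seen = redirect_vars({
    REDIRECT_URL          => '/IMAGES/Logo.GIF',
    REDIRECT_QUERY_STRING => '',
    SERVER_NAME           => 'www.example.com',
});
print "$_ = $seen{$_}\n" for sort keys %seen;
```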
Now all 404 errors will route the user to the script. Here's the simple part:
if REDIRECT_QUERY_STRING isn't empty, it's a CGI call with data; we DO NOT want to change the query string to lower case, only the URL.
if REDIRECT_URL is not lower case, convert it to lower case and rebuild the complete URL:
#
$ru = $ENV{'REDIRECT_URL'};
$qs = $ENV{'REDIRECT_QUERY_STRING'};
$sn = $ENV{'SERVER_NAME'};
#
# only suggest a URL when lower-casing actually changes the path
#
if (lc($ru) ne $ru) {
$TryURL = 'http://' . lc($sn) . lc($ru);
#
# re-attach the query string unchanged (it may be mixed case)
#
if ($qs ne '') { $TryURL = $TryURL . '?' . $qs; }
#
# and present it to the user as a link
#
print "Try: <a href=\"$TryURL\">$TryURL</a>";
}
#
I know the Perl could be tighter, but I wanted to leave it somewhat readable.
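For anyone wanting a slightly fuller starting point, here is a rough sketch of how the whole handler might be laid out. It is not the script I'm running: build_try_url, escape_html and error_page are hypothetical names, the "Page not found" wording is made up, and it assumes plain http as in the snippet above. The HTML escaping is an addition, since the requested URL comes from the visitor and gets echoed back into the page.

```perl
use strict;
use warnings;

# Build the suggested URL; returns nothing when lower-casing changes nothing.
sub build_try_url {
    my ($path, $query, $server) = @_;
    return unless defined $path && lc($path) ne $path;
    my $url = 'http://' . lc($server) . lc($path);
    # The query string is re-attached untouched; it may be mixed case.
    $url .= '?' . $query if defined $query && $query ne '';
    return $url;
}

# Minimal HTML escaping so the URL is safe to echo into the page.
sub escape_html {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Produce the full CGI response (header plus body) as one string.
sub error_page {
    my ($env) = @_;
    my $url  = build_try_url(@$env{qw(REDIRECT_URL REDIRECT_QUERY_STRING SERVER_NAME)});
    my $body = "<p>Page not found.</p>\n";
    if (defined $url) {
        my $safe = escape_html($url);
        $body .= qq{<p>Try: <a href="$safe">$safe</a></p>\n};
    }
    return "Content-type: text/html\n\n" . $body;
}

print error_page(\%ENV);
```

Splitting the URL building from the printing makes it easy to try the logic from the command line with made-up values before pointing ErrorDocument at it.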