Using htaccess to handle "case" related errors

Can it get 'ThisFILE.htm' and serve 'thisfile.htm'?

         

lexipixel

1:15 am on Mar 17, 2005 (gmt 0)

I have a customer with a large site that has numerous <A HREF="WhatEver"> and <IMG SRC="WhatEver"> tags. The site was moved to a Linux/Apache server, and now they are getting "not found" errors for web pages where the case doesn't match, and images are not appearing.

I am looking for a generic rule to apply (maybe with a regex?)...

One issue (at least):

- URLs which contain a '?' should not be converted to lower case (the data for the CGI call may contain upper case). I'm just looking to handle all static page / image file requests.

jdMorgan

5:32 am on Mar 17, 2005 (gmt 0)

You could use RewriteMap in mod_rewrite to access its internal tolower case-conversion function (int:tolower) if you have access to httpd.conf -- See the Apache mod_rewrite documentation or the URL Rewriting Guide cited in our forum charter (link at above left) for an example.

If you are stuck in .htaccess context, this function won't be available, because RewriteMap can only be defined at the server config level. You could try using Apache mod_speling, but there is a limit to how many incorrect characters it can correct. It's also not very efficient. Or, you can use a scripted solution using Perl or PHP.

Jim

lexipixel

2:44 am on Mar 18, 2005 (gmt 0)

Thanks, Jim. It's on a shared server, so no luck with the httpd config.

I've got some Perl code doing the better part of it.
The base of it is two regexes:

#
# convert all HREF name/value pairs to
# upper case 'HREF' and lower case 'value'
#
# i.e., convert
#
# href = "/IMAGES/This_IMAGE_x123.Jpg"
#
# to
#
# HREF="/images/this_image_x123.jpg"
#
# The regex accounts for any amount of
# white space between HREF, equal sign
# and opening quote mark. It matches
# case-insensitively and globally.
#
$text =~ s|href\s*=\s*"([^"]+)"|HREF="\L$1\E"|gi;
#
# do same for SRC name/value pair in IMG tags
#
$text =~ s|src\s*=\s*"([^"]+)"|SRC="\L$1\E"|gi;
#
#

Once I got it working, I realized how many exceptions were needed. The value should not be matched or modified when:

$1 as HREF, starts with mailto:

$1 as HREF, starts with [domain.tld...]
(and "domain.tld" is an external link)

$1 as SRC (same as above; used for pulling images from ad servers and affiliate sites).

$1 as HREF, contains a question mark (?)
(CGI name/value pairs may be mixed case. This site uses no other methods, so I am not testing for .asp, .php, etc.; they will be covered by the external link condition.)

$1 as SRC (same as above; CGI affiliate programs)

...and I'm still testing to see what else it finds -- (JS-embedded links with CRCs that require the exact-case URL, since the ASCII values are used as a key).
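
The exceptions listed above could be folded into the substitution itself, roughly as sketched here. This is only an illustration under some assumptions: should_lowercase and fix_case are hypothetical names, every absolute URL (anything starting with scheme://) is treated as external, and attribute names are written in lower case. It does not attempt to handle the JS-embedded links.

```perl
use strict;
use warnings;

# Hypothetical helper: may this HREF/SRC value be lower-cased?
sub should_lowercase {
    my ($url) = @_;
    return 0 if $url =~ /^mailto:/i;                 # mail links
    return 0 if $url =~ m{^[a-z][a-z0-9+.-]*://}i;   # absolute URLs (assumed external)
    return 0 if $url =~ /\?/;                        # CGI data may be mixed case
    return 1;
}

# Lower-case qualifying href/src values in a chunk of HTML.
# The attribute name itself is emitted in lower case.
sub fix_case {
    my ($text) = @_;
    $text =~ s{\b(href|src)\s*=\s*"([^"]+)"}{
        my ($attr, $value) = ($1, $2);   # copy captures before calling the helper
        $value = lc $value if should_lowercase($value);
        lc($attr) . '="' . $value . '"';
    }gie;
    return $text;
}

print fix_case('<A HREF = "/IMAGES/Pic.Jpg"> <a href="mailto:Bob@Example.com">'), "\n";
```

Keeping the exception tests in one small function means a new case only needs one more "return 0" line, rather than another regex.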

It would be nice to write rules that covered the exceptions and stuff them in one place like htaccess...

jdMorgan

2:58 am on Mar 18, 2005 (gmt 0)

I'm not familiar enough with all the factors you'll have to consider, but I would recommend *not* changing "href" to "HREF" or "src" to "SRC" -- For the sake of future-proofing, all HTML tags should be lowercase to assure XML compatibility.

Jim

lexipixel

10:26 am on Mar 18, 2005 (gmt 0)

...lowercase....XML...

Good pick up Jim, thanks.

I'm shooting for full validation on the site. It's hard teaching an old dog new tricks --- I always liked upper case NAMEs and lower case values... makes the HTML easier to read... but we do have "standards".

jdMorgan

2:48 pm on Mar 18, 2005 (gmt 0)

I'm an old dog, too. But my validator kicks me every time I forget this new trick...

Jim

stevenmusumeche

3:50 pm on Mar 18, 2005 (gmt 0)

Check out mod_speling.

lexipixel

7:19 am on Mar 24, 2005 (gmt 0)

There is ALWAYS a way...

I don't give up.

I did a little googling for "apache htaccess error 404".

What I found that solved the problem was-

write a Perl script that grabs the values of the ENV variables it needs (hint: they all start with REDIRECT_)

put a line in .htaccess like-

ErrorDocument 404 /cgi-bin/err404.pl

NOTE: you can't use this to redirect to a handler on another server; the ENV vars will not survive the trip. If the error document handler name starts with [domain.tld...], Apache sends the browser an external redirect instead of running the handler internally, ie-

ErrorDocument 404 [domain.tld...]

DOES NOT WORK

Now all 404 errors will route the user to the script. Here's the simple part-

If REDIRECT_QUERY_STRING isn't empty, it's a CGI call with data; we DO NOT want to change the query string to lower case, only the URL path.

If REDIRECT_URL is not lower case, convert it to lower case and rebuild the complete URL:

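#
$ru = $ENV{'REDIRECT_URL'};
$qs = $ENV{'REDIRECT_QUERY_STRING'};
$sn = $ENV{'SERVER_NAME'};
#
# a CGI script must send a header before any output
#
print "Content-type: text/html\n\n";
#
# only suggest a URL when lower-casing actually changes the path
#
if (lc($ru) ne $ru) {
$TryURL = 'http://' . lc($sn) . lc($ru);
#
# append the query string unchanged (not the server name)
#
if (defined $qs && $qs ne '') { $TryURL = $TryURL . '?' . $qs; }
#
# and present it to the user as a link
#
print "Try: <a href=\"$TryURL\">$TryURL</a>";
}
#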

I know the Perl could be tighter, but I wanted to leave it somewhat readable.
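
For anyone who wants a fuller starting point, here is a rough sketch of the same handler wrapped in functions. Treat it as an illustration only: retry_url and escape_html are hypothetical names, 'localhost' is just a placeholder fallback, and it assumes Perl 5.10 or later for the // operator. It escapes the URL before printing it, since the requested path comes straight from the visitor.

```perl
use strict;
use warnings;

# Hypothetical helper: build the lower-cased retry URL from the REDIRECT_*
# values. Returns nothing when lower-casing would not change the path.
sub retry_url {
    my (%env) = @_;
    my $path = $env{REDIRECT_URL}          // '';
    my $qs   = $env{REDIRECT_QUERY_STRING} // '';
    my $host = $env{SERVER_NAME}           // 'localhost';   # placeholder fallback
    return if lc($path) eq $path;          # already lower case: a genuine 404
    my $url = 'http://' . lc($host) . lc($path);
    $url .= '?' . $qs if $qs ne '';        # query string is left untouched
    return $url;
}

# Minimal HTML escaping so a hostile request path cannot inject markup.
sub escape_html {
    my ($s) = @_;
    my %ent = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $s =~ s/([&<>"])/$ent{$1}/g;
    return $s;
}

print "Content-type: text/html\n\n";
my $url = retry_url(%ENV);
if (defined $url) {
    my $safe = escape_html($url);
    print "Try: <a href=\"$safe\">$safe</a>\n";
}
else {
    print "Page not found.\n";
}
```

Saved as /cgi-bin/err404.pl, this would be wired up with the same ErrorDocument line shown above.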