CheckSpelling questions

Involves a directory full of mixed case files

         

Wizcrafts

9:34 pm on Jan 21, 2006 (gmt 0)

10+ Year Member



I have a directory with several thousand product pages which were created for me by an affiliate manager. Unfortunately for me the majority of these filenames are mixed-case and I uploaded them as such to my Apache-based rented hosting space.

These pages have been indexed correctly, with mixed cases, and see lots of hits, but I have recently seen a few 404 errors, where for some reason the searcher used all lowercase letters. They couldn't have clicked on a link from a SERP, since those are all mixed-case. I am looking for a solution to these infrequent bad-case requests until I figure out whether it is worth my trouble to convert thousands of filenames to lowercase and try to get them re-indexed as such.

I have read several threads here about this type of problem, and since I do not have access to the server config file, just .htaccess, I thought I might try mod_speling using "CheckSpelling on" to match the bad filename requests. I have a few questions about this directive, as I have never tried using it before.

Does it matter where in the flow of .htaccess I put that command, performance-wise? Will it work if I place it only in the directory where these misspelled files are found, instead of in my web-root (this would create less overhead as the only misspelled files are in that directory)? My main .htaccess is already quite large, so placing the spellcheck directive in a sub-directory would not place any load on my other main files, which are not mixed case.

Last question: Does anybody know of a tool that I can use to change all filenames in a directory to all lowercase, both locally and on my server?

Thanks in advance.

Wiz

Wizcrafts

10:52 pm on Jan 21, 2006 (gmt 0)

10+ Year Member



Update: I tried adding CheckSpelling On to the particular directory's .htaccess and got a server 500 error. Are other directives also required for the spell checking to work? I am waiting to hear from the web host whether mod_speling is even installed on the server. If it is not, what other options are available to match a mistyped URL case-insensitively?

Wiz

extras

11:13 pm on Jan 21, 2006 (gmt 0)

10+ Year Member



An example of case-insensitive file serving, using mod_rewrite + CGI.

This method doesn't require renaming.
But it uses more server resources than renaming all files to lowercase names and converting the request.


#!/bin/sh
# icat.cgi
#
# Case insensitive file serving cgi.
# This script finds files by building a shell glob pattern.
# It's relatively light, but adds some overhead.
#
# by extras
#
# 1. Place this script in docroot and chmod 700 (or 755).
# 2. Add following RewriteRule in the .htaccess
# 3. Test.
#
# RewriteEngine on
# RewriteBase /
# RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
# RewriteRule ^/*(.*)$ icat.cgi?$1 [L]
#

alpha='Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn Oo Pp Qq Rr Ss Tt Uu Vv Ww Xx Yy Zz'
s=$1
ext=${s##*.}
printf 'Content-type: '
case "$ext" in
# Add more mime types if you need ....
[gG][iI][fF]) echo "image/gif" ;;
[jJ][pP][gG]) echo "image/jpeg" ;;
[pP][nN][gG]) echo "image/png" ;;
[tT][xX][tT]) echo "text/plain" ;;
*) echo "text/html" ;;
esac
echo ""

#echo "===$s==="
while : ;do
case "$s" in
"") break ;;
?) c="$s" ;s='';;
*) sn="${s#?}"; c="${s%"$sn"}"; s="$sn" ;;
esac
case "$c" in
"" ) break ;;
[a-z]) cc="${alpha%$c*}";cc="${cc##* }"; r="${r}[$c$cc]" ;;
[A-Z]) cc="${alpha#*$c}";cc="${cc%% *}"; r="${r}[$cc$c]" ;;
*) r="${r}$c" ;;
esac
done
#echo "$r"
case "$r" in
"") echo "Error: No filename";;
*) for F in $r; do cat "$F";break;done ;;
esac

#END
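
To see what kind of pattern the loop in icat.cgi builds, here is a minimal standalone sketch of the same idea. The function name ci_glob is made up for this illustration, and it uses tr rather than the alpha lookup table, so treat it as an approximation of the script above and not a drop-in replacement.

```shell
#!/bin/sh
# Force ASCII ranges so [a-z] and [A-Z] behave the same in every locale.
LC_ALL=C; export LC_ALL

# ci_glob: print a case-insensitive glob pattern for the string in $1.
# Hypothetical helper, not part of icat.cgi.
ci_glob() {
    s=$1
    r=''
    while [ -n "$s" ]; do
        rest=${s#?}           # everything after the first character
        c=${s%"$rest"}        # the first character itself
        s=$rest
        case "$c" in
            [a-z]) u=$(printf '%s' "$c" | tr 'a-z' 'A-Z'); r="${r}[$c$u]" ;;
            [A-Z]) l=$(printf '%s' "$c" | tr 'A-Z' 'a-z'); r="${r}[$l$c]" ;;
            *)     r="${r}$c" ;;
        esac
    done
    printf '%s\n' "$r"
}

ci_glob 'Page1.HTML'    # prints [pP][aA][gG][eE]1.[hH][tT][mM][lL]
```

Letters become two-character bracket expressions and everything else passes through unchanged, so the shell can expand the result against whatever mixed-case name actually exists on disk.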

===============================

Example of serving with lowercase name.


#!/bin/sh
#
# lower.cgi
#
# Put this in the .htaccess;
#
# RewriteEngine on
# RewriteBase /
# #This rule detects upper char in the URL and redirects it to lower.cgi
# RewriteRule [A-Z] lower.cgi [L]
#
# Put lower.cgi in docroot and set executable permission.

# check for the https, just in case.
case "$HTTPS" in
'') PROTO='http' ;;
*) PROTO='https' ;;
esac

# change it to the lower case
URI=`echo "$REDIRECT_URL" | tr A-Z a-z`

# Check if there is a QUERY_STRING
case "$REDIRECT_QUERY_STRING" in
'') QS='' ;;
*) QS="?$REDIRECT_QUERY_STRING" ;;
esac

# Send location header
echo "location: ${PROTO}://$HTTP_HOST$URI$QS"
echo

# Optionally, send the text virtually nobody sees.
#echo "302 Moved ${PROTO}://$HTTP_HOST$URI$QS"

# end
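
The URL that lower.cgi sends back can be previewed outside the web server. The sketch below is only an illustration: redirect_target is a made-up name, its arguments stand in for the HTTP_HOST, REDIRECT_URL and REDIRECT_QUERY_STRING variables, example.com is a placeholder host, and it assumes plain http.

```shell
#!/bin/sh
# redirect_target: build the lowercase URL for a Location header.
# Hypothetical helper for illustration only.
#   $1 = host, $2 = requested path, $3 = query string (may be empty)
redirect_target() {
    # Only the path is lowercased; the query string is passed through as-is.
    path=$(printf '%s' "$2" | tr 'A-Z' 'a-z')
    if [ -n "$3" ]; then
        printf 'http://%s%s?%s\n' "$1" "$path" "$3"
    else
        printf 'http://%s%s\n' "$1" "$path"
    fi
}

redirect_target 'example.com' '/Products/Blue-Widget.HTML' 'ID=42'
# prints http://example.com/products/blue-widget.html?ID=42
```

Leaving the query string alone matters because parameter values may be case-sensitive even when the filenames are not.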

===============================

To convert filenames:
(On the home machine, you need cygwin or something else that lets you run shell scripts.)
It's pure shell script. You can write something similar in any scripting language, too.


#!/bin/sh
# tolower.cgi
#
# Rename all files and directory names to lower case.
#
# WARNING:
# This script IS dangerous. Test it with "echo" before applying it.
#
# Instructions
# 1. Place this script in a protected directory and chmod 700 (or 755).
#
# 2. Access the script at http://yoursite.com/protected/tolower.cgi
# 3. In the address bar, add the target directory name after '?':
# http://yoursite.com/protected/tolower.cgi?/target
#
# In this case, the target will be /path-to-docroot/target
#
#

alpha='Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn Oo Pp Qq Rr Ss Tt Uu Vv Ww Xx Yy Zz'
target=$1
echo "Content-type: text/plain"
echo ""

case "$target" in
"") echo Ready; exit;;
esac
#echo "===$s==="
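for fn in "$DOCUMENT_ROOT/$target"/*[A-Z]* ; do
# Skip the literal pattern when nothing matches.
[ -e "$fn" ] || continue
# Lowercase only the last path component, so the directory part stays valid.
dir="${fn%/*}"
s="${fn##*/}"
r=""
while : ; do
case "$s" in
"") break ;;
?) c="$s" ;s='';;
*) sn="${s#?}"; c="${s%"$sn"}"; s="$sn" ;;
esac
case "$c" in
"" ) break ;;
[A-Z]) cc="${alpha#*$c}";cc="${cc%% *}"; r="${r}$cc" ;;
*) r="${r}$c" ;;
esac
done
r="$dir/$r"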

# This line lets you test (dry run) to see what is going to happen.
# When you are sure, comment it out.
echo "cp -v $fn $r"

# This line actually does the copying. (Original files are left intact.)
# After testing, remove the '#' to activate the action.
# cp -v "$fn" "$r"

# If you want to change the name without keeping the original file,
# use this line instead of the "cp" line.
# mv -v "$fn" "$r"

done

#END
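
If you have shell access (or cygwin locally), a shorter variant is possible by letting tr do the case conversion. This is only a sketch under those assumptions: lower_names is a made-up function name, it handles a single directory level, and it defaults to a dry run that prints what it would do.

```shell
#!/bin/sh
# lower_names: preview or perform lowercase renames for the files in directory $1.
# Hypothetical helper; pass "run" as $2 to rename, anything else only prints the plan.
lower_names() {
    dir=$1
    mode=${2:-dry}
    for f in "$dir"/*; do
        [ -e "$f" ] || continue                 # empty directory: nothing to do
        base=${f##*/}
        new=$(printf '%s' "$base" | tr 'A-Z' 'a-z')
        [ "$base" = "$new" ] && continue        # name is already lowercase
        if [ -e "$dir/$new" ]; then
            # Never overwrite an existing file.
            echo "skip: $base ($new already exists)"
        elif [ "$mode" = run ]; then
            mv -- "$f" "$dir/$new"
        else
            echo "mv $base $new"
        fi
    done
}

# Usage: lower_names /path/to/products        (dry run)
#        lower_names /path/to/products run    (actually rename)
```

On a case-insensitive filesystem the existence check will report every file as already present, so there the rename would need an intermediate temporary name.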

===============================

Also, there is a pure RewriteRule solution.
No script needed. Maybe it's the best choice for some people.

This will work once all filenames are changed to lowercase,
and as long as the request contains fewer than 10 occurrences of the same uppercase character.
Not tested, but it should work, since the logic is so simple.

Personally, I wouldn't use it.
But compared to the badly written, very long SEO RewriteRule lines we see from time to time, it doesn't look so bad.


RewriteEngine on
RewriteBase /
RewriteRule ![A-Z] - [L]
RewriteRule ^/*(.*)A(.*)$ $1a$2
RewriteRule ^/*(.*)B(.*)$ $1b$2
RewriteRule ^/*(.*)C(.*)$ $1c$2
RewriteRule ^/*(.*)D(.*)$ $1d$2
RewriteRule ^/*(.*)E(.*)$ $1e$2
RewriteRule ^/*(.*)F(.*)$ $1f$2
RewriteRule ^/*(.*)G(.*)$ $1g$2
RewriteRule ^/*(.*)H(.*)$ $1h$2
RewriteRule ^/*(.*)I(.*)$ $1i$2
RewriteRule ^/*(.*)J(.*)$ $1j$2
RewriteRule ^/*(.*)K(.*)$ $1k$2
RewriteRule ^/*(.*)L(.*)$ $1l$2
RewriteRule ^/*(.*)M(.*)$ $1m$2
RewriteRule ^/*(.*)N(.*)$ $1n$2
RewriteRule ^/*(.*)O(.*)$ $1o$2
RewriteRule ^/*(.*)P(.*)$ $1p$2
RewriteRule ^/*(.*)Q(.*)$ $1q$2
RewriteRule ^/*(.*)R(.*)$ $1r$2
RewriteRule ^/*(.*)S(.*)$ $1s$2
RewriteRule ^/*(.*)T(.*)$ $1t$2
RewriteRule ^/*(.*)U(.*)$ $1u$2
RewriteRule ^/*(.*)V(.*)$ $1v$2
RewriteRule ^/*(.*)W(.*)$ $1w$2
RewriteRule ^/*(.*)X(.*)$ $1x$2
RewriteRule ^/*(.*)Y(.*)$ $1y$2
RewriteRule ^/*(.*)Z(.*)$ $1z$2
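
Each of these rules converts only one occurrence of its letter per pass, because the pattern names the letter once and the greedy first group pushes the match to the last occurrence. A quick way to convince yourself is to run the same regex through sed. This is a sketch in which sed stands in for mod_rewrite's regex engine, and one_pass is a made-up name.

```shell
#!/bin/sh
# one_pass: apply the equivalent of a single "(.*)A(.*) -> $1a$2" rule.
# Illustration only; sed stands in for mod_rewrite here.
one_pass() {
    printf '%s\n' "$1" | sed 's/^\(.*\)A\(.*\)$/\1a\2/'
}

one_pass 'AbcA.html'                  # prints Abca.html (last A converted first)
one_pass "$(one_pass 'AbcA.html')"    # prints abca.html (second pass gets the other A)
```

So a name with a repeated uppercase letter needs the rules to run more than once, which is why the number of repeats the ruleset can handle is limited.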

===============================

If there are filenames with more than 10 of the same upper case chars,
you can simply copy & paste the conversion rule to double the capacity.


RewriteEngine on
RewriteBase /
RewriteRule ![A-Z] - [L]
RewriteRule ^/*(.*)A(.*)$ $1a$2
RewriteRule ^/*(.*)B(.*)$ $1b$2
RewriteRule ^/*(.*)C(.*)$ $1c$2
RewriteRule ^/*(.*)D(.*)$ $1d$2
RewriteRule ^/*(.*)E(.*)$ $1e$2
RewriteRule ^/*(.*)F(.*)$ $1f$2
RewriteRule ^/*(.*)G(.*)$ $1g$2
RewriteRule ^/*(.*)H(.*)$ $1h$2
RewriteRule ^/*(.*)I(.*)$ $1i$2
RewriteRule ^/*(.*)J(.*)$ $1j$2
RewriteRule ^/*(.*)K(.*)$ $1k$2
RewriteRule ^/*(.*)L(.*)$ $1l$2
RewriteRule ^/*(.*)M(.*)$ $1m$2
RewriteRule ^/*(.*)N(.*)$ $1n$2
RewriteRule ^/*(.*)O(.*)$ $1o$2
RewriteRule ^/*(.*)P(.*)$ $1p$2
RewriteRule ^/*(.*)Q(.*)$ $1q$2
RewriteRule ^/*(.*)R(.*)$ $1r$2
RewriteRule ^/*(.*)S(.*)$ $1s$2
RewriteRule ^/*(.*)T(.*)$ $1t$2
RewriteRule ^/*(.*)U(.*)$ $1u$2
RewriteRule ^/*(.*)V(.*)$ $1v$2
RewriteRule ^/*(.*)W(.*)$ $1w$2
RewriteRule ^/*(.*)X(.*)$ $1x$2
RewriteRule ^/*(.*)Y(.*)$ $1y$2
RewriteRule ^/*(.*)Z(.*)$ $1z$2
RewriteRule ^/*(.*)A(.*)$ $1a$2
RewriteRule ^/*(.*)B(.*)$ $1b$2
RewriteRule ^/*(.*)C(.*)$ $1c$2
RewriteRule ^/*(.*)D(.*)$ $1d$2
RewriteRule ^/*(.*)E(.*)$ $1e$2
RewriteRule ^/*(.*)F(.*)$ $1f$2
RewriteRule ^/*(.*)G(.*)$ $1g$2
RewriteRule ^/*(.*)H(.*)$ $1h$2
RewriteRule ^/*(.*)I(.*)$ $1i$2
RewriteRule ^/*(.*)J(.*)$ $1j$2
RewriteRule ^/*(.*)K(.*)$ $1k$2
RewriteRule ^/*(.*)L(.*)$ $1l$2
RewriteRule ^/*(.*)M(.*)$ $1m$2
RewriteRule ^/*(.*)N(.*)$ $1n$2
RewriteRule ^/*(.*)O(.*)$ $1o$2
RewriteRule ^/*(.*)P(.*)$ $1p$2
RewriteRule ^/*(.*)Q(.*)$ $1q$2
RewriteRule ^/*(.*)R(.*)$ $1r$2
RewriteRule ^/*(.*)S(.*)$ $1s$2
RewriteRule ^/*(.*)T(.*)$ $1t$2
RewriteRule ^/*(.*)U(.*)$ $1u$2
RewriteRule ^/*(.*)V(.*)$ $1v$2
RewriteRule ^/*(.*)W(.*)$ $1w$2
RewriteRule ^/*(.*)X(.*)$ $1x$2
RewriteRule ^/*(.*)Y(.*)$ $1y$2
RewriteRule ^/*(.*)Z(.*)$ $1z$2

Wizcrafts

12:06 am on Jan 22, 2006 (gmt 0)

10+ Year Member



Extras;
Thanks for all the extras! I appreciate it and will try out the various cgi scripts to see how well they work for my situation.

Hopefully I can get my web host to install mod_speling which will solve the problem with less overhead, but any solution is better than none.

Wiz

jdMorgan

3:25 am on Jan 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There's an old thread with a more efficient implementation of the rewrite code and RewriteRule patterns here (http://www.webmasterworld.com/forum92/958.htm). Actually, there are several similar threads, but that's the first one that a site search turned up.

A better approach, if you have httpd.conf access, is to invoke the system 'tolower' function using RewriteMap. This is shown in the mod_rewrite documentation as one of the examples.

Jim