i'm a newbie here. i have been a lurker for the past year or so, using this board as a resource for any webmaster questions i have! thanks for all your help.
my question is: do search engines view mod_speling rewrites as duplicate content? does the module return a 404 header to the search engine, or a 301?
our website's directory structure uses capitalized first letters and i've noticed while looking through our error logs that spiders constantly search our site using all lowercase. i just loaded the mod_speling module and set CheckSpelling on in httpd.conf.
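for reference, a minimal sketch of what that setup typically looks like. the module path below is an assumption and varies by install; it also assumes mod_speling was built as a shared module rather than compiled in.

```apache
# httpd.conf
# Assumption: mod_speling is a shared module; adjust the .so path to your install
LoadModule speling_module modules/mod_speling.so

# Let the server try to correct misspelled or mis-capitalized request paths
CheckSpelling On
```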
was this a smart thing to do?
thanks.
If you encounter problems, look into using mod_rewrite's RewriteMap directive to call the internal "toupper" function for the first character in your directory paths and generate a proper redirect response. This is given in the mod_rewrite documentation as an example.
Take-home point: Uppercase characters in URLs are a headache with *nix-based servers, and lead to many client-induced errors.
Jim
thanks for your reply.
i checked the header response for a specific url which is 2 directories deep, and it is redirected permanently with a 301 twice before finding the correctly capitalized URL and returning 200 OK.
so that means no duplicate content, right?
i tried rewriting urls with a rewrite map and toupper, but it was a major, major headache because there are just too many different combinations as well as the use of underscores.
thanks for your help.
RewriteMap up int:toupper
#
# Capitalize first letter of directory, four directory levels deep
RewriteRule ^/([a-z])([^/]*)/([a-z])([^/]*)/([a-z])([^/]*)/([a-z])([^/]*)/(.*)$ http://www.example.com/${up:$1}$2/${up:$3}$4/${up:$5}$6/${up:$7}$8/$9 [R=301,L]
#
# Capitalize first letter of directory, three directory levels deep
RewriteRule ^/([a-z])([^/]*)/([a-z])([^/]*)/([a-z])([^/]*)/(.*)$ http://www.example.com/${up:$1}$2/${up:$3}$4/${up:$5}$6/$7 [R=301,L]
#
# Capitalize first letter of directory, two directory levels deep
RewriteRule ^/([a-z])([^/]*)/([a-z])([^/]*)/(.*)$ http://www.example.com/${up:$1}$2/${up:$3}$4/$5 [R=301,L]
#
# Capitalize first letter of directory, one directory level deep
RewriteRule ^/([a-z])([^/]*)/(.*)$ http://www.example.com/${up:$1}$2/$3 [R=301,L]
The four-level-deep restriction is caused by mod_rewrite's limit of nine back-references ($1 through $9), but a multi-step rewrite could still be written if your directories go more than four levels deep.
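As a rough, untested sketch of what such a multi-step rewrite might look like: the first rule fixes one directory per pass and loops internally, and the second issues a single external redirect once nothing is left to fix. It assumes server-config context (like the rules above) and would replace the fixed-depth rules rather than sit alongside them. The environment variable name FIXCASE is made up for illustration.

```apache
RewriteEngine On
# Same internal map as in the fixed-depth rules
RewriteMap up int:toupper
#
# Step 1 (internal, repeated): skip any leading directories that do not start
# with a lowercase letter, capitalize the first one that does, flag the
# request, and restart rule processing with the rewritten path
RewriteRule ^/(([^/a-z][^/]*/)*)([a-z])([^/]*/.*)$ /$1${up:$3}$4 [E=FIXCASE:1,N]
#
# Step 2: if step 1 changed anything, send one 301 to the corrected URL
RewriteCond %{ENV:FIXCASE} =1
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]
```

Because the looping step is internal, the client should see only one 301 regardless of directory depth. Like the fixed-depth rules, this leaves the final path component alone and does nothing about letters following a hyphen.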
Jim
[edited by: jdMorgan at 8:01 pm (utc) on Jan. 24, 2007]
in general, we only have directories 3 levels deep. so then i'd have to account for rewrites such as:
/Shoes/Running-Shoes/Nike
/Socks/Hiking/North-Face
/Wind-Breaker/Summer-Time/Water-Proof
/Brands/CA-Brand-Names/Roots-Canada
I do want to do what's best, whether that means using mod_speling or a RewriteMap, but my novice mind just couldn't get the RewriteRules to work correctly most of the time. Even with all those different combinations, is it still best to use a RewriteMap and RewriteRules?
Plus, I started getting weird rewrites such as "c0c1", which is another question in itself.
Search engines don't *always* try to access my URLs in lowercase, but I do see it in my error logs. The reason I am doing this is that our site recently dropped in Google ranking, and a consultant said that lowercase versions of my URLs resulting in 404s may have had something to do with it.
Thanks again for all your help.
You'll have to choose between URL-aesthetics and ranking, I'm afraid. The latest word has it that PageRank only passes reliably through a single 301 redirect. This was the impetus for the recent thread, "A guide to fixing duplicate content & URL issues on Apache [webmasterworld.com] - How to canonicalize all of your URLs with a single redirect." Unfortunately, I see no elegant solution to your problem. :(
Jim