Forum Moderators: phranque

mod_speling = duplicate content?

does implementation of mod_speling produce duplicate content?

         

suga

6:24 am on Jan 24, 2007 (gmt 0)

10+ Year Member



hi all,

i'm a newbie here. i have been a lurker for the past year or so, using this board as a resource for any webmaster questions i have! thanks for all your help.

my question is: do search engines view mod_speling rewrites as duplicate content? does the module return a 404 header to the search engine, or a 301?

our website's directory structure uses capitalized first letters and i've noticed while looking through our error logs that spiders constantly search our site using all lowercase. i just loaded the mod_speling module and set CheckSpelling on in httpd.conf.
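For reference, a setup like the one described usually comes down to something like the lines below. This is only a sketch: the module path is an assumption that varies by install, and CheckCaseOnly is an optional extra (Apache 2.2 or later) that the post does not mention. It limits corrections to capitalization differences.

```apache
# Load mod_speling (path is an assumption; adjust for your install)
LoadModule speling_module modules/mod_speling.so

# Let the module try to correct mistyped URLs
CheckSpelling On

# Optional, Apache 2.2+: only fix capitalization, not other misspellings
CheckCaseOnly On
```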

was this a smart thing to do?

thanks.

jdMorgan

2:57 pm on Jan 24, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Check the server response headers for mis-capitalized requests to find the answer. You can use one of the many online server header checkers, or use the "Live HTTP Headers" extension for Firefox and Mozilla browsers. Both are easy to find with a search.

If you encounter problems, look into using mod_rewrite's RewriteMap function to call "toupper" for the first character in your directory paths and generate a proper redirect response. This is given in the mod_rewrite documentation as an example.

Take-home point: Uppercase characters in URLs are a headache with *nix-based servers, and lead to many client-induced errors.

Jim

suga

4:39 pm on Jan 24, 2007 (gmt 0)

10+ Year Member



hi jdmorgan,

thanks for your reply.

i checked the header response for a specific url which is 2 directories deep, and it is redirected permanently with a 301 twice before finding the correctly capitalized URL and returning 200 OK.

so that means no duplicate content, right?

i tried rewriting urls with a rewrite map and toupper, but it was a major, major headache because there are just too many different combinations as well as the use of underscores.

thanks for your help.

jdMorgan

8:00 pm on Jan 24, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's not a great idea to have multiple "stacked" redirects involved. If the first letter of each of your directory names is always uppercase, and the search engines *always* try to access them using lowercase-only, then something like this in httpd.conf would work up to four directory levels deep:

RewriteMap up int:toupper
#
# Capitalize first letter of directory, four directory levels deep
RewriteRule ^/([a-z])([^/]*)/([a-z])([^/]*)/([a-z])([^/]*)/([a-z])([^/]*)/(.*)$ http://www.example.com/${up:$1}$2/${up:$3}$4/${up:$5}$6/${up:$7}$8/$9 [R=301,L]
#
# Capitalize first letter of directory, three directory levels deep
RewriteRule ^/([a-z])([^/]*)/([a-z])([^/]*)/([a-z])([^/]*)/(.*)$ http://www.example.com/${up:$1}$2/${up:$3}$4/${up:$5}$6/$7 [R=301,L]
#
# Capitalize first letter of directory, two directory levels deep
RewriteRule ^/([a-z])([^/]*)/([a-z])([^/]*)/(.*)$ http://www.example.com/${up:$1}$2/${up:$3}$4/$5 [R=301,L]
#
# Capitalize first letter of directory, one directory level deep
RewriteRule ^/([a-z])([^/]*)/(.*)$ http://www.example.com/${up:$1}$2/$3 [R=301,L]

Otherwise, it would be necessary to capitalize one directory-name first character at a time, check for file-exists, and then repeat as necessary, which wouldn't be too terribly hard to code, but might be too slow to be practical on a busy site.

The four-level-deep restriction is caused by mod_rewrite's limit on back-references (9), but a multi-step rewrite could still be written if your directories go more than four levels deep.

Jim

[edited by: jdMorgan at 8:01 pm (utc) on Jan. 24, 2007]

suga

11:42 pm on Jan 24, 2007 (gmt 0)

10+ Year Member



the problem is that it's not only the first letter of the directory name that is capitalized. some directories contain 1, 2 or 3 words separated by a hyphen, and those words need to be capitalized as well. i started writing rewriterules and ended up wanting to pull my hair out because of all the different combinations.

in general, we only have directories 3 levels deep. so then i'd have to account for rewrites such as:

/Shoes/Running-Shoes/Nike
/Socks/Hiking/North-Face
/Wind-Breaker/Summer-Time/Water-Proof
/Brands/CA-Brand-Names/Roots-Canada

I do want to do what's best, whether that means using mod_speling or a RewriteMap, but my novice mind just couldn't seem to get the RewriteRules to work correctly the majority of the time. Even with all those different combinations, is it still best to use a RewriteMap and RewriteRules?

Plus, I started getting weird rewrites such as "c0c1", which is another question in itself.

Search engines don't *always* try to access my URLs using lowercase, but I do see it in my error logs. The reason I am doing this is that our site recently dropped in Google ranking, and a consultant said that lowercase versions of my URLs resulting in 404s may have had something to do with it.

Thanks again for all your help.

jdMorgan

11:57 pm on Jan 24, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



With URL-paths/file-paths like that, there's really no good cure. The only realistic option is to go to all-lowercase filenames and redirect all capitalization variants to those. The hyphens aren't a problem, but lack of a fixed capitalization standard is. The rules above could be adapted to handle the hyphens, but that "/CA-Brand" URL with two initial caps wouldn't be handled properly -- you'd end up with a redirect to "Ca-Brand".
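A minimal sketch of that all-lowercase redirect, for httpd.conf, might look like the following. It assumes every file and directory on disk has already been renamed to lowercase. The map name "lc" is just an illustrative label, and www.example.com stands in for the real hostname as in the rules above.

```apache
RewriteEngine On
# Internal map function that lowercases its argument
RewriteMap lc int:tolower
#
# Only act when the requested path contains an uppercase letter
RewriteCond %{REQUEST_URI} [A-Z]
# Redirect once, straight to the all-lowercase URL
RewriteRule ^/(.*)$ http://www.example.com/${lc:$1} [R=301,L]
```

Because the lowercase target can never match the [A-Z] condition again, any capitalization variant should reach its final URL in a single 301 rather than a chain.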

You'll have to choose between URL-aesthetics and ranking, I'm afraid. The latest word has it that PageRank only passes reliably through a single 301 redirect. This was the impetus for the recent thread, "A guide to fixing duplicate content & URL issues on Apache [webmasterworld.com] - How to canonicalize all of your URLs with a single redirect." Unfortunately, I see no elegant solution to your problem. :(

Jim