Redirects and cap letter help please

Problem with cap letter in url

thundercat

10:37 am on Jun 11, 2010 (gmt 0)

10+ Year Member



Hello everyone and thanks for reading this,
I have noticed in Google Webmaster Tools that my website has a lot of duplicate title and meta tags. Basically, Google finds two URLs:
http://www.example.com/This-is-my-article
http://www.example.com/this-is-my-article

The URL starting with a capital letter (This-is-my-article) should be the correct, unique URL. The problem is that when I type this URL into the browser with capital letters anywhere in it, e.g.:
http://www.example.com/THIS-is-my-article
http://www.example.com/this-Is-My-Article
http://www.example.com/this-is-my-ARTICLE
etc...
every time it displays the content of the page http://www.example.com/This-is-my-article
so it creates a lot of duplicate content.

Is there a way to use .htaccess and rewrite rules to redirect all these differently capitalised URLs to the original one, which should start with a capital letter with everything else in lowercase (http://www.example.com/This-is-my-article)?

Sorry if I'm not very clear. I'm not a very tech-savvy guy, but I try to understand how it works.

Thanks in advance

[edited by: jdMorgan at 4:22 pm (utc) on Jun 11, 2010]
[edit reason] Please use example.com only. [/edit]

g1smd

10:43 am on Jun 11, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The first problem to solve is to make sure your links TO those URLs are all correctly cased. For simplicity, I prefer all lower-cased URLs.

The next issue is that Apache is normally case sensitive, so you must have overridden that somewhere.

I would have an additional routine in the page-generation script that looks at the requested URL, compares it to the database entry holding the correct URL for this page, and if they do not match then sends the 301 header and the correct URL so that the browser can make a new request for the correct URL.

This type of redirect preserves traffic arriving via the wrong URL while signalling to search engines to update their URL database.
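
A minimal sketch of that check, written in Python purely for illustration (the thread does not say what language the CMS uses). The function name `canonical_redirect`, the `host` default, and the idea that the canonical path has already been looked up from the database are all assumptions:

```python
def canonical_redirect(requested_path, canonical_path, host="www.example.com"):
    """Decide how to answer a request for a page.

    requested_path: the path exactly as the client sent it.
    canonical_path: the correctly cased path stored for this page
                    (assumed to have been looked up already).
    Returns a (status, headers) pair.
    """
    # Case-sensitive comparison: any difference in case or spelling
    # triggers a redirect to the stored URL.
    if requested_path != canonical_path:
        return 301, {"Location": "http://" + host + canonical_path}
    # Exact match: serve the page normally.
    return 200, {}


print(canonical_redirect("/THIS-is-my-article", "/This-is-my-article"))
print(canonical_redirect("/This-is-my-article", "/This-is-my-article"))
```

Because the comparison is exact, a request for the canonical URL itself is never redirected, so the check cannot cause a redirect loop.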

jdMorgan

4:35 pm on Jun 11, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Case-correction with mod_rewrite in .htaccess is horribly inefficient because it can correct only one character at a time. There are several threads in our Forum Library which include code to do this, but I don't recommend it.

The likely cause of these mis-cased URLs getting indexed is linking errors. In addition, it is almost certain that mod_speling and/or MultiViews are enabled on your server, which would allow those mis-cased URLs to resolve instead of returning a 404 error.

So the solution is to correct the links, then disable mod_speling and MultiViews using:

CheckSpelling off
Options -MultiViews


If you have only a few mis-cased URLs showing in WMT, it would likely be much faster to correct them one-by-one than to use a general case-correction routine -- again, the general mod_rewrite solution is horribly slow.

An example to force all-lowercase would be:

RewriteCond $1 [A-Z]
RewriteRule ^(This-is-my-article)$ http://www.example.com/this-is-my-article [NC,R=301,L]

Otherwise, you have to specify what the correct mixed-case URL should be in both the RewriteCond and the RewriteRule:

RewriteCond $1 !^This-Is-My-Article$
RewriteRule ^(this-is-my-article)$ http://www.example.com/This-Is-My-Article [NC,R=301,L]

Note that in both cases, the rule pattern is case-insensitive (because of the [NC] flag) and the RewriteCond is case sensitive.
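
If there are more than a handful of URLs to fix one-by-one, rule pairs of the mixed-case form can be generated rather than typed by hand. A rough sketch in Python, assuming the correctly cased slugs are available as a list and contain only letters, digits, and hyphens (so no regex escaping is needed); `rules_for` and its `host` parameter are hypothetical names:

```python
import re

# Assumption: slugs consist only of letters, digits and hyphens, so they can
# be placed in a mod_rewrite pattern without any escaping.
SAFE_SLUG = re.compile(r"[A-Za-z0-9-]+")


def rules_for(canonical_slugs, host="www.example.com"):
    """Build one RewriteCond/RewriteRule pair per correctly cased slug."""
    lines = []
    for slug in canonical_slugs:
        if not SAFE_SLUG.fullmatch(slug):
            raise ValueError("slug needs manual escaping: " + repr(slug))
        # Case-sensitive condition: skip the redirect when the case is already right.
        lines.append(f"RewriteCond $1 !^{slug}$")
        # Case-insensitive pattern ([NC]) that captures the request as $1.
        lines.append(f"RewriteRule ^({slug})$ http://{host}/{slug} [NC,R=301,L]")
    return "\n".join(lines)


print(rules_for(["This-is-my-article"]))
```

The output is plain text to paste into .htaccess; it should be reviewed and tested on a few URLs before being relied on.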

Jim

thundercat

9:12 pm on Jun 11, 2010 (gmt 0)

10+ Year Member



Thanks so much for this detailed answer, Jim. I appreciate the time you spent answering me.
I have tried disabling mod_speling and MultiViews, but it didn't work. Actually, it's really strange, because even when I make spelling errors, the website loads.
For instance, if my original URL is:
http://www.example.com/This-Is-My-Article
and I write
http://www.example.com/That-Is-My-Article
the page will show the content of the first one!
I will ask the CMS developer whether there is a solution for that, but as you said, only bad linking can create that kind of error. The problem is that if someone wants to hurt my website, he just has to create that kind of orphan URL, and I could be in big trouble, I think.
Anyway thanks for your help!

g1smd

9:22 pm on Jun 11, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Did you clear the browser cache before testing again?