
Forum Moderators: Ocean10000 & incrediBILL & phranque


How to convert only HTML page URLs from upper to lower

Htaccess: convert URLs from upper to lower case only for HTML pages

     
9:40 pm on Apr 16, 2013 (gmt 0)

New User

joined:Apr 16, 2013
posts:3
votes: 0


Hi All,

I am a newbie on this forum. I have a CMS site hosted on an Apache server. The site's URLs mix upper and lower case. I would like to convert all HTML page URLs to lower case only (PDF and image URLs should not be converted). I have used the following rule to convert the URLs:

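RewriteEngine On
# RewriteMap must be declared in server or <VirtualHost> config; it is not allowed in .htaccess
RewriteMap lc int:tolower
# Only redirect when the requested path contains an upper-case letter
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.*) ${lc:$1} [R=301,L]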

It was working fine, but it converted all the URLs, including PDFs, images and other files. So I modified it as follows:

RewriteEngine On
RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.html) ${lc:$1} [R=301,L]

which is not working.

Kindly help me to find the solution for this issue.
10:03 am on Apr 17, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


(.*)
captures every URL request.

(.html)
captures only the four characters "html" plus the one character before them (the unescaped dot matches any character or punctuation).

^([^.]+\.html)$
captures only .html URL requests, but correctly captures the whole request.
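Put together with the original rule, the anchored pattern might look like the sketch below. It assumes the directives live in the main server or `<VirtualHost>` configuration, because Apache does not allow `RewriteMap` to be declared in .htaccess; in that context the matched path starts with a slash, so `$1` does too.

```apache
# Sketch: assumes server or <VirtualHost> context (RewriteMap cannot be declared in .htaccess)
RewriteEngine On
RewriteMap lc int:tolower
# Only act when the requested path contains at least one upper-case letter
RewriteCond %{REQUEST_URI} [A-Z]
# Capture the whole request, but only if it ends in .html and contains no other periods
RewriteRule ^([^.]+\.html)$ ${lc:$1} [R=301,L]
```

PDF and image requests never match the pattern, so they pass through unchanged.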
6:46 pm on Apr 17, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month

joined:Apr 9, 2011
posts:12711
votes: 244


... and, finally: if any part of the extension "html" is itself capitalized (counting on fingers reveals that there are 2^4 - 1 = 15 ways to get it wrong), then the rule itself will need an [NC] flag. Conditions are only evaluated if the rule's pattern has already matched the request.
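A sketch of the rule with [NC] added, assuming the `lc` map is already declared in the server or virtual-host config. The flag applies only to the RewriteRule pattern; the RewriteCond keeps its own case-sensitive test, so it still fires only when an upper-case letter is present.

```apache
# Sketch: assumes "RewriteMap lc int:tolower" is declared in server or <VirtualHost> config
# Case-sensitive: continue only if the path contains an upper-case letter
RewriteCond %{REQUEST_URI} [A-Z]
# [NC] lets the pattern match .html, .HTML, .Html and the other mixed-case forms
RewriteRule ^([^.]+\.html)$ ${lc:$1} [R=301,L,NC]
```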

And if you've got a RewriteMap working nicely, you are already way ahead of the game :)
8:15 pm on Apr 18, 2013 (gmt 0)

New User

joined:Apr 16, 2013
posts:3
votes: 0


Thanks for your kind response.

- ^([^.]+\.html)$ captures only .html URL requests

It doesn't seem to work. All URLs are converted to www.example.com/.html, which removes the whole URL path when it should come out as www.example.com/file.html. The modified code is below:

RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
#RewriteRule (.html) ${lc:$1} [R=301,L]
RewriteRule ^([^.]+\.html)${lc:$1} [R=301,L]

Thanks in Advance
8:25 pm on Apr 18, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


There's a space missing.
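For reference, the corrected line would read as follows. RewriteRule splits its arguments on whitespace, so without a space after the `$` anchor Apache treats `^([^.]+\.html)${lc:$1}` as one pattern, and the substitution and flags are no longer what was intended.

```apache
# Sketch: the same rule with whitespace separating pattern, substitution and flags
RewriteRule ^([^.]+\.html)$ ${lc:$1} [R=301,L]
```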
9:38 pm on Apr 18, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month

joined:Apr 9, 2011
posts:12711
votes: 244


^([^.]+\.html)$ captures only .html URL requests


In your rule as written, you probably have more than one set of parentheses. Make sure the correct capture is getting passed to the RewriteMap. Safest is to have a set of outer parentheses that contains the entire request, because those will always be $1.
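To illustrate the point about outer parentheses: in the hypothetical pattern below, groups are numbered by the position of their opening parenthesis, so the outer group is always `$1` and holds the whole request, while the inner group becomes `$2`. Only `$1` is passed to the map. This assumes server or virtual-host context with the `lc` map declared.

```apache
# Sketch (hypothetical pattern): numbering follows the opening parentheses, left to right
#   $1 = whole request, e.g. /Some/Page.html
#   $2 = the part before the extension, e.g. /Some/Page
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^(([^.]+)\.html)$ ${lc:$1} [R=301,L]
```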

The version you quoted earlier
RewriteRule (.html) ${lc:$1}

only captures the ".html", so it is doing exactly what it has been told to do.
10:05 pm on Apr 18, 2013 (gmt 0)

Junior Member

joined:Apr 6, 2013
posts:149
votes: 0


> ^([^.]+\.html)$ captures only .html URL requests, but correctly
> captures the whole request.

...unless the request is something like "language.functions.html", which is a perfectly valid request and occurs in real-life scenarios. Shoulda used .* ;). Correctness trumps micro-optimizations any day of the week.
10:36 pm on Apr 18, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


As before, the .* pattern is "greedy", "promiscuous" and "ambiguous" and is rarely the right thing to use.
10:39 pm on Apr 18, 2013 (gmt 0)

Junior Member

joined:Apr 6, 2013
posts:149
votes: 0


This is one of those times when it's the right thing to use. Otherwise your pattern won't capture all .html requests, as it claims to do.
1:47 am on Apr 19, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month

joined:Apr 9, 2011
posts:12711
votes: 244


brainsuccess: If your URLs really do contain literal periods (some do, most don't), there are better solutions than dumping the [^.]+ element and replacing it with .+ (not .*, which is clearly unwarranted and can lead to capturing malformed requests).
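For comparison, the `.+` fallback mentioned here might look like this, assuming server or virtual-host context with the `lc` map declared. It accepts periods anywhere before the extension, so a name like language.functions.html matches, at the cost of also accepting requests that the stricter `[^.]+` form would reject.

```apache
# Sketch: .+ allows literal periods in the path, e.g. /manual/language.functions.html
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^(.+\.html)$ ${lc:$1} [R=301,L]
```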

Take, f'rinstance, the typical URL over at apache dot org, where there is often a "2.2" or "2.4" or similar in the middle. If you wanted to capture this, you'd say

^([^.]+(\.\d[^.]+)?)\.html)

Some Apache installations are grumpy and require [0-9] in place of \d. Or, if you're unaccustomed to reading regular expressions and your brain freezes when it meets a form like \d or \s, you may be better off investing the three extra bytes ;)

But the key thing in this example is: If the character following the literal . is not a numeral, then the RegEx stops its search immediately, spits out the . and skips to the end of the parentheses.
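As posted, the pattern has one more closing parenthesis than opening ones, so it would likely fail to compile. A balanced reading, assuming the intent was to keep the whole request in `$1` and anchor the end, might be the sketch below; it uses the `[0-9]` form for installations that don't accept `\d`, and assumes server or virtual-host context with the `lc` map declared.

```apache
# Sketch: one possible balanced version of the pattern
#   [^.]+               everything up to the first period
#   (\.[0-9][^.]+)?     optionally, a period followed by a digit, e.g. ".2/mod/..." in /docs/2.2/mod/...
#   \.html$             the extension itself
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^([^.]+(\.[0-9][^.]+)?\.html)$ ${lc:$1} [R=301,L]
```

A path such as /docs/2.2/mod/Mod_Rewrite.html matches, while /manual/language.functions.html still does not, because the period in "language.functions" is followed by a letter.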
11:21 am on Apr 19, 2013 (gmt 0)

Junior Member

joined:Apr 6, 2013
posts:149
votes: 0


> ^([^.]+(\.\d[^.]+)?)\.html)

If the goal is to match any .html request, as is the case in this thread, then this pattern still doesn't accomplish that. We're making this more complicated than it needs to be *and* we're sacrificing correctness, just to save a few nanoseconds. That's a bad trade-off.
8:16 pm on May 2, 2013 (gmt 0)

New User

joined:Apr 16, 2013
posts:3
votes: 0


Thanks for all the responses. I have tried all of the combinations mentioned above, but none of them meets my requirement. Can anybody provide complete code that does?

Thanks in Advance
10:32 pm on May 2, 2013 (gmt 0)

Junior Member

joined:Apr 6, 2013
posts:149
votes: 0


Interestingly, the code you started with is almost exactly what you needed. Just one small difference....

RewriteEngine On
RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.*\.html?$) ${lc:$1} [R=301,L,NC]
3:00 am on May 3, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month

joined:Apr 9, 2011
posts:12711
votes: 244


:: sitting on hands ::
7:20 am on May 3, 2013 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10543
votes: 8


welcome to WebmasterWorld, brainsuccess!


i would recommend adding the canonical protocol and hostname to the substitution string to avoid potential multiple redirect hops.
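A sketch of that suggestion: putting the scheme and hostname into the substitution sends the visitor to the canonical URL in a single redirect, rather than one hop to fix the case and another to fix the hostname. `www.example.com` is a placeholder (as used earlier in the thread) and `http://` is an assumption; substitute whatever scheme and host are canonical for the site. It assumes the rules sit in the server or virtual-host config, where the matched path, and therefore `$1`, begins with a slash.

```apache
# Sketch: assumes server or <VirtualHost> context, so $1 starts with "/"
# http://www.example.com is a placeholder for the site's canonical scheme and host
RewriteEngine On
RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
# [NC] also catches .HTML/.HTM; html? matches both .html and .htm
RewriteRule ^(.+\.html?)$ http://www.example.com${lc:$1} [R=301,L,NC]
```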