How to convert only HTML page URLs from upper to lower

Htaccess URL convert upper to lower only for HTML pages

     
9:40 pm on Apr 16, 2013 (gmt 0)



Hi All,

I am a newbie on this forum. I have a CMS site hosted on an Apache server. The site's URLs mix upper and lower case. I would like to convert all the HTML page URLs to lower case only (PDF and image URLs should not be converted). I have used the following rule to convert the URLs:

RewriteEngine On
RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.*) ${lc:$1} [R=301,L]

This worked, but it converted all the URLs, including PDFs, images and other files. I modified it as follows:

RewriteEngine On
RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.html) ${lc:$1} [R=301,L]

This version does not work.

Kindly help me to find the solution for this issue.
10:03 am on Apr 17, 2013 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



(.*)
captures every URL request.

(.html)
captures only five characters: "html" plus whatever single character precedes it (the unescaped dot matches any character, punctuation included).

^([^.]+\.html)$
captures only .html URL requests, but correctly captures the whole request.
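
To make the difference concrete, here is a rough sketch of the anchored pattern dropped into the rule from the first post. It assumes the directives sit in the server or virtual-host configuration and that the map is named lc as in that post; the request paths in the comments are made-up examples.

```apache
# Assumes "RewriteMap lc int:tolower" is already declared in this
# server or virtual-host config, as in the first post.
#
# For a (hypothetical) request for /Docs/Page.html:
#   (.*)              $1 = /Docs/Page.html  (also matches /Docs/Logo.PNG)
#   (.html)           $1 = .html            (unanchored, five characters)
#   ^([^.]+\.html)$   $1 = /Docs/Page.html  (no match for /Docs/Logo.PNG)
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^([^.]+\.html)$ ${lc:$1} [R=301,L]
```

Note the space between the pattern and the substitution; without it the two run together into a single argument.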
6:46 pm on Apr 17, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



... and, finally: If any part of the extension "html" is itself capitalized-- counting on fingers reveals that there are 2^4-1 = 15 ways to get it wrong --then the rule itself will need a [NC] flag. Conditions are only evaluated if the Rule itself can potentially apply.
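
As a small sketch of that point, assuming the same lc map and server-level placement as in the first post, the flag goes on the RewriteRule line:

```apache
# [NC] makes the pattern case-insensitive, so it also matches
# Page.HTML, Page.Html and the other mixed-case spellings.
# The map then lowercases everything in $1, extension included.
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^([^.]+\.html)$ ${lc:$1} [NC,R=301,L]
```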

And if you've got a RewriteMap working nicely, you are already way ahead of the game :)
8:15 pm on Apr 18, 2013 (gmt 0)



Thanks for your kind response

- ^([^.]+\.html)$ captures only .html URL requests

It does not seem to work. All URLs are redirected to www.example.com/.html: the URL path is removed, where it should come out as www.example.com/file.html. The modified code is below:

RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
#RewriteRule (.html) ${lc:$1} [R=301,L]
RewriteRule ^([^.]+\.html)${lc:$1} [R=301,L]

Thanks in Advance
8:25 pm on Apr 18, 2013 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



There's a space missing.
9:38 pm on Apr 18, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



^([^.]+\.html)$ captures only .html URL requests


In your rule as written, you probably have more than one set of parentheses. Make sure the correct capture is getting passed to the RewriteMap. Safest is to have a set of outer parentheses that contains the entire request, because those will always be $1.

The version you quoted earlier
RewriteRule (.html) ${lc:$1}

only captures the ".html" --so it is doing exactly what it has been told to do.
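
A hedged illustration of the outer-parentheses idea, assuming server-level rules and the lc map from the first post. The inner groups and the sample path are invented for the example; groups are numbered by their opening parenthesis, so the outermost one is $1.

```apache
# For a (hypothetical) request for /Docs/Guide/Page.html:
#   $1 = /Docs/Guide/Page.html   (outer group: the whole request)
#   $2 = Docs/Guide/             (directory part)
#   $3 = Page.html               (file name)
# Only $1 is handed to the map, so the full path survives the redirect.
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^(/(.+/)?([^/]+\.html))$ ${lc:$1} [R=301,L]
```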
10:05 pm on Apr 18, 2013 (gmt 0)



> ^([^.]+\.html)$ captures only .html URL requests, but correctly
> captures the whole request.

...unless the request is something like "language.functions.html" which is a perfectly valid request and occurs in real life scenarios. Shoulda used .* ;). Correctness trumps micro-optimizations any day of the week.
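
A quick sketch of what I mean, assuming the lc map from the first post and rules at server level; the path in the comments is a made-up example.

```apache
# For a request for /Manual/Language.Functions.html:
#   ^([^.]+\.html)$   no match: [^.]+ stops at the first dot
#   ^(.*\.html)$      matches, $1 = /Manual/Language.Functions.html
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^(.*\.html)$ ${lc:$1} [R=301,L]
```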
10:36 pm on Apr 18, 2013 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



As before, the .* pattern is "greedy", "promiscuous" and "ambiguous" and is rarely the right thing to use.
10:39 pm on Apr 18, 2013 (gmt 0)



This is one of those times when it's the right thing to use. Otherwise your pattern won't capture all .html requests, as it claims to do.
1:47 am on Apr 19, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



brainsuccess: If your URLs really do contain literal periods-- some do, most don't-- there are better solutions than to dump the [^.]+ element and replace it with .+ (not .*, which is clearly unwarranted and can lead to capturing malformed requests).

Take, f'rinstance, the typical URL over at apache dot org, where there is often a "2.2" or "2.4" or similar in the middle. If you wanted to capture this, you'd say

^([^.]+(\.\d[^.]+)?\.html)

Some Apache installations are grumpy and require [0-9] in place of \d. Or, if you're unaccustomed to reading Regular Expressions and your brain freezes when it meets a form like \d or \s, you may be better off investing the three extra bytes ;)

But the key thing in this example is: If the character following the literal . is not a numeral, then the RegEx stops its search immediately, spits out the . and skips to the end of the parentheses.
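
Here is one reading of that pattern, written with [0-9] and with the outer parentheses around the whole request, sketched into the rule from the first post. The placement (server level), the lc map, the closing anchor and the sample paths are assumptions for illustration.

```apache
# /Docs/2.4/Mod/Core.html
#   [^.]+ takes "/Docs/2", the optional group takes ".4/Mod/Core",
#   then \.html matches, so $1 is the whole path.
# /Manual/Language.Functions.html
#   the character after the first dot is not a digit, the optional
#   group is skipped, \.html fails against ".Func", so no match.
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^([^.]+(\.[0-9][^.]+)?\.html)$ ${lc:$1} [R=301,L]
```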
11:21 am on Apr 19, 2013 (gmt 0)



> ^([^.]+(\.\d[^.]+)?\.html)

If the goal is to match any .html request, as is the case in this thread, then this pattern still doesn't accomplish that. We're making this more complicated than it needs to be *and* we're sacrificing correctness, just to save a few nanoseconds. That's a bad trade-off.
8:16 pm on May 2, 2013 (gmt 0)



Thanks for all the responses. I have tried all the combinations mentioned above, but none of them meets my requirement. Can anybody provide the complete code that does?

Thanks in Advance
10:32 pm on May 2, 2013 (gmt 0)



Interestingly, the code you started with is almost exactly what you needed. Just one small difference....

RewriteEngine On
RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.*\.html?$) ${lc:$1} [R=301,L,NC]
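
One thing the thread hasn't pinned down is where these lines live. The Apache documentation allows RewriteMap only in server or virtual-host context, not in .htaccess, so if the rule itself sits in .htaccess the map has to be declared separately. A rough sketch of that split, assuming the .htaccess file is in the document root:

```apache
# --- Server or virtual-host config for the site ---
# RewriteMap is only permitted here, not in .htaccess.
RewriteEngine On
RewriteMap lc int:tolower

# --- .htaccess in the document root ---
RewriteEngine On
RewriteCond %{REQUEST_URI} [A-Z]
# Per-directory patterns see the path without its leading slash,
# so the substitution supplies one.
RewriteRule ^(.*\.html?)$ /${lc:$1} [R=301,L,NC]
```

If everything stays in the server config instead, the pattern sees the leading slash and the substitution can remain ${lc:$1} as written above.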
3:00 am on May 3, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



:: sitting on hands ::
7:20 am on May 3, 2013 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



welcome to WebmasterWorld, brainsuccess!


i would recommend adding the canonical protocol and hostname to the substitution string to avoid potential multiple redirect hops.
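
a minimal sketch of that, assuming the rules are at server level, the map is named lc as above, and the canonical site is http://www.example.com (the scheme is a guess; the hostname is the example one used earlier in the thread):

```apache
# $1 starts with a slash in server context, so it is appended
# directly to the canonical scheme and hostname. A request on a
# non-canonical host then reaches the final URL in one redirect.
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^(.*\.html?)$ http://www.example.com${lc:$1} [R=301,L,NC]
```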
 
