
Apache Web Server Forum

    
How to convert only HTML page URLs from upper to lower case
Htaccess: convert URLs from upper to lower case only for HTML pages
brainsuccess



 
Msg#: 4565412 posted 9:40 pm on Apr 16, 2013 (gmt 0)

Hi All,

I am a newbie on this forum. I have a CMS site hosted on an Apache server. The site's URLs are a mix of upper and lower case. I would like to convert only the HTML page URLs to lower case (PDF and image URLs should not be converted). I have used the following rule to convert the URLs:

RewriteEngine On
RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.*) ${lc:$1} [R=301,L]

This works, but it converts all URLs, including PDFs, images and other files. I modified it as follows:

RewriteEngine On
RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.html) ${lc:$1} [R=301,L]

This version does not work.

Kindly help me find a solution for this issue.

 

g1smd

WebmasterWorld Senior Member g1smd us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 4565412 posted 10:03 am on Apr 17, 2013 (gmt 0)

(.*) captures every URL request.

(.html) captures only the 4 characters "html" plus the single preceding character, whatever it is, because the unescaped dot matches any character, not just a literal period.

^([^.]+\.html)$ captures only .html URL requests, but correctly captures the whole request.
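Dropped into the original ruleset, the third pattern might look like the sketch below. It assumes the rules live in the main server or virtual-host configuration, where the matched path includes the leading slash; that is also the only place a RewriteMap can be declared.

```apache
RewriteEngine On
# RewriteMap is only valid in server or virtual-host context, not in .htaccess
RewriteMap lc int:tolower

# Only act when the requested path contains an uppercase letter
RewriteCond %{REQUEST_URI} [A-Z]
# Anchored pattern: the whole path, with no dot before the final ".html"
RewriteRule ^([^.]+\.html)$ ${lc:$1} [R=301,L]
```

Under those assumptions a request for /Docs/Page.html should redirect to /docs/page.html, while /Docs/Manual.pdf never matches the rule and is left alone.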
lucy24

WebmasterWorld Senior Member lucy24 us a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4565412 posted 6:46 pm on Apr 17, 2013 (gmt 0)

... and, finally: If any part of the extension "html" is itself capitalized-- counting on fingers reveals that there are 2^4-1 = 15 ways to get it wrong --then the rule itself will need a [NC] flag. Conditions are only evaluated after the Rule's own pattern has matched, so a case-sensitive pattern that misses ".HTML" means the Condition is never even looked at.

And if you've got a RewriteMap working nicely, you are already way ahead of the game :)
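The [NC] point can be sketched as follows, assuming RewriteEngine On and the lc map from the original post are already in place:

```apache
# The condition stays case-sensitive so it still detects uppercase letters
RewriteCond %{REQUEST_URI} [A-Z]
# NC lets the pattern also match .HTML, .Html, .hTmL and so on
RewriteRule ^([^.]+\.html)$ ${lc:$1} [R=301,L,NC]
```

The flag goes on the RewriteRule only. Putting [NC] on the RewriteCond as well would make [A-Z] match lowercase letters too, and an already-lowercase URL would then redirect to itself.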

brainsuccess



 
Msg#: 4565412 posted 8:15 pm on Apr 18, 2013 (gmt 0)

Thanks for your kind response

- ^([^.]+\.html)$ captures only .html URL requests

It does not seem to work. Every URL is redirected to www.example.com/.html; the whole path is removed, where it should come out as www.example.com/file.html . The modified code is below:

RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
#RewriteRule (.html) ${lc:$1} [R=301,L]
RewriteRule ^([^.]+\.html)${lc:$1} [R=301,L]

Thanks in Advance

g1smd




 
Msg#: 4565412 posted 8:25 pm on Apr 18, 2013 (gmt 0)

There's a space missing.
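A sketch of the difference, assuming the intended line is the end anchor `$`, then a space, then the `${lc:$1}` substitution:

```apache
# As posted: no whitespace between pattern and substitution, so everything
# up to the first space would be read as one long pattern
#RewriteRule ^([^.]+\.html)${lc:$1} [R=301,L]

# With the closing anchor followed by a space before the substitution
RewriteRule ^([^.]+\.html)$ ${lc:$1} [R=301,L]
```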

lucy24




 
Msg#: 4565412 posted 9:38 pm on Apr 18, 2013 (gmt 0)

^([^.]+\.html)$ captures only .html URL requests


In your rule as written, you probably have more than one set of parentheses. Make sure the correct capture is getting passed to the RewriteMap. Safest is to have a set of outer parentheses that contains the entire request, because those will always be $1.

The version you quoted earlier
RewriteRule (.html) ${lc:$1}

only captures the ".html" --so it is doing exactly what it has been told to do.
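To illustrate the numbering, here is a hypothetical rule with two sets of parentheses (the inner group is added purely for demonstration):

```apache
# Groups are numbered by their opening parenthesis:
#   $1 = outer group, the whole path including ".html"
#   $2 = inner group, only the part before ".html"
RewriteRule ^(([^.]+)\.html)$ ${lc:$1} [R=301,L]

# Passing $2 to the map instead would lowercase the path
# but drop the ".html" extension from the redirect target
```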

Dideved



 
Msg#: 4565412 posted 10:05 pm on Apr 18, 2013 (gmt 0)

> ^([^.]+\.html)$ captures only .html URL requests, but correctly
> captures the whole request.

...unless the request is something like "language.functions.html" which is a perfectly valid request and occurs in real life scenarios. Shoulda used .* ;). Correctness trumps micro-optimizations any day of the week.

g1smd




 
Msg#: 4565412 posted 10:36 pm on Apr 18, 2013 (gmt 0)

As before, the .* pattern is "greedy", "promiscuous" and "ambiguous" and is rarely the right thing to use.

Dideved



 
Msg#: 4565412 posted 10:39 pm on Apr 18, 2013 (gmt 0)

This is one of those times when it's the right thing to use. Otherwise your pattern won't capture all .html requests, as it claims to do.

lucy24




 
Msg#: 4565412 posted 1:47 am on Apr 19, 2013 (gmt 0)

brainsuccess: If your URLs really do contain literal periods-- some do, most don't-- there are better solutions than to dump the [^.]+ element and replace it with .+ (not .*, which is clearly unwarranted and can lead to capturing malformed requests).

Take, f'rinstance, the typical URL over at apache dot org, where there is often a "2.2" or "2.4" or similar in the middle. If you wanted to capture this, you'd say

^([^.]+(\.\d[^.]+)?\.html)

Some apache installations are grumpy and require [0-9] in place of \d. Or, if you're unaccustomed to reading Regular Expressions and your brain freezes when it meets a form like \d or \s you may be better off investing the three extra bytes ;)

But the key thing in this example is: If the character following the literal . is not a numeral, then the RegEx stops its search immediately, spits out the . and skips to the end of the parentheses.
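A sketch of how a balanced form of this pattern might sit in a full rule, assuming the outer group is meant to wrap the whole request and that the lc map is already defined:

```apache
# Outer group wraps the whole request, so $1 is the full path.
# [0-9] is used in place of \d for installations that reject \d.
RewriteRule ^([^.]+(\.[0-9][^.]+)?\.html)$ ${lc:$1} [R=301,L]
```

With this form a path such as /docs/2.4/mod/Core.html should match, because the dot in the middle is followed by a digit. A path such as /language.functions.html would not, because the first dot is followed by a letter.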

Dideved



 
Msg#: 4565412 posted 11:21 am on Apr 19, 2013 (gmt 0)

> ^([^.]+(\.\d[^.]+)?\.html)

If the goal is to match any .html request, as is the case in this thread, then this pattern still doesn't accomplish that. We're making this more complicated than it needs to be *and* we're sacrificing correctness, just to save a few nanoseconds. That's a bad trade-off.

brainsuccess



 
Msg#: 4565412 posted 8:16 pm on May 2, 2013 (gmt 0)

Thanks for all the responses. I have tried all of the combinations mentioned above, but none of them works for my requirement. Can anybody provide the complete code that meets my requirement?

Thanks in Advance

Dideved



 
Msg#: 4565412 posted 10:32 pm on May 2, 2013 (gmt 0)

Interestingly, the code you started with is almost exactly what you needed. Just one small difference....

RewriteEngine On
RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.*\.html?$) ${lc:$1} [R=301,L,NC]
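Since the thread title mentions htaccess, here is a hedged variant for that case. It assumes the RewriteMap line stays in the server or virtual-host configuration (it cannot be declared in .htaccess, though a map defined there can still be referenced), and that the .htaccess file sits in the document root:

```apache
# In the server or virtual-host configuration:
#   RewriteMap lc int:tolower

# In the document-root .htaccess:
RewriteEngine On
RewriteCond %{REQUEST_URI} [A-Z]
# In per-directory context the leading slash is stripped before matching,
# so it is added back in front of the lowercased path
RewriteRule ^(.*\.html?)$ /${lc:$1} [R=301,L,NC]
```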

lucy24




 
Msg#: 4565412 posted 3:00 am on May 3, 2013 (gmt 0)

:: sitting on hands ::

phranque

WebmasterWorld Administrator phranque us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4565412 posted 7:20 am on May 3, 2013 (gmt 0)

welcome to WebmasterWorld, brainsuccess!


i would recommend adding the canonical protocol and hostname to the substitution string to avoid potential multiple redirect hops.
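A sketch of that suggestion applied to the server-context rule, using the www.example.com hostname mentioned earlier in the thread. The http scheme is an assumption; substitute whichever protocol and hostname are canonical for the site:

```apache
RewriteCond %{REQUEST_URI} [A-Z]
# Absolute target: protocol and hostname are fixed in the same redirect,
# so a request on a non-canonical host does not need a second hop
RewriteRule ^(.*\.html?)$ http://www.example.com${lc:$1} [R=301,L,NC]
```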
