Apache Web Server Forum

    
How to convert only HTML page URLs from upper to lower case
.htaccess: convert URLs from upper to lower case, for HTML pages only
brainsuccess




msg:4565414
 9:40 pm on Apr 16, 2013 (gmt 0)

Hi All,

I am a newbie on this forum. I have a CMS site hosted on an Apache server. The site's URLs mix upper and lower case. I would like to convert all the HTML page URLs to lower case only (PDF and image URLs should not be converted). I have used the following rule to convert the URLs:

RewriteEngine On
RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.*) ${lc:$1} [R=301,L]

This works, but it converts all the URLs, including PDFs, images and other files. I modified it as follows:

RewriteEngine On
RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.html) ${lc:$1} [R=301,L]

This version does not work.

Please help me find a solution to this issue.

 

g1smd




msg:4565569
 10:03 am on Apr 17, 2013 (gmt 0)

(.*) captures every URL request.

(.html) captures only five characters: the four letters "html" plus whatever single character precedes them, because the unescaped dot matches any character, not just a literal period.

^([^.]+\.html)$ captures only .html URL requests, but correctly captures the whole request.
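
Dropped into the original rule set, the anchored pattern might look like the sketch below. It assumes the directives sit in the server or virtual-host configuration: RewriteMap cannot be declared in a .htaccess file, and in server context the matched path starts with a slash, which [^.]+ simply includes in the capture.

```apache
RewriteEngine On
# Built-in lowercasing function; must be declared in server or vhost context.
RewriteMap lc int:tolower

# Redirect only when the requested path contains an uppercase letter.
RewriteCond %{REQUEST_URI} [A-Z]
# $1 is the whole path (e.g. /Docs/Page.html), so the whole path is lowercased.
RewriteRule ^([^.]+\.html)$ ${lc:$1} [R=301,L]
```

Requests for PDFs and images never match the pattern, so they are left alone.
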
lucy24




msg:4565722
 6:46 pm on Apr 17, 2013 (gmt 0)

... and, finally: If any part of the extension "html" is itself capitalized-- counting on fingers reveals that there are 2^4-1 = 15 ways to get it wrong --then the rule itself will need a [NC] flag. The pattern of the Rule is tested first, and Conditions are only evaluated if the Rule itself can potentially apply, so a pattern that misses ".HTML" never gets as far as the [A-Z] condition.

And if you've got a RewriteMap working nicely, you are already way ahead of the game :)
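
A sketch of where the [NC] flag goes, assuming the same server-level setup as the original post. One caveat: the flag belongs on the RewriteRule only. On the RewriteCond it would make [A-Z] match lowercase letters as well, and an already-lowercased URL would then redirect to itself in a loop.

```apache
# The condition stays case-sensitive: it must be true only for uppercase letters.
RewriteCond %{REQUEST_URI} [A-Z]
# [NC] lets "\.html" also match .HTML, .Html and the other mixed-case spellings.
RewriteRule ^([^.]+\.html)$ ${lc:$1} [R=301,L,NC]
```
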

brainsuccess




msg:4566101
 8:15 pm on Apr 18, 2013 (gmt 0)

Thanks for your kind response.

> ^([^.]+\.html)$ captures only .html URL requests

This does not seem to work. Every URL is redirected to www.example.com/.html: the whole URL path is removed, where the result should be www.example.com/file.html. The modified code is below:

RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
#RewriteRule (.html) ${lc:$1} [R=301,L]
RewriteRule ^([^.]+\.html)${lc:$1} [R=301,L]

Thanks in Advance

g1smd




msg:4566108
 8:25 pm on Apr 18, 2013 (gmt 0)

There's a space missing.
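
For reference, a sketch of the rule as posted next to the intended form. RewriteRule arguments are separated by whitespace, so without the space the end anchor and the substitution become part of the pattern.

```apache
# As posted: the pattern is read as "^([^.]+\.html)${lc:$1}" and
# "[R=301,L]" is taken as the substitution.
#RewriteRule ^([^.]+\.html)${lc:$1} [R=301,L]

# Intended: pattern (ending in "$"), a space, the substitution, then the flags.
RewriteRule ^([^.]+\.html)$ ${lc:$1} [R=301,L]
```
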

lucy24




msg:4566135
 9:38 pm on Apr 18, 2013 (gmt 0)

> ^([^.]+\.html)$ captures only .html URL requests


In your rule as written, you probably have more than one set of parentheses. Make sure the correct capture is getting passed to the RewriteMap. Safest is to have a set of outer parentheses that contains the entire request, because those will always be $1.

The version you quoted earlier
RewriteRule (.html) ${lc:$1}

only captures the ".html" --so it is doing exactly what it has been told to do.

Dideved




msg:4566139
 10:05 pm on Apr 18, 2013 (gmt 0)

> ^([^.]+\.html)$ captures only .html URL requests, but correctly
> captures the whole request.

...unless the request is something like "language.functions.html" which is a perfectly valid request and occurs in real life scenarios. Shoulda used .* ;). Correctness trumps micro-optimizations any day of the week.
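
To see the difference on a request such as /language.functions.html, here is a sketch of the two patterns side by side. It assumes the RewriteEngine, RewriteMap and RewriteCond lines from the first post are already in place, in server or virtual-host context.

```apache
# "[^.]+" stops at the first dot. What follows is ".functions.html", not a
# final ".html", so this pattern does not match and no redirect happens.
#RewriteRule ^([^.]+\.html)$ ${lc:$1} [R=301,L]

# ".*" may run across dots and backtracks to the last ".html", so it matches.
RewriteRule ^(.*\.html)$ ${lc:$1} [R=301,L]
```
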

g1smd




msg:4566142
 10:36 pm on Apr 18, 2013 (gmt 0)

As before, the .* pattern is "greedy", "promiscuous" and "ambiguous" and is rarely the right thing to use.

Dideved




msg:4566143
 10:39 pm on Apr 18, 2013 (gmt 0)

This is one of those times when it's the right thing to use. Otherwise your pattern won't capture all .html requests, as it claims to do.

lucy24




msg:4566186
 1:47 am on Apr 19, 2013 (gmt 0)

brainsuccess: If your URLs really do contain literal periods-- some do, most don't-- there are better solutions than to dump the [^.]+ element and replace it with .+ (not .*, which is clearly unwarranted and can lead to capturing malformed requests).

Take, f'rinstance, the typical URL over at apache dot org, where there is often a "2.2" or "2.4" or similar in the middle. If you wanted to capture this, you'd say

^([^.]+(\.\d[^.]+)?\.html)

Some apache installations are grumpy and require [0-9] in place of \d. Or, if you're unaccustomed to reading Regular Expressions and your brain freezes when it meets a form like \d or \s you may be better off investing the three extra bytes ;)

But the key thing in this example is: If the character following the literal . is not a numeral, then the RegEx stops its search immediately, spits out the . and skips to the end of the parentheses.
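
Wrapped into a full rule, that idea could look like the sketch below. The balanced outer parentheses, the end anchor and the use of [0-9] instead of \d are choices made for this sketch, and the other directives from the first post are assumed to be in place in server or virtual-host context.

```apache
# /docs/2.4/mod/Mod_Rewrite.html: "[^.]+" takes "/docs/2", the optional group
# takes ".4/mod/Mod_Rewrite", and "\.html" completes the match.
# A dot that is not followed by a digit (/language.functions.html) still fails.
RewriteRule ^([^.]+(\.[0-9][^.]+)?\.html)$ ${lc:$1} [R=301,L]
```
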

Dideved




msg:4566304
 11:21 am on Apr 19, 2013 (gmt 0)

> ^([^.]+(\.\d[^.]+)?\.html)

If the goal is to match any .html request, as is the case in this thread, then this pattern still doesn't accomplish that. We're making this more complicated than it needs to be *and* we're sacrificing correctness, just to save a few nanoseconds. That's a bad trade-off.

brainsuccess




msg:4570128
 8:16 pm on May 2, 2013 (gmt 0)

Thanks for all the responses. I have tried every combination mentioned above, but none of them meets my requirement. Can anybody provide the complete code that does?

Thanks in Advance

Dideved




msg:4570161
 10:32 pm on May 2, 2013 (gmt 0)

Interestingly, the code you started with is almost exactly what you needed. Just one small difference....

RewriteEngine On
RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.*\.html?$) ${lc:$1} [R=301,L,NC]

lucy24




msg:4570197
 3:00 am on May 3, 2013 (gmt 0)

:: sitting on hands ::

phranque




msg:4570239
 7:20 am on May 3, 2013 (gmt 0)

welcome to WebmasterWorld, brainsuccess!


i would recommend adding the canonical protocol and hostname to the substitution string to avoid potential multiple redirect hops.
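
A sketch of that suggestion applied to the rule above. The http scheme is an assumption and www.example.com is the placeholder host used earlier in the thread, so substitute the site's real canonical values.

```apache
RewriteEngine On
# Declare in server or vhost config; .htaccess may use the map but not define it.
RewriteMap lc int:tolower

RewriteCond %{REQUEST_URI} [A-Z]
# "^/?" keeps the leading slash out of the capture, so the target has exactly
# one slash after the hostname whether the rule runs in a vhost or .htaccess.
RewriteRule ^/?(.*\.html?)$ http://www.example.com/${lc:$1} [R=301,L,NC]
```

With the full URL in the substitution, a request on a non-canonical hostname is lowercased and moved to the canonical host in a single redirect.
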
