Apache Web Server Forum

    
htaccess 301 redirect special characters
redirecting a url with encoded characters
brianr

5+ Year Member



 
Msg#: 3957440 posted 5:35 pm on Jul 22, 2009 (gmt 0)

Hi,

I'm not sure whether this is an Apache issue, a browser issue, or some mix of the two, but I'm trying to redirect a URL that has encoded characters and I can't get it to work.

The old resulting (encoded) url that gets displayed in the browser is like this:
http://www.example.com/products/word%2C-word-4.5-oz.-word%2C-Boxed-.html

The corresponding old original link displayed on the site looked like:
http://www.example.com/products/word,-word-4.5-oz.-word,-Boxed-.html.
I guess the browser turns the latter url into the former when it encodes it.

I'm able to redirect the old original url with:

RewriteCond %{REQUEST_URI} ^/products/word,-word-4\.5-oz\.-word,-Boxed-\.html$
RewriteRule . http://www.example.com/products/word-word-\%252d-Pre\%252dMixed-Saline-Laxative.html [R=301,L,NE]

But when I try to redirect the old encoded URL as well, in case a search engine has indexed it or someone has bookmarked it, the rule doesn't seem to work. This is what I'm trying:

RewriteCond %{REQUEST_URI} ^/products/word\%2C-word-4\.5-oz\.-word\%2C-Boxed-\.html$
RewriteRule . http://www.example.com/products/word-word-\%252d-Pre\%252dMixed-Saline-Laxative.html [R=301,L,NE]

Does anyone know why this last redirect isn't working? (This is in the htaccess file in the root)

 

brianr

5+ Year Member



 
Msg#: 3957440 posted 5:47 pm on Jul 22, 2009 (gmt 0)

I just realized the solution 2 seconds after I posted this and I've been working on this for hours lol:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /products/word\%2C-word-4\.5-oz\.-word\%2C-Boxed-\.html\ HTTP/
RewriteRule . http://www.example.com/products/word-word-\%252d-Pre\%252dMixed-Saline-Laxative.html [R=301,L,NE]

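Use %{THE_REQUEST} to match the URL exactly as the client sent it, before Apache decodes it. %{REQUEST_URI} holds the already-decoded path, so a pattern containing %2C never matches there.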
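
As a rough illustration of that difference, here is a small Python sketch. It only mimics the behaviour: urllib.parse.unquote stands in for Apache's decoding step, and the sample request line and variable names are assumptions made up for the example, not anything Apache exposes.

```python
import re
from urllib.parse import unquote

# Hypothetical raw request line, as a browser might send it.
the_request = "GET /products/word%2C-word-4.5-oz.-word%2C-Boxed-.html HTTP/1.1"

# The path exactly as sent, and a rough stand-in for the decoded path.
raw_path = the_request.split(" ")[1]
request_uri = unquote(raw_path)

# A pattern that contains a literal "%2C", like the rule that failed.
encoded_pattern = re.compile(
    r"^/products/word%2C-word-4\.5-oz\.-word%2C-Boxed-\.html$"
)

print(request_uri)                                # commas restored by decoding
print(bool(encoded_pattern.search(request_uri)))  # False: no "%2C" left to match
print(bool(encoded_pattern.search(raw_path)))     # True: raw path still has "%2C"
```

The decoded path only matches a pattern written with literal commas, which is why the first rule in this thread worked and the second did not.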

That brings me to another question. In general, should you make a redirect for the encoded url as well as the un-encoded one in case someone bookmarked the encoded one? It looks like Google indexes the un-encoded one but who knows what goes on internally.

jdMorgan

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3957440 posted 3:30 pm on Jul 23, 2009 (gmt 0)

I'm not sure I understand this, since your rule is actually encoding the URL, not decoding it. I'd go the other way if this were my site: eliminate the double-encoding and the extra hyphens, so that the URL published on your pages and the URL used as the redirect target (which should be the same) are as clean as possible.

Make your RewriteRule pattern as specific as possible for best performance. Also, I suggest that you cover the case of multiply-encoded strings -- i.e. when "%" itself gets encoded:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /products/word\%(25)*2C-word-4\.5-oz\.-word\%(25)*2C-Boxed-\.html\ HTTP/ [NC]
RewriteRule ^products/word http://www.example.com/products/word-word-Pre-Mixed-Saline-Laxative.html [NC,R=301,L]

Because the example URLs posted above are not entirely consistent, this may not be exactly what you want. But an encoded comma is %2c, an encoded hyphen is %2d, and an encoded "%" is %25, so if you see either %252c or %252d, then those are doubly-encoded, and you should avoid and correct that. Every time this request passes through any encoding "handler," another "25" will get added, so the point of my subpatterns above is to recognize and remove all "25" strings following "%". That is, after passing through multiple handlers, you could well end up with %2525252c, which is a multiply-encoded comma.
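
The "another 25 per pass" effect can be reproduced with a short Python sketch. It assumes each "handler" behaves like urllib.parse.quote with no safe characters, which is a simplification, and it uses Python's re module rather than mod_rewrite, so treat it as an illustration of the subpattern only.

```python
import re
from urllib.parse import quote

# Encoded comma at any depth: %2C, %252C, %25252C, ...
multi_encoded_comma = re.compile(r"%(25)*2C", re.IGNORECASE)

value = ","
layers = []
for _ in range(4):
    # safe="" forces every reserved character, including "%", to be encoded.
    value = quote(value, safe="")
    layers.append(value)

print(layers)  # ['%2C', '%252C', '%25252C', '%2525252C']
print(all(multi_encoded_comma.fullmatch(v) for v in layers))  # True
```

The fourth layer is the %2525252c case mentioned above, and the single subpattern covers every depth.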

The goal would be to have a URL in the link on your HTML page of
products/word-word-4.5-oz-Pre-Mixed-Saline-Laxative.html and let the client (browser) encode that as needed.
Once the request arrives at your server, then it will be decoded to that same string, and that should be the correct URL for the page.

If someone does a cut and paste of that URL into an HTML editor when editing a page to link to you, and it ends up multiply-encoded as a result, then you should detect that and redirect it.
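
One way to picture "detect that" is the Python sketch below. The helper names fully_decode and is_multiply_encoded are hypothetical, and the approach assumes no legitimate URL on the site contains a literal "%" followed by two hex digits, so it is a sketch of the idea rather than a drop-in check.

```python
from urllib.parse import unquote


def fully_decode(path: str, max_rounds: int = 10) -> str:
    """Percent-decode repeatedly until the string stops changing."""
    for _ in range(max_rounds):
        decoded = unquote(path)
        if decoded == path:
            break
        path = decoded
    return path


def is_multiply_encoded(path: str) -> bool:
    """True if one round of decoding still leaves something to decode."""
    once = unquote(path)
    return unquote(once) != once


print(is_multiply_encoded("/products/word%2C-Boxed-.html"))    # False
print(is_multiply_encoded("/products/word%252C-Boxed-.html"))  # True
print(fully_decode("/products/word%2525252C-Boxed-.html"))     # /products/word,-Boxed-.html
```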

Jim

brianr

5+ Year Member



 
Msg#: 3957440 posted 6:59 pm on Jul 23, 2009 (gmt 0)

That's a good point about the multiply-encoded urls. Thanks for pointing that out.

I think my use of "word" in the rules may have thrown you off: the words are actually all different, and each rule above is designed to match exactly one URL. There are no patterns for matching multiple URLs, just one case for each.

This CMS takes the product name and creates the url from it so if there is a hyphen in the product name then it gets encoded as %2d in the links that appear on the site. When you click on the link and end up at the page, what gets displayed in the browser is the doubly-encoded hyphen, i.e. %252d. In this case the client has changed the name of the product from "Enema, Phosphate 4.5 oz. Economy, Boxed " (note the space at the end) to "Phosphate Enema - Pre-Mixed Saline Laxative".

I'm mentioning that because you recommended that the rewrite rule not have encoded characters but the CMS automatically puts them in and I want to be consistent with the urls. So, combining the 2 rules into one I have:


RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /products/Enema(,|\%(25)*2C)-Phosphate-4\.5-oz\.-Economy(,|\%(25)*2C)-Boxed-\.html\ HTTP/ [NC]
RewriteRule ^products/ http://www.example.com/products/Phosphate-Enema-\%2d-Pre\%2dMixed-Saline-Laxative.html [NC,R=301,L,NE]

Any idea why the final URL is now http://www.example.com/products/Phosphate-Enema---Pre-Mixed-Saline-Laxative.html? I have %2d in the RewriteRule and it is getting decoded for some reason. If I change it to %252d, it stays as %252d, which is actually what the user sees in their browser after clicking the link.

So, do you think I should have:


RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /products/Enema(,|\%(25)*2C)-Phosphate-4\.5-oz\.-Economy(,|\%(25)*2C)-Boxed-\.html\ HTTP/ [NC]
RewriteRule ^products/ http://www.example.com/products/Phosphate-Enema-\%252d-Pre\%252dMixed-Saline-Laxative.html [NC,R=301,L,NE]

for the consistency factor or is there something else I'm missing here?

Thanks for your help.
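
For a single group to mean "either a literal comma or an encoded one", the two branches need an alternation bar between them. The Python sketch below checks that with the re module; the build helper and the sample request lines are assumptions for illustration, and Python's regex flavour is only a stand-in for the one mod_rewrite uses.

```python
import re


def build(comma_group: str) -> re.Pattern:
    """Hypothetical helper: assemble the request-line pattern around a comma group."""
    return re.compile(
        r"^[A-Z]{3,9} /products/Enema" + comma_group
        + r"-Phosphate-4\.5-oz\.-Economy" + comma_group
        + r"-Boxed-\.html HTTP/",
        re.IGNORECASE,
    )


# Literal comma OR an encoded comma at any depth.
either = build(r"(?:,|%(?:25)*2C)")
# No alternation: a comma immediately FOLLOWED BY an encoded comma.
both = build(r"(?:,%(?:25)*2C)")

requests = [
    "GET /products/Enema,-Phosphate-4.5-oz.-Economy,-Boxed-.html HTTP/1.1",
    "GET /products/Enema%2C-Phosphate-4.5-oz.-Economy%2C-Boxed-.html HTTP/1.1",
    "GET /products/Enema%252C-Phosphate-4.5-oz.-Economy%252C-Boxed-.html HTTP/1.1",
]

print([bool(either.search(r)) for r in requests])  # [True, True, True]
print([bool(both.search(r)) for r in requests])    # [False, False, False]
```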

jdMorgan

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3957440 posted 12:47 pm on Jul 24, 2009 (gmt 0)

You're seeing "---" because your substitution URL has "-%2d-" in it, which is "---" when decoded.
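
A quick way to see this is to decode the two candidate targets by hand. The Python sketch below uses urllib.parse.unquote for a single decoding pass; where exactly that pass happens in this particular setup is not established in the thread, so it only shows what one round of decoding does to each string.

```python
from urllib.parse import unquote

single = "/products/Phosphate-Enema-%2d-Pre%2dMixed-Saline-Laxative.html"
double = "/products/Phosphate-Enema-%252d-Pre%252dMixed-Saline-Laxative.html"

# One decoding pass turns each %2d into a plain hyphen.
print(unquote(single))  # /products/Phosphate-Enema---Pre-Mixed-Saline-Laxative.html

# One decoding pass on the doubly-encoded form only strips one layer.
print(unquote(double))  # /products/Phosphate-Enema-%2d-Pre%2dMixed-Saline-Laxative.html
```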

I recommend that you get rid of all encoded characters in the RewriteRule's substitution (new) URL, and correct the CMS and everything else so that only the browser does encoding as required. That is how "things are supposed to work" and doing anything else is going to cause you trouble. URLs should only be encoded at the point of transmission (the browser), and should be decoded at the point of reception (which Apache will do if they're singly-encoded). URLs should only be encoded while being sent from the browser to your server in an HTTP request. "Inside" the browser and the server, URLs should not be encoded.

Jim

brianr

5+ Year Member



 
Msg#: 3957440 posted 2:01 pm on Jul 24, 2009 (gmt 0)

I totally agree with you. I have sent this message to the company that makes this CMS, and they are supposed to correct it in future versions. It is a large and expensive e-commerce CMS, and changing something like this would probably mean changing the database as well as the code in ways that violate the warranty. They will probably have to figure out a way to distinguish three spaces from some combination of spaces and hyphens, since they all end up as hyphens in the URL, or just do something like WordPress, I guess. So, until then, I'm stuck with these types of redirects.
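
For what "something like WordPress" could look like, here is a minimal slug sketch in Python. The slugify function is hypothetical and is not how Interspire or WordPress actually builds URLs; it just collapses every run of non-alphanumeric characters into one hyphen, so no character ever needs percent-encoding.

```python
import re


def slugify(name: str) -> str:
    """Hypothetical helper: lowercase, then turn each run of other characters into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


print(slugify("Phosphate Enema - Pre-Mixed Saline Laxative"))
# phosphate-enema-pre-mixed-saline-laxative
print(slugify("Enema, Phosphate 4.5 oz. Economy, Boxed "))
# enema-phosphate-4-5-oz-economy-boxed
```

The mapping is lossy, so a CMS taking this route would presumably have to store the slug with the product instead of rebuilding the name from the URL.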

Thanks a lot for your help. I've come a long way with htaccess redirects and rewrites in the past month mostly because of this forum. It's actually become pretty interesting instead of a nightmare. Cheers.

jdMorgan

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3957440 posted 2:31 pm on Jul 24, 2009 (gmt 0)

> become pretty interesting instead of a nightmare.

I liken it to the NYT crossword puzzles... Either you enjoy it, or don't. :)

Best,
Jim

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3957440 posted 2:41 pm on Jul 24, 2009 (gmt 0)

*** It is a large and expensive e-commerce CMS ***

I see this so often now, that I have to wonder what the heck the 'designers' were thinking when they wrote the specifications for how the CMS would work.

brianr

5+ Year Member



 
Msg#: 3957440 posted 3:08 pm on Jul 24, 2009 (gmt 0)

I know, and considering Interspire is supposed to be one of the most SEO-friendly e-commerce CMSs, it really makes you wonder. Luckily the SEOs are raising a stink in the forums about these crazy URLs.

jdMorgan

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3957440 posted 3:47 pm on Jul 24, 2009 (gmt 0)

Point them here: RFC 2396 - Uniform Resource Identifiers (URI): Generic Syntax [faqs.org]

Jim
