

Apache Web Server Forum

    
Rewriting urls with special characters
phpmaven

10+ Year Member



 
Msg#: 4518100 posted 8:38 pm on Nov 11, 2012 (gmt 0)

Hi All,

I've been searching for a couple of hours and can't seem to find an answer.

I have a few websites with malformed urls that I would like to redirect. I'm using a rewrite map but can't figure out how to match these urls.

I'm using:
RewriteMap redirects_map txt:/usr/local/www/redirects.map

Here's an example of a page I want to match against.

example_%E2%80%8Bpage.html

Of course, just putting that page name in the URL list in my map text file doesn't work.

Is there some way I can enter an escaped version of the url in my map file, or what would be the correct way to do this? I also have a few pages with %20 (space) in the url as well.

Thanks,

Mark

 

phranque

WebmasterWorld Administrator phranque us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4518100 posted 1:58 am on Nov 12, 2012 (gmt 0)

the URL should be percent-decoded by the time your RewriteRule sees it. the txt MapType uses a space as a delimiter, so since your key may contain a space you will need to use a different MapType.

lucy24

WebmasterWorld Senior Member lucy24 us a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4518100 posted 3:17 am on Nov 12, 2012 (gmt 0)

I also have a few pages with %20 (space) in the url as well

Lesson: Do not let your cat make your URLs. But spaces are easy: you just have to escape them, as you'd escape literal periods or parentheses. And even if escaped, a space can't be the very last thing on the line. 98,000 guesses how I know this.

Did you mean %E2%80%8B literally? I hope you're simply deleting it. Depending on context it's either a zero-width space or a misplaced BOM.
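
To illustrate the space-escaping point, here is a minimal sketch for .htaccess. The file names are made up for illustration and do not come from the thread. Because the pattern is matched against the already-decoded path, a %20 in the link shows up as a literal space, and that space needs a backslash (or quotes around the whole pattern) so mod_rewrite does not read it as an argument separator.

```apache
RewriteEngine On

# Backslash-escaped space in the pattern (hypothetical file names)
RewriteRule ^my\ page\.html$ /my-page.html [R=301,L]

# Same idea with the pattern quoted instead of escaped
RewriteRule "^another page\.html$" /another-page.html [R=301,L]
```

In server or virtual-host context the matched path starts with a slash, so the patterns would begin with ^/ instead of ^.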

phpmaven




 
Msg#: 4518100 posted 4:01 pm on Nov 12, 2012 (gmt 0)

Just to clarify, these are external links over which I have no control.

phranque, I've been doing quite a bit of research and I do see the different kinds of MapTypes and have seen some examples, but I don't quite understand how to do this. Can I have 2 different maps? One txt and the other one to handle the urls like the example I gave?

If you could give me an example of how I could handle the example url I gave, that would be great. That is a real example, other than part of it being changed. Maybe I'm being thick headed, but I can't quite understand how to do this.

Thanks,

Mark

phranque




 
Msg#: 4518100 posted 10:11 pm on Nov 12, 2012 (gmt 0)

how many of these redirects are there?
perhaps it would be simpler to handle these cases with RewriteRules and then the rest with a RewriteMap.

phpmaven




 
Msg#: 4518100 posted 5:53 pm on Nov 13, 2012 (gmt 0)

There are just a few. I could just do as you suggest. Can you give me a quick idea of how to do that? I'm not sure how to try and match against a URL like that.

Thanks,

Mark

phranque




 
Msg#: 4518100 posted 9:30 pm on Nov 13, 2012 (gmt 0)

depending on how many characters must be enumerated you could either use "Escape Sequences" or "Character Classes and other Special Escapes" to match those characters:

http://perldoc.perl.org/perlre.html#Regular-Expressions
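
A rough sketch of the escape-sequence approach, assuming .htaccess context and the regex engine working on raw bytes (Apache's default). %E2%80%8B decodes to the three bytes E2 80 8B, which is the UTF-8 encoding of U+200B (zero-width space), and each byte can be written as \xHH in the pattern. The redirect target /example_page.html is a guess at the intended page, not something stated in the thread.

```apache
RewriteEngine On

# Match the three decoded bytes E2 80 8B (U+200B) using \xHH escapes.
# /example_page.html is an assumed target; substitute the real page.
RewriteRule ^example_\xE2\x80\x8Bpage\.html$ /example_page.html [R=301,L]
```

With only a few such URLs, one rule per URL placed ahead of any map lookup keeps the special cases out of the map file entirely.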

System
redhat


 
Msg#: 4518100 posted 10:35 am on Nov 15, 2012 (gmt 0)

The following message was cut out to new thread by incredibill. New thread at: apache/4519976.htm [webmasterworld.com]
2:52 pm on Nov 17, 2012 (PST -8)

lucy24




 
Msg#: 4518100 posted 9:50 pm on Nov 15, 2012 (gmt 0)

There are just a few.

A few specific, individual URLs-- or a few discrete categories of error? Have the percents been unescaped by the time they hit htaccess? It's tiresome if you have to say %(?:25)? every time.

What have you got so far?

:: detour to explore nasty suspicion that literal % signs really ought to be escaped, although I've got one RewriteRule that has them unescaped and it doesn't throw errors ::
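
A sketch of where %(?:25)? would come in, under the assumption that the match is done against the raw request line rather than the decoded path. %{THE_REQUEST} is not unescaped, so the percent signs are still present there, and a double-encoded link would arrive as %25E2%2580%258B. The target /example_page.html is hypothetical.

```apache
RewriteEngine On

# THE_REQUEST holds the raw request line, for example:
#   GET /example_%E2%80%8Bpage.html HTTP/1.1
# %(?:25)? accepts both a plain "%" and a double-encoded "%25".
# [NC] also allows lowercase hex digits such as %e2.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/example_%(?:25)?E2%(?:25)?80%(?:25)?8Bpage\.html[?\s] [NC]
RewriteRule ^ /example_page.html [R=301,L]
```

The percent signs are left unescaped here because the condition pattern is an ordinary regular expression, where % has no special meaning.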

phpmaven




 
Msg#: 4518100 posted 12:12 am on Nov 16, 2012 (gmt 0)

There are just a few specific URLs.
They are showing up in Webmaster Tools. This is a specific example of one:
www.example.com/example_%E2%80%8Bpage.html

Mark

lucy24




 
Msg#: 4518100 posted 3:34 am on Nov 16, 2012 (gmt 0)

Good. Then you can toss the whole idea of the map. (I'm not sure anyone here really knows how to use them anyway. The usual advice when something gets horribly complicated is a built-to-order php script with optional database.)

When you do it without the map, what does your current RewriteRule look like?

Also and tangentially: When you say external links, do you mean real links from desirable sites that somehow got mistyped and they're not answering e-mail? Or are they URLs that exist only in google's fevered imagination? You don't want to get into a situation where you redirect one URL and then next week they make up a new one.

webpilotz2



 
Msg#: 4518100 posted 8:45 am on Feb 1, 2013 (gmt 0)

Here's a solution that worked for me:

[webmasterworld.com...]

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved