
Forum Moderators: Ocean10000 & incrediBILL & phranque


Rewriting urls with special characters

     
8:38 pm on Nov 11, 2012 (gmt 0)

Junior Member

10+ Year Member

joined:May 8, 2003
posts: 124
votes: 0


Hi All,

I've been searching for a couple of hours and can't seem to find an answer.

I have a few websites with malformed urls that I would like to redirect. I'm using a rewrite map but can't figure out how to match these urls.

I'm using:
RewriteMap redirects_map txt:/usr/local/www/redirects.map

Here's an example of a page I want to match against.

example_%E2%80%8Bpage.html

Of course, just putting that page name in the URL list in my map text file doesn't work.

Is there some way I can enter an escaped version of the URL in my map file, or what would be the correct way to do this? I also have a few pages with %20 (space) in the URL.

Thanks,

Mark
1:58 am on Nov 12, 2012 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10859
votes: 67


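the URL should be percent-decoded by the time your RewriteRule sees it, so the pattern has to match the decoded characters rather than the %XX text. the txt MapType uses whitespace as the delimiter between key and value, so since your key may contain a space you will need to use a different MapType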
3:17 am on Nov 12, 2012 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13745
votes: 462


I also have a few pages with %20 (space) in the url as well

Lesson: Do not let your cat make your URLs. But spaces are easy: you just have to escape them, as you'd escape literal periods or parentheses. And even if escaped, a space can't be the very last thing on the line. 98,000 guesses how I know this.

Did you mean %E2%80%8B literally? I hope you're simply deleting it. That's the UTF-8 encoding of U+200B, a zero-width space (a misplaced BOM would show up as %EF%BB%BF instead).
4:01 pm on Nov 12, 2012 (gmt 0)

Junior Member

10+ Year Member

joined:May 8, 2003
posts: 124
votes: 0


Just to clarify, these are external links over which I have no control.

phranque, I've been doing quite a bit of research and I do see the different kinds of MapTypes and have seen some examples, but I don't quite understand how to do this. Can I have 2 different maps? One txt and the other one to handle the urls like the example I gave?

If you could give me an example of how I could handle the example url I gave, that would be great. That is a real example, other than part of it being changed. Maybe I'm being thick headed, but I can't quite understand how to do this.

Thanks,

Mark
10:11 pm on Nov 12, 2012 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10859
votes: 67


how many of these redirects are there?
perhaps it would be simpler to handle these cases with RewriteRules and then the rest with a RewriteMap.
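
A rough sketch of that split, untested here. It assumes the directives live in the server or virtual-host config (RewriteMap is not allowed in .htaccess) and that the map keys are stored without the leading slash. The "my page.html" URL and the /my-page.html target are invented for illustration:

```apache
RewriteEngine On
RewriteMap redirects_map txt:/usr/local/www/redirects.map

# 1) Odd URLs first, one explicit rule each.
#    Quoting the pattern lets it contain a literal space (a decoded %20).
#    "my page.html" and /my-page.html are hypothetical names.
RewriteRule "^/my page\.html$" /my-page.html [R=301,L]

# 2) Everything else goes through the map.
#    The condition only succeeds when the lookup returns a non-empty value,
#    so URLs that are not in the map fall through untouched.
RewriteCond ${redirects_map:$1} ^(.+)$
RewriteRule ^/(.+)$ %1 [R=301,L]
```

The explicit rules come first so that the awkward keys never have to appear in the whitespace-delimited map file.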
5:53 pm on Nov 13, 2012 (gmt 0)

Junior Member

10+ Year Member

joined:May 8, 2003
posts: 124
votes: 0


There are just a few. I could just do as you suggest. Can you give me a quick idea of how to do that? I'm not sure how to match against a URL like that.

Thanks,

Mark
9:30 pm on Nov 13, 2012 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10859
votes: 67


depending on how many characters must be enumerated you could either use "Escape Sequences" or "Character Classes and other Special Escapes" to match those characters:

http://perldoc.perl.org/perlre.html#Regular-Expressions
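
For the example URL in this thread, both options might look something like the sketch below. It is untested, and it assumes the pattern is matched against the already-decoded path, so %E2%80%8B arrives as the three bytes E2 80 8B. The /example_page.html and /my-page.html targets and the "my page.html" URL are made up for illustration:

```apache
RewriteEngine On

# Escape sequences: spell out the decoded bytes one by one.
RewriteRule ^/?example_\xE2\x80\x8Bpage\.html$ /example_page.html [R=301,L]

# Character class: accept any run of non-ASCII bytes in that position.
# Looser than the rule above, so it also catches similar stray characters.
RewriteRule ^/?example_[^\x00-\x7F]+page\.html$ /example_page.html [R=301,L]

# A decoded %20 is an ordinary space, which \x20 matches.
RewriteRule ^/?my\x20page\.html$ /my-page.html [R=301,L]
```

The leading ^/? is there so the same pattern can be tried in either server config or .htaccess, where the leading slash is stripped. Only one of the first two rules is needed; the hex form is the stricter match.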

System

10:35 am on Nov 15, 2012 (gmt 0)

redhat

 
 


The following message was cut out to new thread by incredibill. New thread at: apache/4519976.htm [webmasterworld.com]
2:52 pm on Nov 17, 2012 (PST -8)
9:50 pm on Nov 15, 2012 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13745
votes: 462


There are just a few.

A few specific, individual URLs-- or a few discrete categories of error? Have the percents been unescaped by the time they hit htaccess? It's tiresome if you have to say %(?:25)? every time.

What have you got so far?

:: detour to explore nasty suspicion that literal % signs really ought to be escaped, although I've got one RewriteRule that has them unescaped and it doesn't throw errors ::
12:12 am on Nov 16, 2012 (gmt 0)

Junior Member

10+ Year Member

joined:May 8, 2003
posts: 124
votes: 0


There are just a few specific URLs.
They are showing up in Webmaster Tools. This is a specific example of one:
www.example.com/example_%E2%80%8Bpage.html

Mark
3:34 am on Nov 16, 2012 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13745
votes: 462


Good. Then you can toss the whole idea of the map. (I'm not sure anyone here really knows how to use them anyway. The usual advice when something gets horribly complicated is a built-to-order php script with optional database.)

When you do it without the map, what does your current RewriteRule look like?

Also and tangentially: When you say external links, do you mean real links from desirable sites that somehow got mistyped and they're not answering e-mail? Or are they URLs that exist only in google's fevered imagination? You don't want to get into a situation where you redirect one URL and then next week they make up a new one.
8:45 am on Feb 1, 2013 (gmt 0)

New User

5+ Year Member

joined:May 21, 2012
posts: 8
votes: 0


Here's a solution that worked for me:

[webmasterworld.com...]
 
