What I want to happen is for Apache to convert %3F to ? and %3D to =
Somehow URLs are reaching the site with these percent-encoded characters, resulting in HTTP 404 errors. My ultimate goal is to convert them to the corresponding question mark and equals sign. If that cannot be done, is there a way to detect the %3d and %3f characters in the URL so that I can at least redirect to the home page?
[edited by: jdMorgan at 4:58 pm (utc) on Aug. 20, 2007]
[edit reason] example.com [/edit]
#This would redirect to the home page
RewriteRule MyPage.jsp%$ http://www.example.com [R=301,L]
#This would replace the character
RewriteRule ^(.*)%3F(.*)$ $1?$2 [N,L]
[edited by: jdMorgan at 5:42 pm (utc) on Aug. 20, 2007]
[edit reason] example.com [/edit]
You normally have to use a RewriteCond testing %{QUERY_STRING} to manipulate normal (unencoded) query strings. But that won't work here, because there's no "?" to tell Apache how to parse the URL-path and query string anyway. So, the solution is to go back to the source -- The original HTTP request header as received from the client.
Try something like this:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /MyPage\.jsp\%3[fF]seq\%3[dD]([^\ ]+)\ HTTP/
RewriteRule ^MyPage\.jsp$ http://www.example.com/MyPage.jsp?seq=%1 [R=301,L]
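To make the condition above easier to read, here is an annotated sketch of how its pattern lines up with a raw request line. The request shown and the value 123 are made up for illustration, and ENCODED_SEQ is a hypothetical variable name. The rule here only records the captured value and leaves the URL untouched, so the condition can be checked on its own:

```apache
# Hypothetical request line as sent by the client:
#   GET /MyPage.jsp%3Fseq%3D123 HTTP/1.1
#
# %{THE_REQUEST} holds that line exactly as received, so %3F and %3D are
# still visible to the condition pattern:
#   ^[A-Z]{3,9}\    the method plus the space after it ("GET ")
#   /MyPage\.jsp    the literal path
#   \%3[fF]         the encoded "?" (%3F or %3f)
#   seq\%3[dD]      "seq" followed by the encoded "=" (%3D or %3d)
#   ([^\ ]+)        everything up to the next space, captured as %1 ("123")
#   \ HTTP/         the space and the start of the protocol version
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /MyPage\.jsp\%3[fF]seq\%3[dD]([^\ ]+)\ HTTP/
# "-" means "no substitution"; the E flag stores the captured value.
RewriteRule ^ - [E=ENCODED_SEQ:%1]
```

A later RewriteCond could then test %{ENV:ENCODED_SEQ} to see whether the encoded form was detected.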
Jim
Here is my VH entry. Can you see anything I did wrong that would make it work only for example.com and not for www.example.com?
<VirtualHost *>
ServerName www.example.com
DocumentRoot "/home/myapp"
ServerAlias example.com
DirectoryIndex index.jsp
#Turn rewrite engine on
RewriteEngine On
#mydomain.com goes to www.example.com
RewriteCond %{HTTP_HOST} ^example.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com$1 [R=301,L]
#Redirect http://www.example.com to http://www.example.com/myapp/index.jsp
RewriteCond %{REQUEST_URI} ^/$
RewriteRule ^(.*) http://www.example.com/myapp/index.jsp [R=301,L]
#Convert % values to the correct ? and = values
RewriteCond %{HTTP_HOST} ^example.com$ [NC]
RewriteRule ^(.*)%3F(.*)$ $1?$2 [N,L]
RewriteRule ^(.*)%3D(.*)$ $1=$2 [N,L]
</VirtualHost>
[edited by: jdMorgan at 6:30 pm (utc) on Aug. 20, 2007]
[edit reason] example.com [/edit]
Then follow your external redirects with your internal rewrites.
I also strongly suggest using an external redirect to 'correct' the %3D urls, and if you do that, then that rule should be first. If you don't use an external redirect, then search engines will pick up and index the incorrect URLs, and you'll be dealing with this problem for a long long time.
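As a rough illustration of that ordering, the sketch below places an external redirect ahead of an internal rewrite. The paths old-page.html, new-page.html, and /news are hypothetical placeholders, not part of the poster's site:

```apache
RewriteEngine On

# 1. External redirects first: the client receives a 301 and requests the new
#    URL itself, so browsers and search engines only ever see the corrected
#    address.
RewriteRule ^/old-page\.html$ http://www.example.com/new-page.html [R=301,L]

# 2. Internal rewrites last: the server quietly serves different content while
#    the address the client requested stays unchanged.
RewriteRule ^/news$ /myapp/news/index.jsp [L]
```

If the internal rewrite ran first, a later redirect rule could end up exposing the rewritten internal path to the client.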
I don't know if you even tested the code I posted, but I strongly recommend that you use that method for best portability across server versions...
Jim
Even after moving my redirects around, and using either my code or yours, www.mydomain.com still doesn't work. It still only works for mydomain.com. I've cleared my cache, rebooted the server, etc. For some weird reason Apache doesn't like something about that code.
Try putting a test redirect in place, like:
RewriteRule ^/foo\.html$ [google.com...] [R=301,L]
and test that with both domains.
This is the order of rules that I'd recommend:
<VirtualHost *>
ServerName www.example.com
DocumentRoot "/home/myapp"
ServerAlias example.com
DirectoryIndex index.jsp
#Turn rewrite engine on
RewriteEngine On
# Redirect to remove hex-encoded query string characters
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /MyPage\.jsp\%3[fF]seq\%3[dD]([^\ ]+)\ HTTP/
RewriteRule ^/MyPage\.jsp$ http://www.example.com/MyPage.jsp?seq=%1 [R=301,L]
#Redirect http://www.example.com/ to http://www.example.com/myapp/index.jsp
RewriteRule ^/$ http://www.example.com/myapp/index.jsp [R=301,L]
#mydomain.com goes to www.example.com
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com$1 [R=301,L]
</VirtualHost>
Note several corrections and optimizations. One of them was to my code (added leading slash on rule pattern), because I forgot this was for httpd.conf and not for .htaccess.
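The leading-slash difference mentioned here can be sketched as follows. The two rules are alternatives for two different files, not meant to be used together, and bar.html is a hypothetical target:

```apache
# In httpd.conf or a <VirtualHost> block, the pattern is matched against the
# full URL-path, leading slash included:
RewriteRule ^/foo\.html$ http://www.example.com/bar.html [R=301,L]

# In a .htaccess file in the document root, the per-directory prefix
# (including the leading slash) is stripped before matching:
RewriteRule ^foo\.html$ http://www.example.com/bar.html [R=301,L]
```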
Now about the [N] flag... Do you have a case with more than one %3d in the query string?
If so, you may want to use [N], which basically loops to the top of the .htaccess code. But it's slow and inefficient, and also can call a specific Apache mod_rewrite bug into play.
If you do have more than one %3d in the URL, we can discuss that in detail. It will require that you use [N] while internally rewriting all of the %-encoded characters, and then do an external redirect after all have been fixed. To do that, you'll also need to use the [E=envar] flag to "remember" that you've corrected at least one %-encoded character so as to do an external redirect after the characters have all been fixed (and not before, and only if needed).
Jim
There is only one %3D and %3f in the querystring.
If I put the following test code in, it works perfectly for both domains. Just FYI, my root is /myapp, so I had to add that to your test code.
RewriteRule ^/myapp/foo\.html$ [google.com...] [R=301,L]
Now, I guess, one more question. My MyPage is located one directory in from the root, so it's located at domain.com/myapp/news/MyPage.jsp. I'm not sure if that matters. I have tried your new VH block and still only one domain forwards. I have changed the redirect to the following and still only one domain forwards:
#Redirect to remove hex-encoded query string characters
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /myapp/news/MyPage\.jsp\%3[fF]seq\%3[dD]([^\ ]+)\ HTTP/
RewriteRule ^/myapp/news/MyPage\.jsp$ http://www.example.com/myapp/news/MyPage.jsp?seq=%1 [R=301,L]
And I just want to thank you very much for all the fast replies. I sincerely appreciate your help. I will keep banging on this, but if you have any additional ideas, I would greatly appreciate them.
[edited by: jdMorgan at 1:05 am (utc) on Aug. 21, 2007]
[edit reason] example.com [/edit]
What, precisely, do you mean by "gateway?"
You're looking for something that can get control and interfere with your rewrite. If this "gateway" is anything other than a router, switch, or completely-transparent proxy, then it's a candidate for investigation.
To view this problem from a completely-different angle, and perhaps to thereby glean additional information, try testing your redirects using the "Live HTTP Headers" extension for Firefox/Mozilla browsers. Carefully watch for any kind of unexpected redirects or changes in the malformed query string that take place before you expect them to, for example, at this "gateway" or in other server config files.
Also, please define, in detail, what you mean by "still just one domain forwards." Not to be pedantic, but we cannot see over your shoulder here...
For both cases -- "works" and "does not work":
What complete url did you request?
What was the result?
How does that result differ from your expectations?
I suppose you could also define and enable RewriteLog if this continues to be problematic...
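For reference, a minimal sketch of enabling the rewrite log on the Apache versions current at the time of this thread (1.3 through 2.2). The log path is a placeholder assumption:

```apache
# Server or virtual host configuration only (not allowed in .htaccess).
# The path below is a placeholder.
RewriteLog "/var/log/apache2/rewrite.log"
# 0 disables logging; higher values are more verbose and slow the server down.
RewriteLogLevel 3

# Apache 2.4 removed RewriteLog and RewriteLogLevel; the equivalent there is
# a per-module log level written to the error log:
# LogLevel alert rewrite:trace3
```

The log shows the string each pattern is applied to, which makes it easy to see whether the path mod_rewrite is testing is the one you expected.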
And one further supposition... If there is some 'agent' in the way that is interfering, a prime candidate for the kind of interference that would break the rule is that perhaps the interferer is double-encoding the hex-encoded characters. In which case, modifying the rule like this would fix it:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /myapp/news/MyPage\.jsp\%(25)*3[fF]seq\%(25)*3[dD]([^\ ]+)\ HTTP/
RewriteRule ^/myapp/news/MyPage\.jsp$ http://www.example.com/myapp/news/MyPage.jsp?seq=%3 [R=301,L]
I guess this explains why the last programmer who tried to put percent-encoded characters in my URLs left with a black eye... :)
Jim
What, precisely, do you mean by "gateway?" I mean a physical router. Basically, our equipment is in a datacenter, and they host the router. There are zero rules on it; all they do is pass the traffic to our network.
Also, please define, in detail, what you mean by "still just one domain forwards." Not to be pedantic, but we cannot see over your shoulder here...
If I go to http://example.com/myapp/news/MyPage.jsp%3Fseq, it correctly adds the www and converts the %3F. If I go to http://www.example.com/myapp/news/MyPage.jsp%3Fseq, I get a 404; hence, it's not converting the %3F. What is so damn confusing is that if I do NOT add the www, it automatically adds the www and converts the % characters. If I add the www myself, I get a 404 error.
For both cases -- "works" and "does not work":
[edited by: jdMorgan at 12:24 pm (utc) on Aug. 21, 2007]
[edit reason] example.com [/edit]
If I go to [domain.com...] I get a 404.
I'm not sure what to do at this point. It makes no sense.
I'm better off spending more time on other things that are going to bring in $$. If I had the staff, then yes, I'd strive for 100%. But right now this company is better off by me building additional models for revenue.
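One explanation that would fit the symptoms reported in this thread, offered as a hypothesis rather than a confirmed diagnosis: Apache's mod_rewrite documentation describes the RewriteRule pattern as being matched against the %-decoded URL-path. If that applies here, a request for /myapp/news/MyPage.jsp%3Fseq%3D... reaches the rule as /myapp/news/MyPage.jsp?seq=..., so the end-anchored pattern ^/myapp/news/MyPage\.jsp$ never matches. The non-www rule, by contrast, copies the decoded path (question mark included) into its redirect target, which would explain why only example.com appeared to work. A sketch that keeps the THE_REQUEST condition but drops the end anchor:

```apache
RewriteEngine On

# The condition still inspects the raw request line, where %3F and %3D are
# intact, and captures the value after the encoded "=" as %1.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /myapp/news/MyPage\.jsp\%3[fF]seq\%3[dD]([^\ ]+)\ HTTP/
# No "$" after ".jsp": the decoded path seen by this pattern is assumed to
# continue with "?seq=...", so only the prefix is matched here.
RewriteRule ^/myapp/news/MyPage\.jsp http://www.example.com/myapp/news/MyPage.jsp?seq=%1 [R=301,L]
```

This has not been tested against the poster's server. Once the client follows the redirect, the request line contains a literal ? instead of %3F, so the condition no longer matches and the rule should not loop. Recent Apache 2.4 releases also added an UnsafeAllow3F flag concerning requests that contain an encoded question mark; whether it is needed for a rule like this depends on the server version, so check the mod_rewrite flags documentation for your release.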