

Removing white space from URL

Possible to use mod_rewrite to remove spaces in a URL?


beagle2

9:17 am on Jun 7, 2006 (gmt 0)

10+ Year Member



Hi everyone,

My software application needs to access a web page on my server whose URL contains a space, e.g.

[mydomain.com...] 34

Trouble is, the server is returning an Error 400 (Bad Request) to the software. I cannot convert the URL or change the software to send the space as %20 (I won't go into details as to why, but believe me, I can't), so I need to change something server-side to stop it from returning Error 400 for URLs containing spaces.

Would it be possible to do something with .htaccess / mod_rewrite to overcome this problem, e.g. concatenating the URL or escaping the space?

Is mod_rewrite applied before Error Codes are returned?

Many thanks!

jdMorgan

1:44 pm on Jun 7, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Since the URL is parsed for validity before any webmaster-configurable code is executed, there's nothing you can do on the server to fix this. An unencoded space is a violation of the HTTP protocol rules for constructing URLs [faqs.org], and Apache won't accept it. And even if it could accept it, there is no guarantee that other network nodes such as caching proxies, content filters, etc. would pass it through to your server correctly.
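For comparison, this is what a valid form of the example URL looks like. It is a minimal Python sketch using the standard library's `urllib.parse.quote`; the path `/index.jsp?test12 34` is borrowed from the example used later in this thread, and the set of "safe" characters is an assumption chosen so that only the space is changed.

```python
from urllib.parse import quote

# Example path and query (borrowed from the example later in the thread);
# the unencoded space is what makes the request invalid.
raw_target = "/index.jsp?test12 34"

# Percent-encode unsafe characters. "/", "?" and "=" are listed as safe so
# the path and query separators are left alone; the space becomes %20.
encoded_target = quote(raw_target, safe="/?=")

print(encoded_target)  # /index.jsp?test12%2034
```

This encoding has to happen in the client before the request is sent, which is why the fix belongs at the source.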

Sorry, but you'll need to fix the problem at its source.

Jim

beagle2

2:02 pm on Jun 7, 2006 (gmt 0)

10+ Year Member



Oh, dear god no! I did not want to hear that. Are you absolutely sure that's the case?

The reason I can't change the URL in my software is that the software is already released and is trying to access the erroneous URL. I can't release an upgrade, because the software accesses that faulty URL in order to determine whether an upgrade is available: a catch-22 situation.

I'm using Tomcat so I guess I could modify the source code to accept these URLs if it came to it (thank god for open source!)...

The reason I didn't spot my mistake while testing is that I think my ISP or proxy must automatically replace spaces with %20, so I never noticed the error until it was too late.
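If something between the test machine and the server re-encodes the space, tests through it won't reproduce what the released software actually sends. One way to check is to write the request bytes by hand. The Python sketch below is a hypothetical test helper: `build_raw_request` and `send_raw_request` are illustrative names, `www.example.com` is a placeholder host, and any intercepting proxy on the network path may still alter or reject the request.

```python
import socket


def build_raw_request(host, target):
    """Build request bytes by hand so the target is sent exactly as given,
    including any literal space (nothing re-encodes it)."""
    return (
        f"GET {target} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def send_raw_request(host, target, port=80, timeout=10.0):
    """Send the hand-built request over plain TCP and return the status line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_raw_request(host, target))
        response = b""
        # Read until the first line (the status line) has fully arrived.
        while b"\r\n" not in response:
            chunk = sock.recv(4096)
            if not chunk:
                break
            response += chunk
    return response.split(b"\r\n", 1)[0].decode("latin-1")


# Hypothetical usage against your own server:
# print(send_raw_request("www.example.com", "/index.jsp?test12 34"))
```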

Can you think of anything I can do?

Many thanks

beagle2

4:24 pm on Jun 7, 2006 (gmt 0)

10+ Year Member



Panic over: it is possible to do this by changing the Apache setup:

[answers.google.com...]

jdMorgan

4:33 pm on Jun 7, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Create a "patch" file and make it available on your Web site, then e-mail customers who have the current version and let them know it's available?

The problem is that in an HTTP request, that space is taken as the end of the URL, and whatever follows is taken as the protocol specifier. What normally follows that space in the request is "HTTP/1.1" or "HTTP/1.0". So the server, seeing "34" (from your example), takes that as the protocol, and has no idea how to handle a request using a protocol called "34."

A normal request line would be:


GET /index.jsp?test1234 HTTP/1.1
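To make that concrete, here is a deliberately simplified Python model of a strict server splitting the request line on spaces. It is not Apache's actual parser; it only shows why the "34" from the example ends up where the protocol version is expected.

```python
import re

# Simplified model of a strict HTTP/1.x request line: METHOD SP TARGET SP PROTOCOL.
PROTOCOL_RE = re.compile(r"^HTTP/\d\.\d$")


def parse_request_line(line):
    """Return (status, method, target, protocol) for a request line."""
    tokens = line.split(" ")
    if len(tokens) < 3:
        return ("400 Bad Request", None, None, None)
    method, target, protocol = tokens[0], tokens[1], tokens[2]
    # Extra tokens, or a third token that is not HTTP/x.y, make it malformed.
    if len(tokens) != 3 or not PROTOCOL_RE.match(protocol):
        return ("400 Bad Request", method, target, protocol)
    return ("OK", method, target, protocol)


print(parse_request_line("GET /index.jsp?test1234 HTTP/1.1"))
# ('OK', 'GET', '/index.jsp?test1234', 'HTTP/1.1')
print(parse_request_line("GET /index.jsp?test12 34 HTTP/1.1"))
# ('400 Bad Request', 'GET', '/index.jsp?test12', '34')
```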

The request is probably failing before the server actually even starts to handle it, so there may not be much that can be done. But on reflection, I suppose you could at least try to recover the 400 error. I have no idea if it might work, but try declaring a custom error document and redirecting it with something like:


ErrorDocument 400 /chk_upd_req.tmp
#
RewriteEngine on
RewriteCond %{THE_REQUEST} ^GET\ /index\.jsp\?test12\ 34\ HTTP/
RewriteRule ^/chk_upd_req\.tmp http://www.example.com/index.jsp?test1234 [R=301,L]

might work to point those requests to the proper script. If your update client can handle a 301 redirect, this might work. This is coded for use in httpd.conf. If you're using .htaccess, delete the leading slash from the RewriteRule pattern.

This is not a good long-term solution, as it leaves your server behaving oddly by accepting invalid requests. I'd hate to see what might happen if a search engine spidered your update URL. But until you can issue a patch, it might be worth trying.
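Before trying the ErrorDocument approach on a live server, the RewriteCond pattern can be sanity-checked offline. This Python sketch assumes that %{THE_REQUEST} holds the raw request line exactly as the client sent it; it only tests the regular expression, not whether Apache actually runs the rewrite for a 400 error document.

```python
import re

# The RewriteCond pattern from the post above, unchanged. In Python's re,
# "\ " matches a literal space and "\?" a literal "?".
the_request_pattern = re.compile(r"^GET\ /index\.jsp\?test12\ 34\ HTTP/")

broken_request = "GET /index.jsp?test12 34 HTTP/1.1"
correct_request = "GET /index.jsp?test1234 HTTP/1.1"

print(bool(the_request_pattern.match(broken_request)))   # True
print(bool(the_request_pattern.match(correct_request)))  # False
```

Because the condition only matches the malformed line, a correctly encoded request would never be caught by the redirect.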

Jim

beagle2

10:39 am on Jun 8, 2006 (gmt 0)

10+ Year Member



Thanks for the follow-up. I've asked my web hosting company to put

ProtocolReqCheck off

in the httpd.conf file, as suggested in the link I posted earlier. This disables all protocol checking and just uses the URL up to the first space, then defaults to HTTP/1.0 as the protocol, which would be fine in my case.
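The behavior described above can be modelled in a few lines. This Python sketch is an assumption based on that description, not Apache's code: `lenient_parse` is a hypothetical name, and it keeps the target up to the first space after the method, falling back to HTTP/1.0 when the next token is not a valid version.

```python
import re

PROTOCOL_RE = re.compile(r"^HTTP/\d\.\d$")


def lenient_parse(line):
    """Keep the target up to the first space; default the protocol to HTTP/1.0."""
    tokens = line.split(" ")
    method = tokens[0]
    target = tokens[1] if len(tokens) > 1 else "/"
    protocol = tokens[2] if len(tokens) > 2 else ""
    if not PROTOCOL_RE.match(protocol):
        protocol = "HTTP/1.0"
    return method, target, protocol


print(lenient_parse("GET /index.jsp?test12 34 HTTP/1.1"))
# ('GET', '/index.jsp?test12', 'HTTP/1.0')
```

Note that under this behavior the part after the space ("34") never reaches the application, so the update-check script has to give the right answer for the truncated query.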

I'm still getting mixed results, though. I set up a simple JSP script on my server to output the result of accessing:

[mydomain.com...] 34

With ProtocolReqCheck set to on, I was getting a 400 error, but when they changed it to off, it worked perfectly, so I thought it was fixed. Then I went home and tried my software again, and I'm still getting Error 400. Grrr!

So I think (as you pointed out earlier) my proxy at home is automatically returning Error 400 if a URL contains a space.

So there's not much more I can do now apart from manually sending out an upgrade patch. I wanted to avoid this, but I think it's my only option.

Thanks for your input Jim - it's much appreciated.

jdMorgan

3:59 pm on Jun 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Did you flush your browser cache on the home machine? -- You could simply be seeing the cached response from earlier attempts.

This brings up another point -- error pages on your server should be set up with headers marking them as non-cacheable, or specifying very short expiry times...
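The same idea applies whichever server produces the error pages (Apache and Tomcat each have their own configuration for it). As a neutral illustration, this Python sketch shows one common set of headers that mark a response as non-cacheable; `error_page_headers` and `NoCacheErrorHandler` are hypothetical names, and the handler simply answers every GET with a 400 to show where the headers go.

```python
from email.utils import formatdate
from http.server import BaseHTTPRequestHandler


def error_page_headers():
    """Headers asking browsers and proxies not to store or reuse an error page."""
    return {
        "Cache-Control": "no-store, no-cache, must-revalidate, max-age=0",
        "Pragma": "no-cache",                   # understood by older HTTP/1.0 caches
        "Expires": formatdate(0, usegmt=True),  # a date in the past (1 Jan 1970)
    }


class NoCacheErrorHandler(BaseHTTPRequestHandler):
    """Demo handler that answers every GET with an uncacheable 400 page."""

    def do_GET(self):
        body = b"400 Bad Request\n"
        self.send_response(400)
        for name, value in error_page_headers().items():
            self.send_header(name, value)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, one could run `http.server.HTTPServer(("127.0.0.1", 8000), NoCacheErrorHandler).serve_forever()` and inspect the response headers.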

Jim