Mod Rewrite problem with encoded slashes

         

ts_gg

4:34 pm on Feb 20, 2010 (gmt 0)

10+ Year Member



Hi everybody!
I've got a little problem with encoded slashes in my URLs.
The following rule in my .htaccess is causing trouble:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?for=$1 [L]


It works for almost everything I need, but not for the following URL:

'http://[mywebsite]/Le+Noeoc+%2F+Luceily'

It gives me a 404 error (not found), whereas

'http://[mywebsite]/Le+Noeoc+/+Luceily'

works like

'http://[mywebsite]/index.php?for=Le+Noeoc+/+Luceily'

Even

'http://[mywebsite]/index.php?for=Le+Noeoc+%2F+Luceily'

works fine, too.
Does anyone know what the problem is?

Thanks a lot!

jdMorgan

5:54 pm on Feb 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



We really need to see the entry from your server error log file, which will show the filepath that the server is attempting to access, and from your access log, which will show the URL-path that the client is requesting. Both are needed to correctly diagnose this problem.

The solution also depends on whether it is the server itself throwing the 404, or whether the request is actually being rewritten to your script and your script itself is throwing the 404. If it is the script, the easiest solution is to run the incoming query string --the "GET" parameters-- through the built-in PHP url-decoding function, urldecode(). Each pass removes one level of encoding, so a singly-encoded %2f becomes a slash after one pass, while multiply-encoded slashes such as %252f or %2525252f need repeated passes until no encoded form remains.
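
If it turns out to be the server itself, one thing worth checking is Apache's AllowEncodedSlashes directive. It defaults to Off, and with that setting Apache refuses any URL-path containing an encoded slash (%2F) with a 404, as far as I know before your .htaccess rules get a chance to run. The following is only a sketch, assuming you can edit the main server config or the virtual host (the directive is not allowed in .htaccess); the ServerName and DocumentRoot values are placeholders, not taken from your setup:

```apache
<VirtualHost *:80>
    # Placeholder values (assumptions); substitute your own
    ServerName www.example.com
    DocumentRoot /var/www/example

    # Accept URL-paths containing %2F instead of refusing them with a 404.
    # "On" decodes the encoded slash like any other encoded character;
    # "NoDecode" (newer Apache versions only) accepts it but leaves it encoded.
    AllowEncodedSlashes On
</VirtualHost>
```

Whether On or NoDecode suits you depends on whether your script needs to tell a literal slash from an encoded one.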

By the way, you can significantly improve your server's performance --possibly enough to notice on every page load-- by excluding certain paths and filetypes from being checked for "file- or directory-exists" in your last rule. The three examples below show several options.

The first example is the least specific, and excludes any request ending with a slash or a filetype, where a filetype is defined as a period followed by only letters or numbers at the end of the requested URL-path. This may not be specific enough to work on your site, but it is the most efficient.

The second example is also fairly unspecific, and excludes any request ending with a filetype, defined the same way. Unlike the first, it does not exclude requests ending with a slash. Again, this still may not be specific enough to work on your site.

The third example shows an explicit exclusion list. You should list only the filetypes that you know *never* need to be handled by your script.

Example 1:

# Do not check file- or directory-exists for
# requests for index.php itself
RewriteCond $1 !^index\.php$
# requests ending with any filetype
RewriteCond $1 !\.[a-z0-9]+$
# Otherwise, if requested URL does not resolve to existing file or directory
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# if the URL-path does not end with a slash, rewrite the request
# to the index.php script with URL-path in query string
RewriteRule ^(.*[^/])$ index.php?for=$1 [L]

Example 2:

# Do not check file- or directory-exists for
# requests for index.php itself
RewriteCond $1 !^index\.php$
# requests ending with any filetype
RewriteCond $1 !\.[a-z0-9]+$
# Otherwise, if requested URL does not resolve to existing file or directory
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# rewrite the request to the index.php script with URL-path in query string
RewriteRule ^(.*)$ index.php?for=$1 [L]

Example 3:

# Do not check file- or directory-exists for
# previously-rewritten requests for index.php itself
RewriteCond $1 !^index\.php$
# requests for media, stylesheet, script and document files or txt (e.g. robots.txt) files
RewriteCond $1 !\.(gif|jpe?g|png|css|js|ico|pdf|flv|wmv|mpe?g|mp3|txt)$
# Otherwise, if requested URL does not resolve to existing file or directory
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# rewrite the request to the index.php script with URL-path in query string
RewriteRule ^(.*)$ index.php?for=$1 [L]

Using any of these examples to add exclusions will significantly reduce the number of unnecessary disk reads, which are *very* slow compared to code execution. Even if you are not able to include the second RewriteCond to exclude any filetypes, simply using the first RewriteCond will improve your original rule's performance by 50%.

If you do use the per-filetype exclusion list (third example), put the filetypes in order from most-frequently-accessed to least. Check your 'stats' to determine which filetypes are accessed most often.

If your site is very busy, you will likely notice an immediate improvement in your server response (page-load) time.

Jim

ts_gg

8:29 pm on Feb 20, 2010 (gmt 0)

10+ Year Member



Hi Jim!
Thanks a lot for your quick reply.
I tried all three of your examples.

Examples 1 and 2 throw a server 404 error:

The requested URL /Le+Noeoc+/+Luceily was not found on this server.

The access log shows this:
[20/Feb/2010:13:39:20 -0600] "GET /Le+Noeoc+%2F+Luceily HTTP/1.1" 404 1696 "-"

The error log shows nothing.

Example 3 and my original rule throw script 404 errors, which I can't handle, because the value of the GET parameter is replaced with '404.shtml' if the file 404.shtml does not exist on the server. So I can see the page I want, but the value shown is wrong. If the file 404.shtml does exist on the server, I get redirected to it. I don't get it.