I continue to struggle forward in my understanding of .htaccess URL rewrites. I have a site that converts dynamic URLs into "pretty" URLs, and for the most part it functions correctly. My problem seems to lie with the addition of the trailing slash. It is required for some URLs, but in some cases it is being added where I do not want it.
I have www.example.com/category redirected to www.example.com/category/
This is perfect. And when I have a query string like www.example.com/category/?query=string I am still in good shape.
Other URLs use the .html extension and work fine, like www.example.com/page.html (the trailing slash is not added, and that's perfect).
My issue comes when a query string is added to one of these, like www.example.com/page.html?query=string. A slash is added after .html and I get a 404 result: www.example.com/page.html/?query=string.
People here have always been so helpful and insightful that I am hoping someone can help me solve this dilemma. Here is a snippet from the .htaccess file:
--------------------------
################################################
#
#Change /category
#to /category/
#
################################################
#
# Externally redirect to add trailing slash if one is not present
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{THE_REQUEST} !^([^/]*/)*([A-Za-z0-9-]+)\.([A-Za-z0-9]+)\ HTTP/
RewriteCond %{THE_REQUEST} !^(([^/]*/)*)\.html\ HTTP/
RewriteRule ^(([^/]+/)*[^/]+)$ /$1/ [R=301,QSA,L]
#
#
################################################
#
#Change /cat1/catN/
#to /?path=cat1/catN/&dir=catN
#
################################################
#
# Internally rewrite to script, copying directory name to query string
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^((([^/]+)/)*)?$ ?path=$1&dir=$3 [QSA,L]
#
#
################################################
#
#Change /?def=category
#to /category
#
################################################
#
# Externally redirect to script, copying directory name to query string
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /(index\.php)?\?def=([^&]+)\ HTTP/
RewriteRule ^(index\.php)?$ [example.com...] [R=301,QSA,L]
#
#
################################################
#
#Change /category/page-slug.html
#to /?pid=page-slug
#
################################################
#
# Internally rewrite to script, copying file name to query string
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(([^/]*/)*)([A-Za-z0-9-]+)\.html?$ ?path=$1&pid=$3 [QSA,L]
#
#
------------------------------
Thanks again for sharing the knowledge!
[code]
# Externally redirect to add trailing slash if one is not present
RewriteCond $1 !^([^/]*/)*[a-z0-9\-]+\.[a-z0-9]+$ [NC]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(([^/]*/)*[^/]+)$ http://www.example.com/$1/ [R=301,L]
[/code]
It is unnecessary to use parentheses unless you need to back-reference the matched substring or you wish to apply a quantifier to the sub-pattern within parentheses, or both. Using them when not needed simply makes the server do extra work.
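To illustrate the point about parentheses, here is a rough sketch. The /archive/ path and the archive.php script are made-up names, not part of the site discussed in this thread. The three rules are alternatives shown side by side, not a set to be used together.

```apache
# Hypothetical rule: two groups are captured, but the substitution
# uses neither back-reference, so the parentheses only add work
RewriteRule ^(archive)/(index\.html)$ /archive.php [L]

# Same match with the unneeded groups removed
RewriteRule ^archive/index\.html$ /archive.php [L]

# Parentheses kept where they earn their place: the inner group carries
# the * quantifier, and the outer group is back-referenced as $1
RewriteRule ^archive/(([^/]+/)*)$ /archive.php?path=$1 [L]
```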
Using [NC] makes the string comparison case-insensitive, making a separate test for "[A-Z]" unnecessary, and speeding things up by 33%.
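As a small side-by-side of the [NC] point, the two conditions below are alternatives. Either one would sit in front of the redirect RewriteRule, and only one would be used.

```apache
# Without [NC]: both letter cases have to be listed in each character class
RewriteCond $1 !^([^/]*/)*[A-Za-z0-9\-]+\.[A-Za-z0-9]+$

# With [NC]: one case is listed and the comparison ignores case
RewriteCond $1 !^([^/]*/)*[a-z0-9\-]+\.[a-z0-9]+$ [NC]
```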
Putting the URL-test RewriteCond first in the list reduces the number of filesystem checks required, again speeding things up (often by a factor of several hundred to several thousand). When you have a choice, always put RewriteConds that do filesystem checks or reverse-DNS lookups last, because they are hugely inefficient compared to all other mod_rewrite functions. And when using these functions, always make the RewriteRule pattern and the other RewriteConds as specific as possible, so that these functions are invoked only when absolutely necessary. It would be faster to 'manually' check several dozen URL-paths using RewriteConds to exclude them from being rewritten or redirected than it would be to make a single filesystem check.
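A sketch of that ordering, assuming a few hypothetical directory names (images, css, js) that should never get a trailing slash. RewriteConds are tested in order, and testing stops at the first one that fails, so the filesystem checks at the bottom run only for requests that pass every string test above them.

```apache
# Cheap string tests first: skip a few known URL-paths (hypothetical names)
RewriteCond $1 !^(images|css|js)/
# Skip anything that ends in something that looks like a file extension
RewriteCond $1 !\.[a-z0-9]+$ [NC]
# Filesystem checks last, reached only if both tests above passed
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(([^/]+/)*[^/]+)$ http://www.example.com/$1/ [R=301,L]
```

The $1 in the conditions refers to the group captured by the RewriteRule pattern, which mod_rewrite matches before it evaluates the conditions.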
Always specify the canonical hostname in external redirect rules, to prevent problems if UseCanonicalName is 'on' but the server is configured with the 'wrong' canonical domain name -- e.g. example.com vs. www.example.com.
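For comparison, here are two ways to write the same redirect. The page names are placeholders, and the two rules are alternatives.

```apache
# Host-relative target: Apache builds the absolute URL for the Location
# header itself, so the host name depends on UseCanonicalName and ServerName
RewriteRule ^oldpage\.html$ /newpage.html [R=301,L]

# Canonical host name stated explicitly in the substitution
RewriteRule ^oldpage\.html$ http://www.example.com/newpage.html [R=301,L]
```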
Escape the hyphen character inside bracketed character classes ([...]) as shown, just to be safe on all Apache versions. (Fix your other rules as well.)
There is no need to use [QSA] unless you want to append additional query string parameters to the original, and keep the original parameters as well. If no "?" appears in the RewriteRule substitution path, then adding [QSA] is a waste of time. If there is a "?" in the RewriteRule substitution path, but no query string follows it, then both the "?" and the [QSA] are unnecessary and wasteful.
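The three cases can be sketched as follows. The script names (about.php, item.php) and the URL shapes are invented for illustration, and the last two rules are alternatives for the same URL.

```apache
# No "?" in the substitution: the original query string is passed
# through unchanged, so [QSA] would add nothing here
RewriteRule ^about$ /about.php [L]

# New query string without [QSA]: the original query string is replaced
#   /item/42?ref=mail  ->  /item.php?id=42
RewriteRule ^item/([0-9]+)$ /item.php?id=$1 [L]

# New query string with [QSA]: the original parameters are appended
#   /item/42?ref=mail  ->  /item.php?id=42&ref=mail
RewriteRule ^item/([0-9]+)$ /item.php?id=$1 [QSA,L]
```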
Generally, it is only required to use a RewriteCond to check %{THE_REQUEST} when dealing with two rules, one of which rewrites URL "A" to filepath "B", and the other of which redirects filepath "B" when requested as a URL, back to URL "A". In this case the check of %{THE_REQUEST} is used in the redirect rule to prevent an 'infinite' rewrite-redirect loop. But in most other cases, checking THE_REQUEST isn't necessary.
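A minimal sketch of such a rule pair. It assumes a single-level friendly URL like /category/ and a script at /index.php that takes a def parameter. The redirect target here is my own assumption and is not meant to reconstruct any rule quoted earlier in the thread.

```apache
# Redirect: a client that asks for the script URL directly is sent to the
# friendly URL. THE_REQUEST holds the original request line from the client,
# so this condition cannot match after the internal rewrite below has run.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.php\?def=([^&\ ]+)\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/%1/? [R=301,L]

# Rewrite: the friendly URL is mapped internally to the script
RewriteRule ^([a-z0-9\-]+)/$ /index.php?def=$1 [NC,L]
```

Without the THE_REQUEST condition, the first rule would also match the rewritten /index.php?def=... path produced by the second rule, and the two would chase each other. The trailing "?" on the redirect target keeps the def parameter from being carried over to the friendly URL.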
Jim
I noticed that the script's query string was being appended to the redirected URL, causing a 404.
Gleaned from your earlier lessons, I added a "?" to the target to clear the query string, and it works. But the "?" itself is still left at the end of the URL.
Redirect 301 /oldpage/ [example.com...]
Results 404: [example.com...]
Redirect 301 /oldpage/ [example.com...]
Results 200: [example.com...]
How can I remove this?
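One possible approach, offered as a sketch and not a confirmed fix for this site: do the redirect with mod_rewrite instead of the mod_alias Redirect directive. In a RewriteRule substitution, a trailing "?" discards the incoming query string and is not itself sent in the Location header. The /oldpage/ and /newpage/ paths below are placeholders, since the real target URL is not shown above.

```apache
# Hypothetical paths. The trailing "?" clears the incoming query string
# and does not appear in the redirected URL.
RewriteRule ^oldpage/$ http://www.example.com/newpage/? [R=301,L]

# On Apache 2.4 and later, the QSD flag does the same without the "?":
# RewriteRule ^oldpage/$ http://www.example.com/newpage/ [R=301,QSD,L]
```

If this is used, the matching Redirect 301 line for the same path should be removed so that the two modules do not both act on the request.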