I have spent all day reading, mostly on this website, trying to figure out how to make my previously working redirects work under Apache 2.2.X as my host recently upgraded.
Primarily, I used mod_rewrite to remove .php from the end of files. However, I also had it working so that www.example.com/topic would work without returning a 403 error even though a directory at www.example.com/topic/ exists without an index.php file inside it. (Basically, Apache wouldn't look for /topic/index.php, but rather for /topic.php and remove the .php) I hope this makes sense.
The line that seems to no longer work (bolded in the code below: RewriteRule (.*)/$ $1 [R=301]) was something I came up with to allow for what I explained above. It may have been the wrong way to tell Apache to look for a file with the same name as a directory rather than inside the directory for its index file, especially since it no longer works, but any tips or suggestions would be greatly appreciated.
--------------------------------
# MOD REWRITE
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} ^example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301]
#removes .php
RewriteCond %{THE_REQUEST} ^GET\ (.*)\.php\ HTTP
RewriteRule (.*)\.php$ $1 [R=301]
#removes "index"
RewriteRule (.*)/index$ $1 [R=301]
#removes "/"
RewriteRule (.*)/$ $1 [R=301]
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteCond %{REQUEST_URI} !/$
RewriteRule (.*) $1\.php [L]
------------------------------------
"RewriteBase /" is unnecessary, as it is the default behaviour, and need not be specified unless you have a previous RewriteBase directive which has declared a non-default path.
Your rules are in the wrong order: Put external redirects first, ordered from most-specific pattern to least-specific, followed by internal rewrites, again ordered from most-specific pattern to least-specific pattern.
Use the [L] flag on every rule unless you have a specific reason not to do so, use start and end anchors, and for efficiency avoid doing unnecessary "file exists" checks whenever possible.
Add and maintain detailed, accurate comments so that this code will still make sense three years from now, when you need to make a change or addition.
# MOD REWRITE
#
RewriteEngine On
#
# Externally redirect to remove "index" or "index.php"
RewriteCond %{THE_REQUEST} ^GET\ /(.+)\.php(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*)index(\.php)?$ http://www.example.com/$1 [R=301,L]
#
# Externally redirect to remove ".php" (for all but "index.php" already done above)
RewriteRule ^(.+)\.php$ http://www.example.com/$1 [R=301,L]
#
# Externally redirect to remove trailing "/", except for existing directories
# RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ http://www.example.com/$1 [R=301,L]
#
# Externally redirect to canonicalize the hostname, if no previous rules have done it
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#
# Internally rewrite requested URLs (which have no file extension and which resolve
# to existing files when ".php" is appended) to the requested URL plus ".php"
RewriteCond $1 !\.[a-z0-9]+$ [NC]
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule (.*) $1\.php [L]
Thanks so much for your detailed and quick response.
One thing that you mentioned that I don't seem to understand is what makes patterns specific or non-specific. For example, I assumed .php would be the most specific part in this case, followed by filenames (like index), directories, and finally the domain. It seems like I definitely misunderstand.
Also, I am not quite sure why we want the [L] flag on rules that we want affected by rules further down, as what I could find about [L] is that it means to not try to match more rules if one with an [L] succeeded. It seems I am mistaken here as well.
I greatly appreciate your amazing level of help, but I am also having some trouble modifying your example code to work for my site.
To make sure - should this line that you wrote not be commented out in your example above? :
# RewriteCond %{REQUEST_FILENAME} !-d
I have tried debugging to the best of my ability, but have not come up with anything too significant that I can make sense of. Here are my findings:
As is (and assuming you didn't mean to comment out the above line) every page except home page gets a redirect loop error.
If I comment out the second rule ("RewriteRule ^(.+)\.php$ http://www.example.com/$1 [R=301,L]") then, assuming there's no "contact" directory, a page such as example.com/contact will work as expected, but example.com/contact.php will also work, without ".php" being removed.
Now, also with that line commented out, if I create a "contact" directory, then example.com/contact/index.php will correctly attempt to redirect to example.com/contact, and requesting example.com/contact seems to also try to display the correct contact.php file; however, it throws a redirect loop error without displaying the page. This also happens if I simply request example.com/contact, which worked without the directory added.
I hope this helps.
Thanks so much!
Bjorn
More-specific means "having a more selective regular-expressions pattern" or "affecting fewer URL-paths".
Use the [L] flag, because you do NOT want later rules to be applied to the URLs -- Not within the context of the current HTTP transaction, anyway. The only time you don't use [L] is when you want to use multiple steps to do a very-complex rewrite on the same URL. But that does not work well anyway, due to a very-old and very-bad mod_rewrite bug that has never been fixed. As a result, rules that should be coded without [L] are very, very rare.
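As an illustration only, here is how the external redirects in this thread line up from most specific to least specific. The patterns are the ones posted in this thread; the comments about how many URL-paths each one affects are just a gloss, not something measured.

```apache
# Most specific: only URL-paths whose last segment is "index" or "index.php"
RewriteCond %{THE_REQUEST} ^GET\ /([^/]+/)*index(\.php)?(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*)index(\.php)?$ http://www.example.com/$1 [R=301,L]
# Less specific: any URL-path ending in ".php"
RewriteCond %{THE_REQUEST} ^GET\ /.+\.php(\?[^\ ]*)?\ HTTP/
RewriteRule ^(.+)\.php$ http://www.example.com/$1 [R=301,L]
# Least specific: every URL-path requested on the non-canonical hostname
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

Because each rule carries [L], the first one that matches ends rule processing for that request and sends the redirect. The client then makes a fresh request for the new URL, and that request starts again at the top of the rule list.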
Jim
I was wondering if the condition: RewriteCond %{REQUEST_FILENAME} !-d is not what I want as a check for when to remove the trailing slash - as I want it to find /topic.php over /topic/ . I may not be understanding this right though. I've tried removing the condition and changing it to check for a file instead, which seems like it may be working more like I want it to in terms of the url, but both result in redirect loops.
Do you have any ideas why the code might not be working? I've tried modifying it to work as my original since your last post, and I can't seem to get anywhere. I must admit I have a hard time understanding the reg-exp in the top condition/rule.
Using your suggested code:
# MOD REWRITE
#
RewriteEngine On
#
# Externally redirect to remove "index" or "index.php"
RewriteCond %{THE_REQUEST} ^GET\ /(.+)\.php(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*)index(\.php)?$ http://www.example.com/$1 [R=301,L]
#
# Externally redirect to remove ".php" (for all but "index.php" already done above)
RewriteRule ^(.+)\.php$ http://www.example.com/$1 [R=301,L]
#
# Externally redirect to remove trailing "/", except for existing directories
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ http://www.example.com/$1 [R=301,L]
#
# Externally redirect to canonicalize the hostname, if no previous rules have done it
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#
# Internally rewrite requested URLs (which have no file extension and which resolve
# to existing files when ".php" is appended) to the requested URL plus ".php"
RewriteCond $1 !\.[a-z0-9]+$ [NC]
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule (.*) $1\.php [L]
www.example.com/topic gives me a redirect loop
www.example.com/topic.php removes .php but also a redirect loop
www.example.com/topic if a directory "topic" exists gives a redirect loop
www.example.com/topic/subtopic.php gives me a 404 and a redirect loop
=)
RewriteCond %{REQUEST_FILENAME} !-d
Rule #1 has a bug, though it should not cause the problems you describe. Correction:
# Externally redirect to remove "index" or "index.php" if directly requested by the client
RewriteCond %{THE_REQUEST} ^GET\ /([^/]+/)*index(\.php)?(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*)index(\.php)?$ http://www.example.com/$1 [R=301,L]
# Externally redirect to remove ".php" from direct client requests only
# (for all but "index.php" already done above)
RewriteCond %{THE_REQUEST} ^GET\ /.+\.php(\?[^\ ]*)?\ HTTP/
RewriteRule ^(.+)\.php$ http://www.example.com/$1 [R=301,L]
It seems the new RewriteCond for Rule #2 works perfectly, and the correction to Rule #1 makes that work perfectly as well. I am extremely grateful for your help with these. The regular expressions were much easier for me until they were rewritten efficiently/properly =)
I appreciate your clarification regarding the !-d RewriteCond. I think I had it understood correctly, but that I may not have explained well enough what I'm trying to accomplish. I may however still be mistaken.
Everything works as hoped now except for the Cond/Rule related to the !-d (I think). When example.com/topic is requested, it still tries to serve example.com/topic/(index.php) rather than example.com/topic(.php) if the directory /topic/ exists - whereas I want it to show /topic(.php) (if such a file exists) rather than it trying to look for an index.php file inside the /topic/ directory.
Basically, right now /topic works, showing /topic.php unless I create a directory named "topic". Then it errors unless I create an index.php file inside the /topic/ directory.
This is why I thought the RewriteCond %{REQUEST_FILENAME} !-d condition may not be right in this case, since I want it to remove the trailing / even if the directory exists.
# Externally redirect to remove trailing "/" for URLs which will resolve
# to existing php scripts after being rewritten by the code below
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
Jim
In this .htaccess file, you have no other rules that rewrite or redirect to add a trailing slash, so the problem is not likely to be in the mod_rewrite code. However, one of mod_dir's stated purposes is to add a missing trailing slash for existing directories.
I suppose it's also possible that the LoadModule order in your server configuration is wrong, and that mod_dir is running first. You'd have to get your host to fix that, I'm afraid. If so, they need to be aware that Apache modules execute in the *reverse* of the order in which they are loaded by the LoadModule list. In order to fix this problem, mod_dir must be loaded *before* mod_rewrite, so that it will execute after mod_rewrite.
In the meantime, try adding "Options -MultiViews" at the head of the file. If that doesn't work, please re-post your current code. I will be back in a few hours, but having the code posted might make it easier for someone else to assist you before then. It's still possible that I've missed something.
Jim
I tried adding the -MultiViews option, and it seems to be the same as with the -f condition and commenting the line out. Here is my current config:
--------------------------------------
Options -MultiViews
# MAKE DIRECTORIES NOT BROWSABLE
Options -Indexes
# ------------------------------
# SET DEFAULT DIRECTORY INDEX
DirectoryIndex index.php
# ------------------------------
# MOD REWRITE
#
RewriteEngine On
#
# Externally redirect to remove "index" or "index.php" if directly requested by the client
RewriteCond %{THE_REQUEST} ^GET\ /([^/]+/)*index(\.php)?(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*)index(\.php)?$ http://www.example.com/$1 [R=301,L]
# Externally redirect to remove ".php" from direct client requests only
# (for all but "index.php" already done above)
RewriteCond %{THE_REQUEST} ^GET\ /.+\.php(\?[^\ ]*)?\ HTTP/
RewriteRule ^(.+)\.php$ http://www.example.com/$1 [R=301,L]
# Externally redirect to remove trailing "/" for URLs which will resolve
#to existing php scripts after being rewritten by the code below
#RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^(.+)/$ http://www.example.com/$1 [R=301,L]
#
# Externally redirect to canonicalize the hostname, if no previous rules have done it
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#
# Internally rewrite requested URLs (which have no file extension and which resolve
# to existing files when ".php" is appended) to the requested URL plus ".php"
RewriteCond $1 !\.[a-z0-9]+$ [NC]
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule (.*) $1\.php [L]
A "trailing slash" redirect is issued when the server receives a request for a URL http://servername/foo/dirname where dirname is a directory. Directories require a trailing slash, so mod_dir issues a redirect to http://servername/foo/dirname/.
So, you have three choices:
1) Ask your host to disable mod_dir or move it to a position before mod_rewrite in the LoadModule list -- a request that they are very unlikely to comply with unless you are on a dedicated server or VPS.
2) Move all 'real' files to the top-level directory, so that no real subdirectories exist (impractical for any but a tiny site).
3) Rename your directories or your extensionless URLs so that collisions between directory names and extensionless URLs cannot occur (or are very, very unlikely). For example, your extensionless URLs could all be in the form "/content/<page>", or your 'real' directories could all be moved into and under a subdirectory: for example, move the current /images subdirectory to /site/images and change the URLs in all links to those images. You would then add code to block any request for an extensionless URL of "/site" so that it could never be used. This would reduce your collision exposure to just that single path.
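For the third option, the blocking rule could look something like the sketch below. The directory name "site" is just the example used above, and returning a 403 via [F] is only one assumed way to "block" the request; a 404 or 410 response would serve equally well.

```apache
# Hypothetical: refuse any request for the bare "/site" or "/site/" URL-path,
# so that it can never collide with an extensionless page URL.
# Requests for files below it (e.g. /site/images/logo.png) do not match.
RewriteRule ^site/?$ - [F]
```

A rule like this would presumably go ahead of the other rules, so that the request is refused before any redirect gets a chance to touch it.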
Jim
That's a shame. So do you think that the reason my initial rule RewriteRule (.*)/$ $1 [R=301] worked until my host's recent upgrade, and that that rule and the one you provided RewriteRule ^(.+)/$ http://www.example.com/$1 [R=301,L] don't work now has to do with something else that changed on their end, other than an updated Apache version ? I just found out that they also changed their mod_security rules, but haven't seen anything about a mod_dir change.
After their upgrade, my initial rewriterule functioned in the exact same way as your new rule (the end result, redirect loop). I can only assume that your rule would have also created the desired effect before the upgrade. It does seem like you are right about mod_dir, I am just wondering what may have allowed it to work before.
The reason I am rather reluctant to re-structure the site is that these rules' purpose for me is to allow me to not have to re-structure a site (more than necessary) as it grows, and it was in use on most sites I would develop.
Right now on my site, for example, I have /contact, /pay, /services, /articles, /portfolio, /testimonials. Except for /articles, each of these pages were initially the only pages on the site, with no directories matching their names.
Without these rules working, /articles/ would have appeared different (deeper) than the others, and have to be placed as index.php inside /articles/, rather than at /articles.php with the other files like it.
Over time, directories /services/ and /portfolio/ were added as well, to allow for /services/primary-service, /services/secondary-service, /portfolio/first-entry, /portfolio/second-entry, etc.
Without these rules, I will have to move /topic.php to /topic/index.php, create a 301 redirect for it, and have /topic/ now appear to be a second level file, every time a page acquires sub-pages.
This method seems to me to be the most natural way that a site progresses, and it does work without rewriting urls to remove .php (/topic.php would not result in /topic/ trying to be read, but /topic does). I guess it seems odd that something as common as having a file about a topic and a sub-directory that contains the specifics about that topic (/store, /store/widgets, /store/widgets/blue) doesn't align with something as common as rewriting urls to remove file extensions.
So the problem is not the result of the Apache version upgrade in and of itself. The problem is the result of mod_dir now running before mod_rewrite, grabbing those extensionless URLs before your rule can add ".php" to them and invoke your scripts.
It is a module execution order problem, not a server version problem.
I suggest that you tell your host that they broke your site, and that the relative execution order of mod_dir and mod_rewrite is now incorrect.
You might want to try adding the Live HTTP Headers add-on to Firefox, and then request one of those extensionless URLs which also will resolve to a directory if a trailing slash is added. Look at the server responses, and if you see a redirect response that adds a trailing slash, you'll have some pretty solid evidence, because there is no rule in your code above which *adds* a trailing slash. Therefore, the redirect that adds a trailing slash must be the result of some other code or some other module.
This situation is just wrong, because mod_dir should run after mod_rewrite, not before. I think you have a legitimate complaint about your host's current server configuration, because mod_dir was almost always configured to run after mod_rewrite on Apache 1.x.
Jim
Thanks, that makes complete sense.
I've actually got Live HTTP Headers installed after reading a post of yours in another thread, and I've taken a look at the site with it.
If I request /contact without a contact directory existing, the headers show REQUEST GET /contact HTTP/1.1
If I make a /contact/ directory, the headers show REQUEST GET /403 HTTP/1.1 (as it tries to load an index.php from inside the /contact/ dir that doesn't exist, and I have Options -Indexes set)
If I add an index.php file to the /contact/ dir, the headers show REQUEST GET /contact/ HTTP/1.1
Is that what we were hoping to see?
I will get in contact with my host and post any results.
Thanks!
Bjorn
Bjorn,
Yes, I agree. I noticed this order had changed while looking at a similar issue on another of our servers. There are a few strategies I have found useful here:
- Add trailing slashes to URLs and match strings when possible
- Use the [L] directive whenever appropriate
- Use the full URL for the rewrite address
Please let us know if there is anything else we can help you with.
Regards,
#########
Me: What about changing it back to the correct order ?
Response:
Hello,
Unfortunately, this was changed with upgrading to Apache 2.2. We are not going to be able to change this back.
If you have any questions, just let us know.
#########
Linux System Admin
Bjorn,
I understand the issue you are facing; however, we do not directly control the httpd.conf file. This is generated from templates via cpanel, and even if we changed it, it would be rewritten the next time cpanel rewrote the file. So what you are suggesting is a possible fix that leaves you vulnerable to the same issue happening again, at an unknown time in the future. As a general rule, I try to find solutions that are going to work no matter what happens.
That type of solution has already been explained: adding trailing-slash mod_rewrite rules will work once and for all, as would using the mod_dir directive:
DirectorySlash Off
With a <files> matching rule to disable the mod_dir added slashes. Do you understand why these would be better ways forward?
If you have any further questions, don't hesitate to contact us.
#######
Server Admin
One problem I have is that I stay mostly within the Apache 1.x documentation, so as not to recommend solutions here that won't be available to people on older servers (it seems that the vast majority are still hosted on 1.3.x.) So, I flat-out missed that new addition to mod_dir's directives in Apache 2.x, or perhaps having seen it a few times, just forgot about it.
Your admin didn't mention that one at first, but it'll save the day. I suggest sending profuse thanks his/her way -- from both of us!
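For anyone following along, a minimal sketch of how that directive might sit in the .htaccess file is below. It assumes an Apache 2.x server recent enough to know DirectorySlash (as far as I recall, 2.0.51 or later) and a host that permits it in .htaccess; placing it next to the existing DirectoryIndex line is just a matter of taste.

```apache
# Stop mod_dir from redirecting /dirname to /dirname/ on its own,
# so that the mod_rewrite rules further down decide what happens.
DirectorySlash Off
DirectoryIndex index.php
```

One thing to keep in mind: with the automatic slash redirect turned off, a slashless request for a directory can end up showing a directory listing instead of the index file if Indexes is enabled. The config posted above already has "Options -Indexes", so that should not be a concern here.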
Now since that AddSlash function won't be working any more, and since it's an important "fix-up" for incorrect URLs entered by users and requested by search engines, we probably want to reproduce it, but in a more-conditional scope:
# Externally redirect to add a trailing slash for URLs which will NOT resolve to existing php
# scripts when .php is added, but which will resolve to an existing directory when a slash is
# added. For efficiency, exclude URLs with a filetype and those that already end with a slash.
RewriteCond $1 !^([^/]+/)*[^.]*\.
RewriteCond $1 !/$
RewriteCond %{REQUEST_FILENAME}.php !-f
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule (.*) http://www.example.com/$1/ [R=301,L]
This new rule will go nicely right after current rule #3.
The result should be that if you request a URL with no trailing slash, and there is no .php file by that name, but there is a directory, then a redirect to add a trailing slash should occur. After that, the behavior will depend on the -Indexes setting and whether you have an index file in that directory.
Also, I want to go back to something I saw in your Live Headers transcript: What is the name of your 403 error page? It looked like our code may have been redirecting the 403 error page URL, and we don't want that! (I see a request of "GET /403" with no filetype on it.) That could be a very serious problem if it was the result of a redirect, but it's something that can be fixed by excluding error page URLs from the strip-slash redirect and possibly from rewrites as well.
Jim
[edited by: jdMorgan at 2:28 pm (utc) on Nov. 21, 2008]
I made sure to thank the admin for us both =)
The new rule you provided works perfectly as is - it strips the slash if an index.php doesn't exist, and will strip /index and /index.php. If an index.php does exist inside a directory with a file with the same name, it will return what is expected depending on what's requested. =D
Here is what I have set up for 404/403 errors:
ErrorDocument 404 http://www.example.com/404.php
ErrorDocument 403 http://www.example.com/403.php
Which should explain the "GET /403" request hopefully. I don't think this is the best way to handle 404/403 errors - especially as it turns whatever URL they requested into www.example.com/404(3), making editing the url for them to replace a potential typo very cumbersome. But, now that the rest of the site is working as intended again and I'm not freaking out about google rankings dropping and urls changing, I will surely look into better ways of serving 403/404 documents. =)
I am so grateful for your help in resolving this!
RewriteCond $1 !^40[34]\.php$
I'd suggest adding this code line rather than changing the ErrorDocument lines, simply because it's better not to be redirecting ErrorDocument URLs to remove the extension, or having the extra dependency of having to rewrite them to add ".php" to get to the real document while your server is already handling an error.
If you do this, then the final rule won't be invoked for the 40x error documents, because they will still have their .php file extensions, so adding the RewriteCond to the second rule solves both problems.
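Here is a sketch of what the second rule might look like with the exclusion in place. One caution: in this rule the back-reference $1 holds the path with ".php" already stripped off (for example "403", not "403.php"), so the sketch tests %{REQUEST_URI} instead. Treat that substitution as an assumption rather than the tested version.

```apache
# Externally redirect to remove ".php" from direct client requests only,
# but leave the error documents /403.php and /404.php alone
RewriteCond %{REQUEST_URI} !^/40[34]\.php$
RewriteCond %{THE_REQUEST} ^GET\ /.+\.php(\?[^\ ]*)?\ HTTP/
RewriteRule ^(.+)\.php$ http://www.example.com/$1 [R=301,L]
```

A quick way to check it would be to request /403.php directly with Live HTTP Headers running and confirm that no 301 comes back.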
Jim
I have implemented your RewriteCond solution. I don't know how to tell for sure, but it seems like it's working (it didn't break anything) =)
I have made a couple modifications to the latest code we've come up with in order to cope with a few things:
First, I changed
RewriteCond $1 !\.[a-z0-9]+$ [NC]
to
RewriteCond $1 !\.php+$ [NC]
because I actually had some files with filenames such as /portfolio/www.example.com.php (and .php was removed). This modification seems to make those work without breaking anything. I also only use .php files, at the moment at least.
Furthermore, for the following chunk of code, I have added the condition in bold (the RewriteCond %{REQUEST_URI} !/$ line):
RewriteCond $1 !^([^/]+/)*[^.]*\.
RewriteCond $1 !/$
RewriteCond %{REQUEST_URI} !/$
RewriteCond %{REQUEST_FILENAME}.php !-f
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule (.*) http://www.example.com/$1/ [R=301,L]
I did this because everything worked well, except the home page. What it would do was redirect to www.example.com// and return a redirect loop when the home page was requested. I'm not sure if this function is what your line:
RewriteCond $1 !/$
was supposed to do?
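For comparison, one possible way to get the same effect without the extra condition is sketched below. The guess is that in .htaccess the home page arrives at the rule as an empty path, so $1 is empty and the "RewriteCond $1 !/$" test passes. Requiring the rule pattern itself to contain at least one character and to end in something other than a slash would cover both cases. This is untested here and only an assumption about how the pieces fit; the remaining conditions are the ones quoted above.

```apache
# Add a trailing slash only for non-empty URL-paths that do not already end in "/".
# An empty path (the home page) cannot match this pattern, so no "//" redirect.
RewriteCond $1 !^([^/]+/)*[^.]*\.
RewriteCond %{REQUEST_FILENAME}.php !-f
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.*[^/])$ http://www.example.com/$1/ [R=301,L]
```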
[edited by: encyclo at 11:08 am (utc) on Nov. 24, 2008]
[edit reason] switched to example.com [/edit]