Forum Moderators: phranque
I would like to do two things.
1) Remove index.html (the filename and extension) from the URL when visiting my site.
2) Remove the .html extension from the URL when viewing any page on my site.
I believe that I have correctly removed index.html from the displayed URL when accessing my site.
Below are the lines of code I used.
RewriteEngine on
# For index.html and .htm .shtml .php .php4 .php5 in the root or in any folder
# Works for requests with or without parameters, and preserves original folders:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.(s?html?|php[45]?)(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]*/)*)index\.(s?htm|?php[45]?)$ http://example.com/$1 [R=301,L]
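As a quick way to see what the THE_REQUEST condition above accepts, here is a minimal Python sketch that runs the same pattern against a few made-up request lines. It assumes Python's re module behaves like Apache's PCRE for this pattern, which should be the case for the simple constructs used here; `is_direct_index_request` is a hypothetical helper name.

```python
import re

# THE_REQUEST pattern from the RewriteCond above; "\ " is an escaped literal space
THE_REQUEST_PATTERN = re.compile(
    r"^[A-Z]{3,9}\ /([^/]*/)*index\.(s?html?|php[45]?)(\?[^\ ]*)?\ HTTP/"
)

def is_direct_index_request(request_line: str) -> bool:
    """True if the client's own request line asks for an index file."""
    return THE_REQUEST_PATTERN.match(request_line) is not None


# Hypothetical request lines, in the form the server stores in THE_REQUEST
samples = [
    "GET /index.html HTTP/1.1",
    "GET /folder/index.php5?x=1 HTTP/1.1",
    "GET /index.shtml HTTP/1.0",
    "GET / HTTP/1.1",
    "GET /page.html HTTP/1.1",
]
for line in samples:
    print(f"{line!r:42} -> {is_direct_index_request(line)}")
```

Only a client's own request for an index file matches. That is why the condition tests THE_REQUEST (the original request line) rather than the URL-path, which may already have been changed internally, for example when mod_dir serves index.html for "/".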
However, I have read (I believe) most of the posts and tried most of the responses, and I have not yet been able to remove the html extension correctly.
This is a very, very simple site, but it would be nice, though not necessary, to have the html extension removed from the URL.
Does anyone have a specific solution to this issue?
[edited by: jdMorgan at 3:15 am (utc) on Jan. 21, 2009; edit reason: example.com]
Next, you set up a rewrite so that when the user asks for www.example.com/somepage the server gets the content from /somepage.html without revealing that filename to the user.
Additionally, you'll want a redirect such that if a client directly asks for example.com/somepage.html or for www.example.com/somepage.html they are redirected to make a new request for www.example.com/somepage instead.
That combination of redirect and rewrite allows there to be just one URL for the content, and the URL to be different to the filename.
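The redirect-plus-rewrite idea can be modeled in a few lines of Python. This is a simplified sketch, not mod_rewrite itself: `handle` and the `existing_html` set are hypothetical, index files are ignored, and the model does not re-run the rules after a rewrite. In a real .htaccess file the rewritten URL is passed through the rules again, so the redirect has to test THE_REQUEST (the client's original request line) to avoid a redirect loop.

```python
def handle(url_path: str, existing_html: set) -> tuple:
    """Decide what happens to one client request.

    Returns ("redirect", url) for an external 301 redirect, or ("serve", path)
    for the file the server would send without changing the browser's URL.
    """
    # Redirect: a client asking for /somepage.html is sent to /somepage
    if url_path.endswith(".html"):
        return ("redirect", "http://www.example.com" + url_path[: -len(".html")])
    # Rewrite: /somepage is served from /somepage.html if that file exists
    candidate = url_path + ".html"
    if candidate in existing_html:
        return ("serve", candidate)
    # Anything else is left for normal handling (which may be a 404)
    return ("serve", url_path)


pages = {"/somepage.html"}  # hypothetical site content
print(handle("/somepage.html", pages))  # ('redirect', 'http://www.example.com/somepage')
print(handle("/somepage", pages))       # ('serve', '/somepage.html')
```

Either way the visitor ends up on the single extensionless URL, while the content still lives in the .html file.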
You're right, this question gets covered several times per week, sometimes several times per day, and there are hundreds of prior examples to choose from in the forum.
List the redirect before the rewrite and add [L] to the end of each of those rules.
You'll also need a site-wide 301 redirect from non-www to www to make sure that the content cannot be directly accessed at non-www URLs.
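The non-www to www redirect boils down to one comparison on the hostname. The sketch below models the request as a full URL for simplicity, assumes www.example.com is the preferred host (as in this thread), and leaves requests without a hostname alone, since an HTTP/1.0 client may not send a Host header. `canonical_redirect` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"  # assumption: www is the preferred hostname

def canonical_redirect(url: str):
    """Return the 301 target for a non-canonical host, or None if no redirect is needed."""
    parts = urlsplit(url)
    # No hostname (e.g. an HTTP/1.0 request without a Host header): leave it alone
    if parts.hostname is None or parts.hostname == CANONICAL_HOST:
        return None
    # Keep path and query string; only the hostname changes
    return urlunsplit((parts.scheme, CANONICAL_HOST, parts.path, parts.query, parts.fragment))


print(canonical_redirect("http://example.com/somepage?x=1"))
print(canonical_redirect("http://www.example.com/somepage"))
```

The first call returns "http://www.example.com/somepage?x=1"; the second returns None because the URL is already on the canonical host.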
Check the sticky thread at the top of the forum for some examples, and post your best effort code here.
I am using extensionless URLs in the links of my pages.
I tried to rewrite so that when the user asks for www.example.com/somepage the server gets the content from /somepage.html without revealing that filename to the user. However, it only works some of the time.
I think I have listed the redirect before the rewrite and added [L] to the end of each of those rules.
Below is my last try.
RewriteEngine on
# For index.html and .htm .shtml .php .php4 .php5 in the root or in any folder
# Works for requests with or without parameters, and preserves original folders:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.(s?html?|php[45]?)(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]*/)*)index\.(s?htm|?php[45]?)$ http://example.com/$1 [R=301,L]
RewriteCond %{REQUEST_fileNAME} !-d
RewriteCond %{REQUEST_fileNAME} !-f
rewriterule ^(([^/]+/)*[^./]+)$ /$1.html [L]
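To see which URL-paths the last RewriteRule pattern can capture, here is a small Python sketch. It assumes per-directory (.htaccess) context, where mod_rewrite strips the leading slash before matching, and it leaves out the -d/-f conditions; `rewrite_target` is a hypothetical helper.

```python
import re

# Pattern from the RewriteRule: path segments, with no dot in the final segment
EXTENSIONLESS = re.compile(r"^(([^/]+/)*[^./]+)$")

def rewrite_target(per_dir_path: str):
    """Return the internal rewrite target, or None if the pattern does not match."""
    m = EXTENSIONLESS.match(per_dir_path)
    return "/" + m.group(1) + ".html" if m else None


for path in ["somepage", "folder/somepage", "somepage.html", "folder/", ""]:
    print(repr(path), "->", rewrite_target(path))
```

Paths that already carry an extension, end in a slash, or are empty are left alone by this pattern; everything else gets ".html" appended.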
Most of the time it works, but sometimes I get the html extension. It is odd.
Any ideas?
[edited by: jdMorgan at 3:16 am (utc) on Jan. 21, 2009; edit reason: example.com]
RewriteEngine on
RewriteBase /
# For index.html and .htm .shtml .php .php4 .php5 in the root or in any folder
# Works for requests with or without parameters, and preserves original folders:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.(s?html?|php[45]?)(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]*/)*)index\.(s?htm|?php[45]?)$ http://example.com/$1 [R=301,L]
RewriteCond %{REQUEST_fileNAME} !-d
RewriteCond %{REQUEST_fileNAME} !-f
rewriterule ^(([^/]+/)*[^./]+)$ /$1.html [L]
What do you think?
[edited by: jdMorgan at 3:16 am (utc) on Jan. 21, 2009; edit reason: example.com]
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule ^(([^/]+/)*[^./]+)$ /$1.html [L]
Jim
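This condition rewrites only when the target .html file actually exists. A minimal Python model of that test, assuming plain string concatenation of DOCUMENT_ROOT, a slash, the captured $1 and ".html" (which is what the RewriteCond line spells out). `should_rewrite` is a hypothetical name, and the temporary directory stands in for the document root.

```python
import os
import tempfile

def should_rewrite(document_root: str, captured: str) -> bool:
    """Mimic: RewriteCond %{DOCUMENT_ROOT}/$1.html -f"""
    # Same string concatenation as the RewriteCond
    candidate = document_root + "/" + captured + ".html"
    return os.path.isfile(candidate)


# Hypothetical document root containing one page
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "somepage.html"), "w") as f:
        f.write("<p>hello</p>")
    print(should_rewrite(root, "somepage"))  # True: somepage.html exists
    print(should_rewrite(root, "missing"))   # False: no missing.html, so no rewrite
```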
As for the RewriteBase / that I added: I felt that on some occasions the path was lost, so I added it, and it seemed to help the situation. Since it is the default behavior, my thinking was "it won't hurt, might help", and it did seem to help. I have no idea why.
For those following this discussion, the following is now in my .htaccess file:
----------------------
RewriteEngine on
RewriteBase /
# For index.html and .htm .shtml .php .php4 .php5 in the root or in any folder
# Works for requests with or without parameters, and preserves original folders:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.(s?html?|php[45]?)(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]*/)*)index\.(s?htm|?php[45]?)$http://example.com/$1 [R=301,L]
RewriteCond %{REQUEST_fileNAME} !-d
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
rewriteRule ^(([^/]+/)*[^./]+)$ /$1.html [L]
---------------------------------
Thank you for your help.
RewriteRule ^(([^/]*/)*)index\.(s?html?|php[45]?)$ http://example.com/$1 [R=301,L]
Jim
[edited by: jdMorgan at 8:24 pm (utc) on Jan. 21, 2009]
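On the s?htm versus s?html? question: it is more than style. The RewriteRule pattern is anchored with $, so an alternative of s?htm can match index.htm and index.shtm but never index.html. A small Python check, looking only at the htm branch of the original alternation (the variable names are hypothetical):

```python
import re

# Only the "htm" branch of the original alternation, versus the corrected alternation
HTM_ONLY = re.compile(r"^(([^/]*/)*)index\.(s?htm)$")
CORRECTED = re.compile(r"^(([^/]*/)*)index\.(s?html?|php[45]?)$")

for path in ["index.htm", "index.html", "folder/index.shtml", "index.php5"]:
    print(f"{path:20} htm-only={bool(HTM_ONLY.match(path))} "
          f"corrected={bool(CORRECTED.match(path))}")
```

The other difference in the corrected rule is the space between $ and http://. Apache separates RewriteRule arguments on whitespace, so without it the target URL runs into the pattern.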
I also noticed that you changed (corrected?) "s?htm" to "s?html?". Is this a style change, or does it make a difference?
My bad about the case; you are quite correct that "%{REQUEST_fileNAME}" should be "%{REQUEST_FILENAME}".
Let's see if it works better.
Thanks - good eyes!
Below is my code.
---------------------------
RewriteEngine on
RewriteBase /
# For index.html and .htm .shtml .php .php4 .php5 in the root or in any folder
# Works for requests with or without parameters, and preserves original folders:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.(s?html?|php[45]?)(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]*/)*)index\.(s?html?|?php[45]?)$ http://example.com/$1 [R=301,L]
RewriteCond %{REQUEST_FILENAME} !-d
#RewriteCond %{REQUEST_FILENAME} !-f
#RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule ^(([^/]+/)*[^./]+)$ /$1.html [L]
--------------------------------------------
This does not make sense. (Just to let you know, I was getting some intermittent 404s earlier, but now they are consistent.)
[edited by: Sawhorse at 7:18 pm (utc) on Jan. 21, 2009]
Your server error log should come in handy here, perhaps indicating an obvious problem when trying to convert the URL-path to a filepath using that method (the server error log shows filepaths, not URLs, and the problem may be obvious to you if the filepath it shows is incorrect).
If the error log file isn't available, then another way to find the problem is to do something like this as a temporary test:
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(([^/]+/)*[^./]+)$ http://www.example.com/$1.html?constructed-filepath=%{DOCUMENT_ROOT}/$1.html [R=302,L]
On some shared servers, you have to include an additional path-part after the DOCUMENT_ROOT, for example, "%{DOCUMENT_ROOT}/public/$1.html" or some such thing. I would say that such a server is mis-configured, but there must be some reason to do this, as I've seen it occasionally... Unfortunately, it's a bit difficult to debug, and staring at the error log or using the temporary code are the only two debugging methods that are relatively expedient.
Jim
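The temporary rule simply echoes the filepath mod_rewrite would test. Modeled in Python with a hypothetical `debug_redirect` helper; /var/www/html is only an example of a document root without a trailing slash.

```python
def debug_redirect(document_root: str, captured: str) -> str:
    """Build the URL the temporary R=302 rule sends the browser to."""
    # Same concatenation as in the rule: %{DOCUMENT_ROOT}/$1.html
    filepath = document_root + "/" + captured + ".html"
    return f"http://www.example.com/{captured}.html?constructed-filepath={filepath}"


print(debug_redirect("/var/www/html", "page_name"))
print(debug_redirect("/services/webpages/", "page_name"))
```

If the document root already ends in a slash, as in the second call, the constructed filepath contains "//".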
Would you believe the following:
http://example.com/page_name.html?constructed-filepath=/services/webpages//page_name.html
Assuming the above is accurate, I did the following.
I commented out the temporary test above and put in the code below.
#RewriteCond %{DOCUMENT_ROOT}/services/webpages//$1.html -f
I still get 404.
However, if I leave the temporary test code active I do not get a 404.
Now that does not make sense.
RewriteCond %{DOCUMENT_ROOT}$1.html -f
# Externally redirect index.html, .htm, .shtml, .php, .php4, or .php5 in root or in any
# subdirectory to "/" in that same directory, preserving appended query string (if any)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.(s?html?|php[45]?)(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]*/)*)index\.(s?html?|php[45]?)$ http://example.com/$1 [R=301,L]
#
# If URL *does not* resolve to an existing directory
RewriteCond %{REQUEST_FILENAME} !-d
# and *does* resolve to an existing file with ".html" appended
RewriteCond %{DOCUMENT_ROOT}$1.html -f
# then internally rewrite extensionless URL to .html file
RewriteRule ^(([^/]+/)*[^./]+)$ /$1.html [L]
I corrected my code with your code (copied your code).
On two pages it worked fine. On two other pages I got a 404.
(Please note that all the html files are in the root.)
I then added your temporary code:
RewriteRule ^(([^/]+/)*[^./]+)$ http://www.example.com/$1.html?constructed-filepath=%{DOCUMENT_ROOT}/$1.html [R=302,L]
I did not get any 404's on any page.
But interestingly, I had a clean URL in the browser's address bar for the first two pages (it showed the correct address with no .html extension; these are the two pages that worked earlier). However, on the last two pages, the ones that originally had the 404s, I received the same URL in the browser's address bar as before (http://example.com/page_name.html?constructed-filepath=/services/webpages//page_name.html).
I do not see how adding the temp redirect code would have any effect, but it seems to be doing something.
and leave in the following code:
RewriteCond %{DOCUMENT_ROOT}$1.html -f
In IE I get a blank page.
In FF I get the requested page.
If I do not add the temporary code:
In IE I get a 404.
In FF I get a 404.
There is no further need for the temporary redirect rule; its only purpose was to reveal the correct filepath for the RewriteCond to test. And of course, since the temporary redirect code still has the 'extra' slash in it, you will still see the double-slashed "constructed-filepath" value...
Jim
I know that I do not need the temporary redirect rule. And when I removed the extra slash, it was of course removed from the "constructed-filepath" value.
What I was trying to indicate (not very well) was that when I use the temporary redirect rule, I do not get a 404. When I remove the temporary redirect rule, I get a 404. And because the redirect rule is unnecessary and of no use other than revealing the constructed filepath, I have no explanation for why this occurs.
If I cannot correct this, I will need to remove the following:
RewriteCond %{DOCUMENT_ROOT}$1.html -f
Which I do not want to do. Thoughts?
So you might want to compare the filepath as reported by the temporary rule to the filepaths you see (for example) when using FTP to upload files, and see if you can spot the discrepancy. Otherwise, all I can say is that you need a better host if you're going to use complex config code or scripts on your site, because not having access to the server error log files is fairly unacceptable in today's hosting market.
It's also telling that your Document_Root included that trailing slash, because that indicates a server misconfiguration, in that it should be possible to build a valid filepath using
%{DOCUMENT_ROOT}%{REQUEST_URI} on any server (even in the absence of Mod_Rewrite, just speaking generally here). But %{REQUEST_URI} *always* includes a leading slash, so with your server including a trailing slash on Document_Root, trying to build a path using
%{DOCUMENT_ROOT}%{REQUEST_URI} would give us the same double-slash problem that we've already been through using the $1 back-reference method above. As a result, you may also have a lot of trouble with off-the-shelf scripts on this server. :(
Jim
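For a script that has to build a filepath from these two values on a server like this one, one defensive approach is to drop any trailing slash from the document root before appending REQUEST_URI, which always starts with a slash. A minimal Python sketch, assuming request_uri is the URL-path only (no query string); `build_filepath` is a hypothetical helper, and it does nothing about the underlying server misconfiguration.

```python
def build_filepath(document_root: str, request_uri: str) -> str:
    """Join DOCUMENT_ROOT and REQUEST_URI without creating a double slash."""
    # REQUEST_URI always begins with "/", so strip any trailing "/" from the root first
    return document_root.rstrip("/") + request_uri


print(build_filepath("/services/webpages/", "/page_name.html"))  # /services/webpages/page_name.html
print(build_filepath("/services/webpages", "/page_name.html"))   # /services/webpages/page_name.html
```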
You suggested that I compare the filepath reported by the temporary rule to the filepaths I see when using FTP to upload files. Well, the filepath I see using FTP is the following:
example.com@ftp.example.com:/public/page.html
I agree that "not having access to the server error log files is fairly unacceptable." The host that is being used is a local telephone company (Windstream). I have asked to get access to the error logs.
Since this is a very, very basic site, we probably will not be using many off-the-shelf scripts. Just a note: I am able to use Google Maps on this site.
So, bottom line: if I simply leave out RewriteCond %{DOCUMENT_ROOT}$1.html -f, what problems will I have?
If I have not said this before I have certainly thought it. I really appreciate all the time you have spent with me on this issue.
However, I want to emphasize that your server *is* mis-configured -- and in a way that is specifically warned-about in the Apache documentation: Including the trailing slash on DocumentRoot triggers a known bug in mod_dir, and that may be what is causing this RewriteCond to behave so oddly. Refer your host to the DocumentRoot directive [httpd.apache.org] description in the "Apache core" documentation, and ask them to fix your DocumentRoot declaration by removing the trailing slash. If they argue, ask them to read the last line of that section to you over the phone, and to tell you again that they don't see a problem... :)
If you can't get them to fix this or you can't get access to your log files, then I suggest you run (do not walk) to the nearest exit. Since you haven't been around long, I'll repeat one of my favorite phrases: "Cheap hosting is the most expensive hosting you can buy!" (Think about how long you've been working on this one problem, and imagine my bill if this were a paid consultancy.) There are simply too many nice fish in the hosting-services pond to put up with one that is sick or emaciated...
Jim
From all the problems we are having I must *strongly* agree with you that the server is mis-configured. Very nice information about the DocumentRoot directive description in the "Apache core" documentation!
I really doubt that a large phone company will listen to me. When I asked for the error logs, the low-level tech said that he would put in a ticket for the request. I asked why the logs were not readily available. He responded that he had never been asked for access to the error logs before. So, you see, I believe that this might be a losing battle.
I love your "favorite phrase" and I agree. I have another site that I built that is on a nice server that runs cPanel Version 11 as the WebHost Manager Interface. I can get to everything.
Again, I appreciate your sharing your knowledge. And I am sure others that read this thread will also. Thank you.
The DOCUMENT_ROOT is not (never was) and never will be a *reliable* way to determine the filesystem path to your web folder.
Are you hosted on Apache 1.3? Apache 2 normalizes all paths, so you shouldn't run into issues with multiple slashes (in the mapping phase, in mod_rewrite's -d/-f checks, or wherever).