Forum Moderators: phranque
Options +FollowSymlinks
RewriteEngine on
rewritecond %{http_host} ^example.com [nc]
rewriterule ^(.*)$ http://www.example.com/$1 [r=301,nc]
RewriteCond %{THE_REQUEST} ^.*\/index\.htm?
RewriteRule ^(.*)index\.html?$ http://www.example.com/$1 [R=301,L]
RewriteRule ^([a-z0-9\-]+)$ /$1.htm [L]
I found a program that renamed all my local *.htm files to just *, without any extension.
I uploaded them to the webserver, and of course now I have two versions of the same file:
www.example.com/filename.htm
www.example.com/filename
There are a few external links pointing to the .htm files, so I can't delete them from the server, but getting them to 301 to the extensionless files is proving to be difficult.
On a side note, before I added just the
RewriteRule ^([a-z0-9\-]+)$ /$1.htm [L]
and renamed filename.htm to just filename, I was able to view it using IE7, but Chrome and FF both showed the page as source code instead of rendering it...
Anyways, any tips?
So you should start by re-renaming the files, then you need to define exactly what your aims are. Do you want all files to be served without extensions, or just the HTML ones? Are you looking to do actual content negotiation (multiple versions of the same content in different formats)?
[webmasterworld.com...]
Set up a rewrite such that when an extensionless URL is requested, the server serves the content of the matching .html file. Since it is a rewrite, the true name of the file will not be exposed out to the web.
Set up a redirect, such that if .html URLs are requested, the user is redirected to the appropriate matching extensionless URL.
Make sure you also have the usual index to / and non-www to www redirects also in the same file.
Quick pointers.
Do all of this with RewriteRule and RewriteCond. Do not mix any Redirect or RedirectMatch code in with this.
List the redirects from most specific (index files) first to most general (non-www to www) last.
The redirects must have the full domain name in the target URL, and R=301. Force www at the same time in every one.
List all of the redirect(s) before the rewrite.
All rules (redirects or rewrites) must end with [L].
Don't mess with the capitalisation of the syntax, unless you're prepared for some future incompatibility with the code.
The index redirect is very inefficient due to the use of .* in the pattern. There are much better ways to code that.
Note that ^(.*)$ simplifies to (.*) here too.
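As a rough illustration of those pointers applied to the non-www redirect alone, here is a minimal sketch. It assumes the .htaccess file sits in the document root; the choice to leave the hostname pattern without an end anchor is mine, not something stated in the thread.

```apache
# Sketch only: the non-www to www redirect, written to follow the pointers above.
RewriteEngine On

# Standard capitalisation for the directive and the variable name.
# The dot is escaped; there is no end anchor, so a trailing port such as
# ":80" in the Host header still matches (an assumption of this sketch).
RewriteCond %{HTTP_HOST} ^example\.com [NC]

# Unanchored (.*) captures the whole path. The target carries the full
# domain name, the rule uses R=301, and it ends with L.
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

In a complete file this rule would come after the more specific index and extension redirects, and before any internal rewrite.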
Then add some mod_rewrite code to internally rewrite extensionless *URL* requests to .html *files*, if those files exist.
Finally, add some mod_rewrite code to externally redirect any direct client (user or robot) requests for .html-extension URLs to the corresponding extensionless URLs.
A rewrite takes a URL request and finds the file on the server to pull the content from - a filename that is different to the one that might have been hinted at by the path in the URL.
You need an additional redirect to stop the files being directly accessed by their 'true' URL. This redirects the user to make a new request. This request is for the URL that you want users to 'see' and 'use' to access that content on the web.
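The difference between the two can be sketched with a single page. The page name "about" is hypothetical and used only for illustration; the rules assume an .htaccess file in the document root.

```apache
# Sketch only: "about" is a hypothetical page name, not taken from the thread.
RewriteEngine On

# External redirect: the target is a full URL and the rule carries R=301,
# so the client is told to make a new request for /about.
# THE_REQUEST holds the request line the client originally sent, so this
# condition is true only when the client itself asked for /about.htm,
# not when the internal rewrite below produced that path.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /about\.htm\ HTTP/
RewriteRule ^about\.htm$ http://www.example.com/about [R=301,L]

# Internal rewrite: the target is a local path and there is no R flag,
# so the visible URL stays /about while the content comes from about.htm.
RewriteRule ^about$ /about.htm [L]
```

Without the THE_REQUEST condition, the redirect would also fire on the path produced by the rewrite, and the two rules would send the client round in a loop.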
Options All Indexes
IndexOptions FancyIndexing
Options +FollowSymlinks
RewriteEngine on
# If requested URL-path plus ".htm" exists as a file
RewriteCond %{DOCUMENT_ROOT}/$1.htm -f
# Rewrite to append ".htm" to extensionless URL-path
RewriteRule ^(([^/]+/)*[^.]+)$ /$1.htm [L]
## Internally rewrite extensionless file requests to .htm files ##
#
# If the requested URI does not contain a period in the final path-part
RewriteCond %{REQUEST_URI} !(\.[^./]+)$
# and if it does not exist as a directory
RewriteCond %{REQUEST_FILENAME} !-d
# and if it does not exist as a file
RewriteCond %{REQUEST_FILENAME} !-f
# then add .html to get the actual filename
RewriteRule (.*) /$1.htm [L]
## Externally redirect clients directly requesting .html page URIs to extensionless URIs
#
# If client request header contains html file extension
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^.]+\.)+htm\ HTTP
# externally redirect to extensionless URI
RewriteRule ^(.+)\.htm$ http://www.example.com/$1 [R=301,L]
rewritecond %{http_host} ^example.com [nc]
rewriterule ^(.*)$ http://www.example.com/$1 [r=301,nc]
RewriteCond %{THE_REQUEST} ^.*\/index\.htm?
RewriteRule ^(.*)index\.html?$ http://www.example.com/$1 [R=301,L]
I updated the links on all my pages so they no longer point to .htm, even though the page names still end with .htm.
I structured the .htaccess to:
1. convert extensionless paths to .htm internally
2. convert external .htm requests to extensionless urls
3. 301 non-www and index.htm to www and root...
How's that look?
[edited by: youfoundjake at 2:50 am (utc) on Mar. 9, 2009]
Likewise, list the most-specific redirects first and the most-general redirects last, otherwise some requests will go through a redirection chain, instead of direct to target in just one hop.
Put [L] on every Rule. You missed one.
The ^(.*)$ simplifies to (.*) too. Your index redirect uses (.*) which is very inefficient. You'll need the one from [webmasterworld.com...] - do note the correction on Page 3 of that thread.
Check my 'quick pointers' list again for all the steps.
Options All Indexes
IndexOptions FancyIndexing
Options +FollowSymlinks
RewriteEngine on
## Externally redirect clients directly requesting .html page URIs to extensionless URIs
#
# If client request header contains html file extension
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^.]+\.)+htm\ HTTP
# externally redirect to extensionless URI
RewriteRule ^(.+)\.htm$ http://www.example.com/$1 [R=301,L]
# If requested URL-path plus ".htm" exists as a file
RewriteCond %{DOCUMENT_ROOT}/$1.htm -f
# Rewrite to append ".htm" to extensionless URL-path
RewriteRule ^(([^/]+/)*[^.]+)$ /$1.htm [L]
## Internally rewrite extensionless file requests to .htm files ##
#
# If the requested URI does not contain a period in the final path-part
RewriteCond %{REQUEST_URI} !(\.[^./]+)$
# and if it does not exist as a directory
RewriteCond %{REQUEST_FILENAME} !-d
# and if it does not exist as a file
RewriteCond %{REQUEST_FILENAME} !-f
# then add .html to get the actual filename
RewriteRule (.*) /$1.htm [L]
rewritecond %{http_host} ^example.com [nc]
rewriterule (.*) http://www.example.com/$1 [R=301,nc,L]
RewriteCond %{THE_REQUEST} ^.*\/index\.htm?
RewriteRule ^(.*)index\.html?$ http://www.example.com/$1 [R=301,L]
Now, as far as the index redirect being inefficient: I'm not using any subdirectories and all the pages are .htm, so do I really need to add the code for php and asp?
1. Requesting: http://example.com/page.htm
GET /page.htm HTTP/1.1
Connection: Keep-Alive
Keep-Alive: 300
Accept:*/*
Host: example.com
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)
Server Response: 301 Moved Permanently
Date: Mon, 09 Mar 2009 03:16:10 GMT
Server: Apache/1.3.41 (Unix) Resin/2.1.13 mod_fastcgi/2.4.6 mod_log_bytes/1.2 mod_bwlimited/1.4 mod_auth_passthrough/1.8 FrontPage/5.0.2.2635 mod_ssl/2.8.31 OpenSSL/0.9.7a
Location: http://www.example.com/page
Keep-Alive: timeout=5, max=149
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1
Redirecting to http://www.example.com/page ...
2. Requesting: http://www.example.com/page
GET /page HTTP/1.1
Connection: Keep-Alive
Keep-Alive: 300
Accept:*/*
Host: www.example.com
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)
Server Response: 200 OK
Date: Mon, 09 Mar 2009 03:16:10 GMT
Server: Apache/1.3.41 (Unix) Resin/2.1.13 mod_fastcgi/2.4.6 mod_log_bytes/1.2 mod_bwlimited/1.4 mod_auth_passthrough/1.8 FrontPage/5.0.2.2635 mod_ssl/2.8.31 OpenSSL/0.9.7a
Last-Modified: Mon, 09 Mar 2009 01:54:48 GMT
ETag: "4ae34ea-1cd9-49b476e8"
Accept-Ranges: bytes
Content-Length: 7385
Keep-Alive: timeout=5, max=150
Connection: Keep-Alive
Content-Type: text/html
You can omit the .php and .asp stuff from the index rule, but you do need to replace the .*/ part with the /([^/]+/)* pattern, and the .* part with the (([^/]+/)*) pattern.
Options All Indexes
IndexOptions FancyIndexing
Options +FollowSymlinks
RewriteEngine on
## Externally redirect clients directly requesting .html page URIs to extensionless URIs
#
# If client request header contains html file extension
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^.]+\.)+htm\ HTTP
# externally redirect to extensionless URI
RewriteRule ^(.+)\.htm$ http://www.example.com/$1 [R=301,L]
# If requested URL-path plus ".htm" exists as a file
RewriteCond %{DOCUMENT_ROOT}/$1.htm -f
# Rewrite to append ".htm" to extensionless URL-path
RewriteRule ^(([^/]+/)*[^.]+)$ /$1.htm [L]
## Internally rewrite extensionless file requests to .htm files ##
#
# If the requested URI does not contain a period in the final path-part
RewriteCond %{REQUEST_URI} !(\.[^./]+)$
# and if it does not exist as a directory
RewriteCond %{REQUEST_FILENAME} !-d
# and if it does not exist as a file
RewriteCond %{REQUEST_FILENAME} !-f
# then add .html to get the actual filename
RewriteRule (.*) /$1.htm [L]
RewriteCond %{THE_REQUEST} ^.*\/index\.htm?
RewriteRule ^(([^/]+/)*)index\.html?$ http://www.example.com/$1 [R=301,L]
rewritecond %{http_host} ^example.com [nc]
rewriterule ^(.*)$ http://www.example.com/$1 [r=301,nc]
My brain hurts..
Do I even want the non-www to www redirect before the file based rewrites?
Thanks for the help Ian..
Google is already seeing exposed URLs, because you have the instructions in the wrong order. For some URLs, it is .htaccess that needs to be fixed, not the sitemap.
That is, if you do the rewrite first, the internal file pointer will be updated to show the internal filepath location of the content, and then a following redirect will expose that filepath back out into the URL. You do not want that to happen. That is why you do redirects first. Fix the URL the user sees before doing a rewrite to get the content from a different location.
Please check very carefully the complete list of what to do at the top of this thread. You do need all of those steps, otherwise you will get the above behavior as well as some requests going through a redirection chain.
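The exposure problem described above can be reproduced with a deliberately mis-ordered pair of rules. This is a sketch for illustration only, assuming an .htaccess file in the document root and a file page.htm on disk; "page" is just the example name from the header trace earlier in the thread.

```apache
# WRONG ORDER, for illustration only. Do not deploy.
RewriteEngine On

# 1. The internal rewrite runs first: a request for /page becomes /page.htm
RewriteCond %{REQUEST_FILENAME}.htm -f
RewriteRule ^(([^/]+/)*[^./]+)$ /$1.htm [L]

# 2. The hostname redirect is evaluated afterwards and now sees page.htm,
#    so http://example.com/page is redirected to
#    http://www.example.com/page.htm and the real filename is exposed.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

As I understand per-directory processing, the L flag on the first rule does not prevent this, because the rewritten path is run through the rules again. Moving the redirect above the rewrite sends http://example.com/page straight to http://www.example.com/page in one hop.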
RewriteCond %{THE_REQUEST} ^.*\/index\.htm?
In this line, htm? should be html? and the / before index does NOT need escaping (that is, \/ should be / only). See also, above, for a more efficient replacement for the .* part of this line. I would use this, as it is far more efficient:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html?\ HTTP/
It works for both .htm and .html extensions. I always use the full version with the .php and .a/jsp(x) checks on all sites, so that the code is completely portable, and so that I don't have to think about it. I did that after accidentally using the .php version on a site that used index.htm files, and not noticing until Google had indexed some of the index.htm URLs.
Capitalisation of Apache Keywords in the non-www redirect is messed up, but I'm just repeating myself now. All of the steps are listed in this thread, and all of them are necessary.
[edited by: g1smd at 9:42 am (utc) on Mar. 10, 2009]
IndexOptions FancyIndexing
Options All
RewriteEngine on
#
# Externally redirect client /index page requests to "/"
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.html?
RewriteRule ^(([^/]+/)*)index\.html?$ http://www.example.com/$1 [R=301,L]
#
# Externally redirect client requests containing an .htm/.html extension to the extensionless URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*[^.]+\.html?
# externally redirect to extensionless URI
RewriteRule ^(([^/]+/)*[^.]+)\.html?$ http://www.example.com/$1 [R=301,L]
#
# Externally redirect non-blank non-canonical hostname requests to canonical hostname
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#
# If requested extensionless URL-path does not resolve to an existing directory
RewriteCond %{REQUEST_FILENAME} !-d
# and if requested extensionless URL-path plus ".htm" does resolve to an existing file
RewriteCond %{REQUEST_FILENAME}.htm -f
# then append ".htm" to resolve the actual filename
RewriteRule ^(([^/]+/)*[^./]+)$ /$1.htm [L]
[edit] Corrected as noted below. [/edit]
[edited by: jdMorgan at 12:02 am (utc) on Mar. 11, 2009]
The server encountered an internal error or misconfiguration and was unable to complete your request.
Please contact the server administrator, webmaster@example.com and inform them of the time the error occurred, and anything you might have done that may have caused the error.
More information about this error may be available in the server error log.
Additionally, a 500 Internal Server Error error was encountered while trying to use an ErrorDocument to handle the request.
RewriteRule ^(([^/]+/)*[^.]+\.html?$ http://www.example.com/$1 [R=301,L]
Unmatched brackets: two opening, but only one closing.
RewriteRule ^(([^/]+/)*[^.]+)\.html?$ http://www.example.com/$1 [R=301,L]
The addition is the closing bracket between [^.]+ and \.html? in the pattern. Always start by looking for obvious logic errors.
[Heh. Do I get extra points for spotting a jd typo?] :)
You can add a negative match RewriteCond to stop them being rewritten, or make your script deal with the request and serve the expected content. Your choice.