I am trying to remove the .html extension from some URLs.
I have a folder under my main domain called 'info'. In this folder I have some pages with the .html extension, which I want to remove. The result I am trying to get is www.mydomain.com/info/a-webpage/
However, I do not want to create extensionless URLs for my whole website, only in the 'info' directory, so I have placed an .htaccess file in the 'info' directory with the following code:
RewriteEngine on
RewriteBase /info/
RewriteRule ^(.+)\.html$ /$1/ [R=301,L]
This nearly works, but it redirects the URL to the root, e.g. http://a-webpage/, and I cannot figure out why.
Can someone please help, I am pulling my hair out and I haven't got much.
To deploy extensionless URLs:
1) Edit your pages (or your page-generation script) to link to extensionless URLs
2) Add mod_rewrite code to internally rewrite those URLs, when requested from your server, to the correct-extension file.
3) Optional: Detect client requests for URLs with extensions, and externally redirect those to the extensionless URL. The purpose of this is to 'recover' old backlinks and user bookmarks, and to speed up the switchover to your extensionless URLs in search engine results.
So basically, you're trying to do step 3 here without doing the other two steps. This will result in your visitors having to go through the added delay of an external redirect for every extensionless page request, and complicate the search engines' job of indexing those pages.
Also, I question your use of RewriteBase; I don't think you need it here.
Further, extensionless URLs should not end with a slash. A URL ending with a slash indicates a directory, not a file, and this will likely also cause you problems with linked objects (relative links to images, stylesheets, and so on) on your extensionless pages.
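As for why the original rule lands on the root: in a per-directory .htaccess file, the RewriteRule pattern is matched against the path relative to that directory, so for /info/a-webpage.html the back-reference $1 holds only "a-webpage". A substitution that begins with a slash is taken as a path from the site root, and RewriteBase applies only to relative substitutions, so "/$1/" becomes /a-webpage/. A minimal sketch of the redirect alone, assuming the file stays in /info/.htaccess, would put the directory back into the target and drop the trailing slash:

```apache
# Sketch only: assumes this file is /info/.htaccess
RewriteEngine on
# $1 is relative to /info/, so the target must name /info/ explicitly
RewriteRule ^(.+)\.html$ /info/$1 [R=301,L]
```

On its own this still only covers the optional redirect step; without an internal rewrite back to the .html file, the extensionless URL would return a 404.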
Here are examples of the two rules you might use to implement extensionless URLs for /info .html files, assuming you have changed the links on your pages to remove the .html extensions for URL-paths resolving to the /info directory:
RewriteEngine on
RewriteBase /
#
## Internally rewrite extensionless /info URLs to existing .html files
# If no filetype extension on requested URL
RewriteCond %{REQUEST_URI} !\.[a-z0-9]+$
# If URL plus extension exists as a file
RewriteCond %{REQUEST_FILENAME}.html -f
# Internally rewrite to file with extension
RewriteRule ^info/(.*)$ /info/$1.html [L]
#
## Externally redirect old .html-extension /info URLs to new extensionless URLs
# If direct client request for .html files
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^.]+\.html\ HTTP/
# Externally redirect to URL without extension
RewriteRule ^info/([^.]+)\.html$ http://www.example.com/info/$1 [R=301,L]
The check for 'file exists with .html extension' is not strictly required for your simple application. However, I show it here in case you might like to add another file extension later. For example, if the requested URL-path does not resolve to an existing .html file, you could add another rule to check whether it exists as a .htm or .shtml file. If you only ever plan to support one filetype, you can comment out or delete the RewriteCond for .html file-exists checking for improved performance.
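A fallback to a second extension might look something like the following sketch. It assumes the same root-level .htaccess as above and that some /info pages exist only as .shtml files, which is a hypothetical case used for illustration:

```apache
RewriteEngine on
RewriteBase /
#
## Try the .html file first
RewriteCond %{REQUEST_URI} !\.[a-z0-9]+$
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^info/(.*)$ /info/$1.html [L]
#
## Otherwise fall back to a .shtml file, if one exists (hypothetical second filetype)
RewriteCond %{REQUEST_URI} !\.[a-z0-9]+$
RewriteCond %{REQUEST_FILENAME}.shtml -f
RewriteRule ^info/(.*)$ /info/$1.shtml [L]
```

Here the file-exists checks are what choose between the two rules, so they cannot be removed as they could in the single-extension case.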
Jim
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^.]+\.html\ HTTP/
to
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\s/[^.]+\.html\sHTTP/\d\.\d$
This only changes the escaped space (\ ) to \s for easier reading, and spells out the protocol version as HTTP/\d\.\d$ so that it matches HTTP/1.1 or any other version. THE_REQUEST contains the full HTTP request line sent by the browser to the server (e.g., "GET /index.html HTTP/1.1"). It does not include any additional headers sent by the browser.
The method will be one of OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, or CONNECT.
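To make the matching concrete, here is a sketch of that condition paired with the redirect rule from earlier in the thread, with a few request lines annotated. It assumes Apache 2.x, where mod_rewrite uses PCRE; the \s and \d shorthands are not available in the POSIX regex library used by Apache 1.3.

```apache
# GET /info/a-webpage.html HTTP/1.1      -> condition matches, redirect is issued
# GET /info/a-webpage HTTP/1.1           -> no match (no .html in the request line)
# GET /info/a-webpage.html?x=1 HTTP/1.1  -> no match (a query string follows .html)
# THE_REQUEST keeps the original request line after an internal rewrite,
# so a rewrite to the .html file does not trigger this redirect again.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\s/[^.]+\.html\sHTTP/\d\.\d$
RewriteRule ^info/([^.]+)\.html$ http://www.example.com/info/$1 [R=301,L]
```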
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*/index\.html\ HTTP/
RewriteRule (.*)index\.html$ /$1 [R=301,L]
However, I cannot make it work for the root index.html as well. Any ideas where I am going wrong?
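One likely cause is the "/.*/index\.html" part of the condition: it needs a slash, then anything, then another slash before index.html, so a request for /index.html (which has only one slash) never matches. A possible sketch that makes the directory part optional, assuming the code is in the root .htaccess and using www.example.com as a placeholder hostname:

```apache
# ([^/]+/)* matches zero or more directory levels, so /index.html also matches
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html\ HTTP/
# $1 is empty for the root and "dir/" for /dir/index.html
RewriteRule ^(([^/]+/)*)index\.html$ http://www.example.com/$1 [R=301,L]
```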
So that I can slot the same code on to every website, I don't just test for index.html requests. I test for (default|index)\.(php(4|5)?|html?|cfm|aspx?) and all of those redirect. It also partly hides which technology the site is actually using.
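Slotted into an index redirect, that alternation might look like the sketch below. This is an illustration rather than the poster's actual code; it assumes a root-level .htaccess and uses www.example.com as a placeholder hostname.

```apache
# Redirect direct client requests for any default/index page, at any depth,
# to the bare directory URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*(default|index)\.(php(4|5)?|html?|cfm|aspx?)\ HTTP/
RewriteRule ^(([^/]+/)*)(default|index)\.(php(4|5)?|html?|cfm|aspx?)$ http://www.example.com/$1 [R=301,L]
```

Because the condition tests THE_REQUEST, the server's own internal mapping of a directory URL to its index file should not trigger the redirect again.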
Sounds good. How would you slot this into the code that jdmorgan provided? I am particularly interested in checking for index.php as well as I am currently converting a site running on php to straight html. The php site had the urls rewritten to .html extensions apart from the root index.php page. Sorry, I am a newbie when it comes to this :)
RewriteEngine on
RewriteBase /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^.]+\.html\ HTTP/
RewriteRule ^([^.]+)\.html$ /$1 [R=301,L]
Thanks
RewriteRule ^(.+)\.html$ /$1 [R=301,L]
See the regular-expressions tutorial cited in our forum charter for more info.
Jim