
Apache Web Server Forum

    
Remove .html from urls
How to remove .html extension from urls
MrBlack

5+ Year Member



 
Msg#: 3480555 posted 6:54 am on Oct 18, 2007 (gmt 0)

Hi,

I am trying to remove the .html extension from some urls.

I have a folder under my main domain called 'info'. In this folder I have some pages whose .html extension I want to remove from the URL. The result I am trying to get is www.mydomain.com/info/a-webpage/

However, I do not want to create extensionless URLs for my whole website, only in the 'info' directory, so I have placed an .htaccess file in the 'info' directory with the following code:

RewriteEngine on
RewriteBase /info/
RewriteRule ^(.+)\.html$ /$1/ [R=301,L]

This nearly works, but it redirects the URL to the root, e.g. http://a-webpage/, and I cannot figure out why.

Can someone please help, I am pulling my hair out and I haven't got much.

 

akameng

5+ Year Member



 
Msg#: 3480555 posted 11:39 am on Oct 18, 2007 (gmt 0)

Hey, try this:

RewriteEngine on
RewriteBase /info/

# Exclude /info/a-webpage/ with !^/info/.+/.*
RewriteCond %{REQUEST_URI} !^/info/.+/.*

#include only ^/info/(.+)\.html$
RewriteRule ^/info/([^/]+)\.html$ /$1/ [R=301,L]

jdMorgan

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3480555 posted 12:35 pm on Oct 18, 2007 (gmt 0)

This approach is sort of backwards: a redirect does not 'create' an extensionless URL. In fact, a redirect does not 'create' a URL at all; URLs are defined by the links on your pages. Only filenames are defined on or by your server.

To deploy extensionless URLs:
1) Edit your pages (or your page-generation script) to link to extensionless URLs
2) Add mod_rewrite code to internally rewrite those URLs, when requested from your server, to the correct-extension file.
3) Optional: Detect client requests for URLs with extensions, and externally redirect those to the extensionless URL. The purpose of this is to 'recover' old backlinks and user bookmarks, and to speed up the switchover to your extensionless URLs in search engine results.

So basically, you're trying to do step 3 here without doing the other two steps. This will result in your visitors having to go through the added delay of an external redirect for every extensionless page request, and complicate the search engines' job of indexing those pages.

Also, I question your use of RewriteBase; I don't think you need it here.
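
For anyone wondering why the original rule lands on the root: in a per-directory .htaccess the pattern is matched against the path with the /info/ prefix removed, and a substitution that starts with a slash is taken as a root-relative path, so RewriteBase (which only affects relative substitutions) never comes into play. A rough illustration, using Python's re module as a stand-in for mod_rewrite; the variable names are made up for the example:

```python
import re

# Path as seen by a RewriteRule in /info/.htaccess: the "/info/" prefix is stripped.
per_dir_path = "a-webpage.html"

match = re.match(r"^(.+)\.html$", per_dir_path)

# Original substitution "/$1/": the leading slash makes it root-relative.
root_target = "/" + match.group(1) + "/"

# Writing the directory into the substitution keeps the redirect inside /info/.
info_target = "/info/" + match.group(1) + "/"

print(root_target)  # /a-webpage/
print(info_target)  # /info/a-webpage/
```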

And further, extensionless URLs for files should not end with a slash. A URL ending with a slash indicates a directory, not a file, and this will likely also cause you problems or complications with linked objects on your extensionless pages.
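
The linked-object problem comes from how relative links are resolved against the page URL. A small sketch using Python's urllib.parse.urljoin, which follows the same resolution rules a browser does; the example.com URLs and the style.css filename are only illustrative:

```python
from urllib.parse import urljoin

# A page that references its stylesheet with a relative link.
relative_link = "style.css"

# Extensionless URL without a trailing slash: the link resolves inside /info/.
without_slash = urljoin("http://www.example.com/info/a-webpage", relative_link)

# Same URL with a trailing slash: "a-webpage" is now treated as a directory.
with_slash = urljoin("http://www.example.com/info/a-webpage/", relative_link)

print(without_slash)  # http://www.example.com/info/style.css
print(with_slash)     # http://www.example.com/info/a-webpage/style.css
```

With the trailing slash, every relative image, stylesheet, and script link on the page would point one level deeper than where the files actually live.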

Here are examples of the two rules you might use to implement extensionless URLs for /info .html files, assuming you have changed the links on your pages to remove the .html extensions for URL-paths resolving to the /info directory:

RewriteEngine on
RewriteBase /
#
## Internally rewrite extensionless /info URLs to existing .html files
# If no filetype extension on requested URL
RewriteCond %{REQUEST_URI} !\.[a-z0-9]+$
# If URL plus extension exists as a file
RewriteCond %{REQUEST_FILENAME}.html -f
# Internally rewrite to file with extension
RewriteRule ^info/(.*)$ /info/$1.html [L]
#
## Externally redirect old .html-extension /info URLs to new extensionless URLs
# If direct client request for .html files
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^.]+\.html\ HTTP/
# Externally redirect to URL without extension
RewriteRule ^info/([^.]+)\.html$ http://www.example.com/info/$1 [R=301,L]

These are freshly-typed and untested. A known limitation is that the simple regex patterns shown here do not support URLs with periods in the directory pathnames.
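
To see that limitation concretely, the redirect rule's pattern can be tried against a few sample paths. This is only a sketch that uses Python's re module in place of mod_rewrite's regex engine, and the sample paths are invented:

```python
import re

# Pattern from the external-redirect rule, as matched in a root .htaccess
# (the leading slash of the URL-path is already stripped).
rule = re.compile(r"^info/([^.]+)\.html$")

samples = [
    "info/a-webpage.html",       # plain page
    "info/sub/a-webpage.html",   # page in a subdirectory
    "info/v1.2/a-webpage.html",  # period in a directory name
]
matches = [bool(rule.match(path)) for path in samples]

for path, matched in zip(samples, matches):
    print(path, matched)
# info/a-webpage.html True
# info/sub/a-webpage.html True
# info/v1.2/a-webpage.html False
```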

The check for 'file exists with .html extension' is not strictly required for your simple application. However, I show it here in case you might like to add another file extension later. For example, if the requested URL-path does not resolve to an existing .html file, you could add another rule to check to see if it exists as a .htm or .shtml file. If you only ever plan to support one filetype, you can comment-out or delete the RewriteCond for .html file-exists checking for improved performance.
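
The fall-through idea described above (try .html first, then other extensions) can be sketched as follows. This is a model of the logic only, not mod_rewrite code: the function name, the set of files, and the extension order are all assumptions made for the example.

```python
def resolve_extensionless(url_path, existing_files,
                          extensions=(".html", ".htm", ".shtml")):
    """Return the first existing file for an extensionless URL-path, else None."""
    for ext in extensions:
        candidate = url_path + ext
        # The membership test stands in for the "-f" file-exists check.
        if candidate in existing_files:
            return candidate
    return None

files = {"/info/a-webpage.html", "/info/old-page.shtml"}

print(resolve_extensionless("/info/a-webpage", files))  # /info/a-webpage.html
print(resolve_extensionless("/info/old-page", files))   # /info/old-page.shtml
print(resolve_extensionless("/info/missing", files))    # None
```

In .htaccess terms, each extension would get its own file-exists condition and rewrite rule, checked in order.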

Jim

akameng

5+ Year Member



 
Msg#: 3480555 posted 7:43 pm on Oct 18, 2007 (gmt 0)

I am sure that jdMorgan has the best solution, but I would suggest modifying just this line:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^.]+\.html\ HTTP/
to
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\s/[^.]+\.html\sHTTP/\d\.\d$

I only changed the escaped space (\ ) to \s for easier reading, and matched HTTP/1.1 or any other version with HTTP/\d\.\d$, because THE_REQUEST contains the full HTTP request line sent by the browser to the server (e.g., "GET /index.html HTTP/1.1"). This does not include any additional headers sent by the browser.
The method will be one of: OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, CONNECT.

jdMorgan




 
Msg#: 3480555 posted 9:52 pm on Oct 18, 2007 (gmt 0)

Because the pattern ending in HTTP is not end-anchored, there is no need to specify anything past the end of "HTTP/".

The difference between "\ " and "\s" is largely a matter of style.
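
A quick way to check that the two variants accept and reject the same request lines is to run both against a few samples. The sketch below uses Python's re module as a stand-in for Apache's regex engine; the sample request lines are invented:

```python
import re

# Original condition: not end-anchored, escaped literal spaces.
unanchored = re.compile(r"^[A-Z]{3,9}\ /[^.]+\.html\ HTTP/")
# Suggested variant: \s for the spaces, version digits, end anchor.
anchored = re.compile(r"^[A-Z]{3,9}\s/[^.]+\.html\sHTTP/\d\.\d$")

request_lines = [
    "GET /info/a-webpage.html HTTP/1.1",
    "HEAD /info/a-webpage.html HTTP/1.0",
    "GET /info/a-webpage HTTP/1.1",
]
results = [(bool(unanchored.search(line)), bool(anchored.search(line)))
           for line in request_lines]

for line, pair in zip(request_lines, results):
    print(line, pair)
# GET /info/a-webpage.html HTTP/1.1 (True, True)
# HEAD /info/a-webpage.html HTTP/1.0 (True, True)
# GET /info/a-webpage HTTP/1.1 (False, False)
```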

Jim

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3480555 posted 11:44 pm on Oct 18, 2007 (gmt 0)

When I use "extensionless URLs" they are actually index files each in their own folder. The URL ends with a trailing / every time.

MrBlack

5+ Year Member



 
Msg#: 3480555 posted 11:57 pm on Oct 18, 2007 (gmt 0)

Thanks for the help guys.

g1smd, that's a great idea too and will probably be the best solution for me. But do you need to place an .htaccess file in every directory to remove the index.html, or have you done this with the .htaccess file in the root?

[edited by: MrBlack at 12:00 am (utc) on Oct. 19, 2007]

g1smd




 
Msg#: 3480555 posted 12:09 am on Oct 19, 2007 (gmt 0)

I use the .htaccess file in the root to control everything on the whole site.

All requests for (default¦index)\.(php(4¦5)?¦html?¦cfm¦aspx?) are stripped back to the preceding "/".

All requests with parameters on the end have those stripped too.

MrBlack

5+ Year Member



 
Msg#: 3480555 posted 12:53 am on Oct 19, 2007 (gmt 0)

OK, this is what I have come up with to remove index.html from the URLs in the root and all subdirectories:

RewriteEngine on
RewriteCond %{THE_REQUEST} ^GET\ /.*/index\.html\ HTTP/
RewriteRule (.*)index\.html$ /$1 [R=301,L]

Can you see any problems with it? Really appreciate your help guys!

jdMorgan




 
Msg#: 3480555 posted 1:31 am on Oct 19, 2007 (gmt 0)

Unless you use the generic pattern I show in the code I posted above, you'll also want to include the HEAD method, as well as GET.
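
The difference shows up as soon as a client sends something other than GET. A minimal check, with Python's re module standing in for Apache's regex engine and an invented HEAD request line:

```python
import re

# Condition that only accepts GET requests.
get_only = re.compile(r"^GET\ /.*/index\.html\ HTTP/")
# Generic method pattern: any method name of 3 to 9 capital letters.
any_method = re.compile(r"^[A-Z]{3,9}\ /.*/index\.html\ HTTP/")

head_request = "HEAD /info/index.html HTTP/1.1"

get_only_matches = bool(get_only.search(head_request))
any_method_matches = bool(any_method.search(head_request))

print(get_only_matches)    # False: a HEAD request would not be redirected
print(any_method_matches)  # True
```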

Jim

MrBlack

5+ Year Member



 
Msg#: 3480555 posted 8:33 pm on Oct 24, 2007 (gmt 0)

OK, I have now come up with the following code, which removes the index.html from the URL for the subdirectory and sub-subdirectory:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*/index\.html\ HTTP/
RewriteRule (.*)index\.html$ /$1 [R=301,L]

However, I cannot make it work for the root index.html as well. Any ideas where I am going wrong?

jdMorgan




 
Msg#: 3480555 posted 9:25 pm on Oct 24, 2007 (gmt 0)


RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html$ http://www.example.com/$1 [R=301,L]
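
The reason the earlier condition missed the root page is that /.*/index\.html needs two slashes before the filename, while a request for /index.html has only one. The replacement ([^/]+/)* allows zero or more directory levels. A short sketch comparing the two, with Python's re module standing in for Apache's regex engine and invented request lines:

```python
import re

old_cond = re.compile(r"^[A-Z]{3,9}\ /.*/index\.html\ HTTP/")
new_cond = re.compile(r"^[A-Z]{3,9}\ /([^/]+/)*index\.html\ HTTP/")
new_rule = re.compile(r"^(([^/]+/)*)index\.html$")

root_request = "GET /index.html HTTP/1.1"
sub_request = "GET /info/sub/index.html HTTP/1.1"

old_results = [bool(old_cond.search(r)) for r in (root_request, sub_request)]
new_results = [bool(new_cond.search(r)) for r in (root_request, sub_request)]

# What the rule captures as $1 (paths as seen in a root .htaccess).
root_capture = new_rule.match("index.html").group(1)
sub_capture = new_rule.match("info/sub/index.html").group(1)

print(old_results)         # [False, True]
print(new_results)         # [True, True]
print(repr(root_capture))  # ''
print(repr(sub_capture))   # 'info/sub/'
```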

Jim

MrBlack

5+ Year Member



 
Msg#: 3480555 posted 9:44 pm on Oct 24, 2007 (gmt 0)

Thanks very much

g1smd




 
Msg#: 3480555 posted 9:59 pm on Oct 24, 2007 (gmt 0)

So that I can slot the same code on to every website, I don't just test for index.html requests.

I test for (default¦index)\.(php(4¦5)?¦html?¦cfm¦aspx?) and all of those redirect. It also partly hides which technology the site is actually using.

MrBlack

5+ Year Member



 
Msg#: 3480555 posted 4:44 am on Oct 26, 2007 (gmt 0)

So that I can slot the same code on to every website, I don't just test for index.html requests.

I test for (default¦index)\.(php(4¦5)?¦html?¦cfm¦aspx?) and all of those redirect. It also partly hides which technology the site is actually using.

Sounds good. How would you slot this into the code that jdMorgan provided? I am particularly interested in checking for index.php as well, because I am currently converting a site running on PHP to straight HTML. The PHP site had its URLs rewritten to .html extensions, apart from the root index.php page. Sorry, I am a newbie when it comes to this :)

g1smd




 
Msg#: 3480555 posted 12:52 pm on Oct 26, 2007 (gmt 0)

Replace each: index\.html in the code with (default¦index)\.(php(4¦5)?¦html?¦cfm¦aspx?) instead.

Remember to replace the forum pipe symbols with the correct pipe symbols when you edit this code.
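
As a rough check of what the combined pattern accepts once the real pipe characters are in place, here it is dropped into the directory-prefix pattern from Jim's index.html rule. Python's re module stands in for Apache's regex engine, and the redirect_path helper and sample paths are made up for the illustration:

```python
import re

# The alternation with real "|" characters, placed where index\.html was.
INDEX_RULE = re.compile(
    r"^(([^/]+/)*)(default|index)\.(php(4|5)?|html?|cfm|aspx?)$"
)

def redirect_path(url_path):
    """Return the path to redirect to, or None if url_path is not an index file."""
    m = INDEX_RULE.match(url_path)
    return "/" + m.group(1) if m else None

print(redirect_path("index.php"))            # /
print(redirect_path("info/default.aspx"))    # /info/
print(redirect_path("info/index.htm"))       # /info/
print(redirect_path("info/a-webpage.html"))  # None
```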

hailg03

5+ Year Member



 
Msg#: 3480555 posted 7:29 am on Mar 29, 2008 (gmt 0)

How do I redirect URLs with periods? This is what I am using right now, and it works for URLs without periods.
For example, how do I get all URLs similar to www.example.com/you.are.html redirected to www.example.com/you.are?

RewriteEngine on
RewriteBase /

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^.]+\.html\ HTTP/
RewriteRule ^([^.]+)\.html$ /$1 [R=301,L]

Thanks

jdMorgan




 
Msg#: 3480555 posted 4:12 pm on Mar 29, 2008 (gmt 0)

Change the rule pattern:

RewriteRule ^(.+)\.html$ /$1 [R=301,L]

The original pattern was written for pattern-matching efficiency, but specifically excludes periods anywhere in the URL-path, except preceding the filetype.

See the regular-expressions tutorial cited in our forum charter for more info.

Jim

hailg03

5+ Year Member



 
Msg#: 3480555 posted 5:25 pm on Mar 29, 2008 (gmt 0)

Hey Jim,

I did what you said and also changed
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^.]+\.html\ HTTP/

to
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(.+)\.html\ HTTP/

Which got the result I wanted. Let me know if this change is fine. I really don't know much about mod rewrite.
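
For reference, the reason the condition had to change as well: the old condition pattern uses [^.]+ in the same way the old rule pattern did, so a request line with an extra period fails the condition before the rule's redirect can be applied. A small sketch of both changes, using Python's re module as a stand-in for Apache's regex engine:

```python
import re

old_cond = re.compile(r"^[A-Z]{3,9}\ /[^.]+\.html\ HTTP/")
new_cond = re.compile(r"^[A-Z]{3,9}\ /(.+)\.html\ HTTP/")
new_rule = re.compile(r"^(.+)\.html$")

request_line = "GET /you.are.html HTTP/1.1"

old_cond_ok = bool(old_cond.search(request_line))
new_cond_ok = bool(new_cond.search(request_line))

# What the changed rule captures as $1 (path as seen in a root .htaccess).
captured = new_rule.match("you.are.html").group(1)

print(old_cond_ok)  # False: [^.]+ stops at the first period
print(new_cond_ok)  # True
print(captured)     # you.are
```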
