
Apache Web Server Forum

    
Remove .html from urls
How to remove .html extension from urls
MrBlack
msg:3480557, 6:54 am on Oct 18, 2007 (gmt 0)

Hi,

I am trying to remove the .html extension from some URLs.

I have a folder under my main domain called 'info'. In this folder I have some pages with the .html extension, which I want to remove. The result I am trying to get is www.mydomain.com/info/a-webpage/

However, I do not want to create extensionless URLs for my whole website, only in the 'info' directory, so I have placed an .htaccess file in the 'info' directory with the following code:

RewriteEngine on
RewriteBase /info/
RewriteRule ^(.+)\.html$ /$1/ [R=301,L]

This nearly works, but it rewrites the URL to the root, e.g. http://a-webpage/, and I cannot figure out why.

Can someone please help, I am pulling my hair out and I haven't got much.

 

akameng
msg:3480735, 11:39 am on Oct 18, 2007 (gmt 0)

Hey, try this:

RewriteEngine on
RewriteBase /info/

#exclude /info/a-webpage/ by !^/info/.+/.*
RewriteCond %{REQUEST_URI} !^/info/.+/.*

#include only ^/info/(.+)\.html$
RewriteRule ^/info/([^/]+)\.html$ /$1/ [R=301,L]

jdMorgan
msg:3480772, 12:35 pm on Oct 18, 2007 (gmt 0)

This approach is sort of backwards: a redirect does not 'create' an extensionless URL. In fact, a redirect does not 'create' a URL at all; URLs are defined by the links on your pages. Only filenames are defined on or by your server.

To deploy extensionless URLs:
1) Edit your pages (or your page-generation script) to link to extensionless URLs
2) Add mod_rewrite code to internally rewrite those URLs, when requested from your server, to the correct-extension file.
3) Optional: Detect client requests for URLs with extensions, and externally redirect those to the extensionless URL. The purpose of this is to 'recover' old backlinks and user bookmarks, and to speed up the switchover to your extensionless URLs in search engine results.
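Step 1 is ordinary text editing. For a handful of static pages, a one-off script along these lines could do it. This is only a sketch that assumes every link into /info/ is written as a double-quoted, root-relative href; `strip_info_extensions` is a hypothetical name, and a real site may be better served by an HTML parser or by changing the page-generation script itself.

```python
import re

# Assumed link style: double-quoted, root-relative hrefs into /info/.
INFO_LINK = re.compile(r'href="(/info/[^"?#]+)\.html"')


def strip_info_extensions(html):
    """Point /info/*.html links at the extensionless URLs (hypothetical helper)."""
    return INFO_LINK.sub(r'href="\1"', html)


page = '<a href="/info/a-webpage.html">A page</a> <a href="/about.html">About</a>'
print(strip_info_extensions(page))
# <a href="/info/a-webpage">A page</a> <a href="/about.html">About</a>
```

Links outside /info/ are left alone, matching the goal of changing only that directory.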

So basically, you're trying to do step 3 here without doing the other two steps. This will result in your visitors having to go through the added delay of an external redirect for every extensionless page request, and complicate the search engines' job of indexing those pages.

Also, I question your use of RewriteBase; I don't think you need it here.
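To see why the original rule in /info/.htaccess sends visitors to the root, here is a small Python model of one per-directory RewriteRule with R=301. It assumes mod_rewrite's documented per-directory behaviour: the directory prefix (here info/) is removed from the URL-path before the pattern is tested, and RewriteBase is only prepended to a substitution that does not already start with a slash. The helper name `per_dir_redirect` is hypothetical, and this is a simplified sketch, not Apache's actual code.

```python
import re


def per_dir_redirect(url_path, dir_prefix, pattern, substitution, rewrite_base):
    """Simplified model of one .htaccess RewriteRule with [R=301] (hypothetical helper).

    Returns the redirect target, or None if the rule does not match.
    """
    local = url_path.lstrip("/")
    if not local.startswith(dir_prefix):
        return None
    # Per-directory context: the directory prefix is stripped before matching.
    local = local[len(dir_prefix):]
    m = re.search(pattern, local)
    if m is None:
        return None
    # Expand $1..$9 back-references in the substitution.
    target = re.sub(r"\$(\d)", lambda ref: m.group(int(ref.group(1))) or "", substitution)
    # RewriteBase is only prepended to relative substitutions.
    if not target.startswith("/"):
        target = rewrite_base.rstrip("/") + "/" + target
    return target


# MrBlack's rule: "/$1/" is already an absolute URL-path, so RewriteBase is ignored.
print(per_dir_redirect("/info/a-webpage.html", "info/", r"^(.+)\.html$", "/$1/", "/info/"))
# For comparison, a relative substitution is where RewriteBase would apply.
print(per_dir_redirect("/info/a-webpage.html", "info/", r"^(.+)\.html$", "$1/", "/info/"))
```

The first call reproduces the reported symptom (/a-webpage/); the second (/info/a-webpage/) only illustrates where RewriteBase takes effect and is not a recommendation, given the points about trailing slashes and redirect-only setups made here.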

And further, extensionless URLs for files should not end with a slash; a URL ending with a slash indicates a directory, not a file, and this will likely also cause you problems and complications with linked objects (images, stylesheets, scripts) on your extensionless pages.

Here are examples of the two rules you might use to implement extensionless URLs for /info .html files, assuming you have changed the links on your pages to remove the .html extensions for URL-paths resolving to the /info directory:

RewriteEngine on
RewriteBase /
#
## Internally rewrite extensionless /info URLs to existing .html files
# If no filetype extension on requested URL
RewriteCond %{REQUEST_URI} !\.[a-z0-9]+$
# If URL plus extension exists as a file
RewriteCond %{REQUEST_FILENAME}.html -f
# Internally rewrite to file with extension
RewriteRule ^info/(.*)$ /info/$1.html [L]
#
## Externally redirect old .html-extension /info URLs to new extensionless URLs
# If direct client request for .html files
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^.]+\.html\ HTTP/
# Externally redirect to URL without extension
RewriteRule ^info/([^.]+)\.html$ http://www.example.com/info/$1 [R=301,L]

These are freshly-typed and untested. A known limitation is that the simple regex patterns shown here do not support URLs with periods in the directory pathnames.
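The interaction between the two rules can be traced with a simplified Python model. It assumes that after an internal rewrite in .htaccess the rules are run again on the rewritten path, while THE_REQUEST keeps the client's original request line. That is why the redirect rule tests THE_REQUEST: without that condition, the internally rewritten /info/a-webpage.html would match the redirect rule and bounce the browser back to /info/a-webpage in an endless redirect loop. The set of files stands in for the -f test; `handle` and its return values are hypothetical.

```python
import re

# Patterns copied from Jim's two rules (root .htaccess, RewriteBase /).
HAS_EXTENSION = re.compile(r"\.[a-z0-9]+$")
REWRITE_RULE = re.compile(r"^info/(.*)$")
CLIENT_ASKED_FOR_HTML = re.compile(r"^[A-Z]{3,9}\ /[^.]+\.html\ HTTP/")
REDIRECT_RULE = re.compile(r"^info/([^.]+)\.html$")


def handle(the_request, files, max_passes=10):
    """Simplified model of the two rules (hypothetical helper, not Apache's code).

    the_request is the client's request line; files is a set of URL-paths
    standing in for the "-f" file test. Returns ("301", url) or ("serve", path).
    """
    path = the_request.split(" ")[1]  # URL-path from the request line
    for _ in range(max_passes):
        local = path.lstrip("/")  # root .htaccess: leading "/" is stripped
        # Rule 1: internal rewrite of extensionless URLs to existing .html files.
        m = REWRITE_RULE.match(local)
        if m and not HAS_EXTENSION.search(path) and path + ".html" in files:
            path = "/info/" + m.group(1) + ".html"
            continue  # the rules are run again on the rewritten path
        # Rule 2: redirect only if the client itself requested the .html URL.
        m = REDIRECT_RULE.match(local)
        if m and CLIENT_ASKED_FOR_HTML.match(the_request):
            return ("301", "http://www.example.com/info/" + m.group(1))
        return ("serve", path)
    raise RuntimeError("internal rewrite loop")


files = {"/info/a-webpage.html"}
print(handle("GET /info/a-webpage HTTP/1.1", files))       # ('serve', '/info/a-webpage.html')
print(handle("GET /info/a-webpage.html HTTP/1.1", files))  # ('301', 'http://www.example.com/info/a-webpage')
```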

The check for 'file exists with .html extension' is not strictly required for your simple application. However, I show it here in case you might like to add another file extension later. For example, if the requested URL-path does not resolve to an existing .html file, you could add another rule to check whether it exists as a .htm or .shtml file. If you only ever plan to support one filetype, you can comment out or delete the RewriteCond for .html file-exists checking for improved performance.
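The "try another extension" idea amounts to checking candidate files in a fixed order. The sketch below assumes the order .html, then .htm, then .shtml; the function name and extension order are illustrative only, and on the server each check would be its own RewriteCond %{REQUEST_FILENAME}.ext -f / RewriteRule pair.

```python
import os
import tempfile

# Extensions tried in order (an assumption for this sketch).
CANDIDATE_EXTENSIONS = (".html", ".htm", ".shtml")


def resolve_extensionless(docroot, url_path):
    """Return the URL-path an extensionless request would be rewritten to, or None.

    Each loop iteration plays the role of one "RewriteCond %{REQUEST_FILENAME}.ext -f"
    check (hypothetical helper, not part of Apache).
    """
    base = os.path.join(docroot, url_path.lstrip("/"))
    for ext in CANDIDATE_EXTENSIONS:
        if os.path.isfile(base + ext):
            return url_path + ext
    return None


# Usage: a throwaway document root containing only /info/old-page.htm.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "info"))
    open(os.path.join(root, "info", "old-page.htm"), "w").close()
    print(resolve_extensionless(root, "/info/old-page"))  # /info/old-page.htm
    print(resolve_extensionless(root, "/info/missing"))   # None
```

Each extra extension costs one more filesystem check per request, which is the performance trade-off mentioned for the single-filetype case.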

Jim

akameng
msg:3481210, 7:43 pm on Oct 18, 2007 (gmt 0)

I am sure jdMorgan has the best solution, but I would suggest modifying only this line:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^.]+\.html\ HTTP/
to
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\s/[^.]+\.html\sHTTP/\d\.\d$

The only changes: the escaped space (\ ) becomes \s for easier reading, and the protocol is matched as HTTP/\d\.\d$ to cover HTTP/1.1 or any other version, because THE_REQUEST contains the full HTTP request line sent by the browser to the server (e.g., "GET /index.html HTTP/1.1"). It does not include any additional headers sent by the browser.
The method will be one of OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE or CONNECT.

jdMorgan
msg:3481335, 9:52 pm on Oct 18, 2007 (gmt 0)

Because the pattern ending in HTTP is not end-anchored, there is no need to specify anything past the end of "HTTP/".

The difference between "\ " and "\s" is largely a matter of style.
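Jim's point can be checked by running both conditions against sample request lines with Python's re module, which treats these particular constructs ([A-Z]{3,9}, "\ ", \s, \d) the same way Apache's PCRE does. The request lines are invented examples.

```python
import re

# Jim's condition (escaped spaces, not end-anchored).
jim = re.compile(r"^[A-Z]{3,9}\ /[^.]+\.html\ HTTP/")
# akameng's variant (\s, explicit version, end-anchored).
akameng = re.compile(r"^[A-Z]{3,9}\s/[^.]+\.html\sHTTP/\d\.\d$")

samples = ["GET /info/page.html HTTP/1.1", "HEAD /info/page.html HTTP/1.0"]
for line in samples:
    print(line, bool(jim.match(line)), bool(akameng.match(line)))  # ... True True
```

Both accept the same request lines here; the extra "\d\.\d$" only narrows what follows "HTTP/", which the un-anchored version never needed to inspect.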

Jim

g1smd
msg:3481417, 11:44 pm on Oct 18, 2007 (gmt 0)

When I use "extensionless URLs" they are actually index files each in their own folder. The URL ends with a trailing / every time.

MrBlack
msg:3481432, 11:57 pm on Oct 18, 2007 (gmt 0)

Thanks for the help, guys.

g1smd, that's a great idea too and will probably be the best solution for me. But do you need to place an .htaccess file in every directory to remove the index.html, or have you done this with the .htaccess file in the root?

[edited by: MrBlack at 12:00 am (utc) on Oct. 19, 2007]

g1smd
msg:3481447, 12:09 am on Oct 19, 2007 (gmt 0)

I use the .htaccess file in the root to control everything on the whole site.

All requests for (default¦index)\.(php(4¦5)?¦html?¦cfm¦aspx?) are stripped back to the preceding "/".
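To see which filenames g1smd's pattern covers, here it is in Python with the forum's broken-bar characters (¦) replaced by real pipes, and with ^ and $ added so that only whole filenames are tested. The anchors and the sample names are additions for this illustration.

```python
import re

# g1smd's pattern with real pipes; ^ and $ added so only whole filenames match.
INDEX_FILE = re.compile(r"^(default|index)\.(php(4|5)?|html?|cfm|aspx?)$")

for name in ["index.html", "index.htm", "index.php5", "default.aspx", "index.cfm",
             "index.php3", "home.html"]:
    print(name, bool(INDEX_FILE.match(name)))
# True for the first five; False for index.php3 and home.html
```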

All requests with parameters on the end have those stripped too.

MrBlack
msg:3481480, 12:53 am on Oct 19, 2007 (gmt 0)

OK, this is what I have come up with to remove index.html from the URLs in the root and all subdirectories:

RewriteEngine on
RewriteCond %{THE_REQUEST} ^GET\ /.*/index\.html\ HTTP/
RewriteRule (.*)index\.html$ /$1 [R=301,L]

Can you see any problems with it? Really appreciate your help guys!

jdMorgan
msg:3481509, 1:31 am on Oct 19, 2007 (gmt 0)

Unless you use the generic pattern I show in the code I posted above, you'll also want to include the HEAD method, as well as GET.
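The difference Jim describes is easy to demonstrate: a HEAD request line fails the GET-only condition but passes the generic [A-Z]{3,9} one. The request line is an invented example.

```python
import re

get_only = re.compile(r"^GET\ /.*/index\.html\ HTTP/")          # MrBlack's condition
any_method = re.compile(r"^[A-Z]{3,9}\ /.*/index\.html\ HTTP/")  # generic method

line = "HEAD /blog/index.html HTTP/1.1"
print(bool(get_only.match(line)), bool(any_method.match(line)))  # False True
```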

Jim

MrBlack
msg:3486578, 8:33 pm on Oct 24, 2007 (gmt 0)

OK, I have now come up with the following code, which removes index.html from the URL in subdirectories and sub-subdirectories:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*/index\.html\ HTTP/
RewriteRule (.*)index\.html$ /$1 [R=301,L]

However, I cannot make it work for the root index.html as well. Any ideas where I am going wrong?

jdMorgan
msg:3486636, 9:25 pm on Oct 24, 2007 (gmt 0)


RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html$ http://www.example.com/$1 [R=301,L]
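Why this handles the root: MrBlack's /.*/index\.html needs a slash on both sides of .*, so /index.html (a single slash) cannot match, whereas ([^/]+/)* may repeat zero times. Below is a Python sketch of the condition plus rule, assuming they sit in the root .htaccess (so the leading slash is stripped before the RewriteRule pattern is tested); `redirect_target` is a hypothetical helper and www.example.com is the placeholder host from Jim's code.

```python
import re

old_cond = re.compile(r"^[A-Z]{3,9}\ /.*/index\.html\ HTTP/")
new_cond = re.compile(r"^[A-Z]{3,9}\ /([^/]+/)*index\.html\ HTTP/")
new_rule = re.compile(r"^(([^/]+/)*)index\.html$")


def redirect_target(the_request):
    """Jim's condition + rule as seen from a root .htaccess (hypothetical helper)."""
    path = the_request.split(" ")[1].lstrip("/")  # leading "/" stripped before matching
    m = new_rule.match(path)
    if m and new_cond.match(the_request):
        return "http://www.example.com/" + m.group(1)
    return None


root = "GET /index.html HTTP/1.1"
print(bool(old_cond.match(root)))                       # False: needs two slashes
print(redirect_target(root))                            # http://www.example.com/
print(redirect_target("GET /a/b/index.html HTTP/1.1"))  # http://www.example.com/a/b/
```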

Jim

MrBlack
msg:3486658, 9:44 pm on Oct 24, 2007 (gmt 0)

Thanks very much

g1smd
msg:3486668, 9:59 pm on Oct 24, 2007 (gmt 0)

So that I can slot the same code on to every website, I don't just test for index.html requests.

I test for (default¦index)\.(php(4¦5)?¦html?¦cfm¦aspx?) and all of those redirect. It also partly hides which technology the site is actually using.

MrBlack
msg:3488044, 4:44 am on Oct 26, 2007 (gmt 0)

g1smd wrote:
So that I can slot the same code on to every website, I don't just test for index.html requests.

I test for (default¦index)\.(php(4¦5)?¦html?¦cfm¦aspx?) and all of those redirect. It also partly hides which technology the site is actually using.

Sounds good. How would you slot this into the code that jdMorgan provided? I am particularly interested in checking for index.php as well, as I am currently converting a site running on PHP to straight HTML. The PHP site had its URLs rewritten to .html extensions, apart from the root index.php page. Sorry, I am a newbie when it comes to this :)

g1smd
msg:3488252, 12:52 pm on Oct 26, 2007 (gmt 0)

Replace each index\.html in the code with (default¦index)\.(php(4¦5)?¦html?¦cfm¦aspx?) instead.

Remember to replace the forum pipe symbols with the correct pipe symbols when you edit this code.
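Applied to Jim's condition and rule, the substitution looks like the Python sketch below (real pipes). Because the new alternation comes after the first capturing group, $1 still holds the directory part and the rule's substitution does not need to change. The sample requests and the helper name `index_redirect` are invented for illustration, and www.example.com is Jim's placeholder host.

```python
import re

# g1smd's alternation with real pipes, dropped into Jim's condition and rule.
FILES = r"(default|index)\.(php(4|5)?|html?|cfm|aspx?)"
COND = re.compile(r"^[A-Z]{3,9}\ /([^/]+/)*" + FILES + r"\ HTTP/")
RULE = re.compile(r"^(([^/]+/)*)" + FILES + r"$")


def index_redirect(the_request):
    """Redirect target for a root .htaccess, or None (hypothetical helper)."""
    path = the_request.split(" ")[1].lstrip("/")
    m = RULE.match(path)
    if m and COND.match(the_request):
        return "http://www.example.com/" + m.group(1)  # group 1 is still $1
    return None


print(index_redirect("GET /index.php HTTP/1.1"))          # http://www.example.com/
print(index_redirect("GET /shop/default.aspx HTTP/1.1"))  # http://www.example.com/shop/
print(index_redirect("GET /about.php HTTP/1.1"))          # None
```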

hailg03
msg:3614001, 7:29 am on Mar 29, 2008 (gmt 0)

How do I redirect URLs that contain periods? This is what I am using right now, and it works for URLs without periods.
For example, how do I get all URLs like www.example.com/you.are.html to redirect to www.example.com/you.are?

RewriteEngine on
RewriteBase /

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^.]+\.html\ HTTP/
RewriteRule ^([^.]+)\.html$ /$1 [R=301,L]

Thanks

jdMorgan
msg:3614220, 4:12 pm on Mar 29, 2008 (gmt 0)

Change the rule pattern:

RewriteRule ^(.+)\.html$ /$1 [R=301,L]

The original pattern was written for pattern-matching efficiency, but specifically excludes periods anywhere in the URL-path, except preceding the filetype.
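The difference between the two rule patterns, shown in Python on URL-paths as a root .htaccess would see them (no leading slash); the paths are examples.

```python
import re

efficient = re.compile(r"^([^.]+)\.html$")  # original pattern: no periods allowed
permissive = re.compile(r"^(.+)\.html$")    # Jim's replacement

for path in ["you-are.html", "you.are.html"]:
    m1, m2 = efficient.match(path), permissive.match(path)
    print(path, m1.group(1) if m1 else None, m2.group(1) if m2 else None)
# you-are.html you-are you-are
# you.are.html None you.are
```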

See the regular-expressions tutorial cited in our forum charter for more info.

Jim

hailg03
msg:3614266, 5:25 pm on Mar 29, 2008 (gmt 0)

Hey Jim,

I did what you said and also changed
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^.]+\.html\ HTTP/

to
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(.+)\.html\ HTTP/

Which got the result I wanted. Let me know if this change is fine; I really don't know much about mod_rewrite.
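A quick Python check shows why the condition needed the same change as the rule: with [^.]+ the request line for a dotted URL can never reach ".html", so the condition fails and the rule is skipped. The request line is an invented example.

```python
import re

old_cond = re.compile(r"^[A-Z]{3,9}\ /[^.]+\.html\ HTTP/")
new_cond = re.compile(r"^[A-Z]{3,9}\ /(.+)\.html\ HTTP/")

line = "GET /you.are.html HTTP/1.1"
print(bool(old_cond.match(line)), bool(new_cond.match(line)))  # False True
```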

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved