

Hiding page extensions

mypage.html -to be- mypage (no ext.)


Tastatura

4:16 am on Feb 19, 2006 (gmt 0)

10+ Year Member



Hi all,
I am trying to do the following (however, I am completely confused at this point):
I would like my page extensions not to be shown, i.e. if a page currently shows as:
mydomain.com/subdomain/page.html

I would like it to show as

mydomain.com/subdomain/page

I am on virtual hosting running Apache 1.3. I read some documentation about redirects and maps, as well as a few posts on WM, but I am still not clear on how to do this (perhaps information overload). If there is a post that already answers this question, please point me to it, as I wasn’t able to find one.
Thanks

jdMorgan

4:41 am on Feb 19, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here:

First and foremost: Mod_rewrite cannot "hide" or change your page names (URLs). It can only change the filenames that Apache associates with the page names you link to on your pages. It's an important distinction.

Your first step is to change all the links on your pages to point to extensionless URLs. (I strongly suggest you make a backup copy of your site before starting.) Once your links all point to extensionless URLs, those are the URLs that people will see, and that search engines will list and spider.
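If you have many static pages, the link change can be scripted. The following is only a rough Python sketch. It assumes links are written as href="..." with double quotes and that any link containing "://" is external and should be left alone. The names HREF_HTML and strip_html_extensions are hypothetical. Run it on your backup copy first and check the results by hand.

```python
import re

# Hypothetical helper: matches local links such as href="page.html" or
# href="sub/page.html#top". Absolute URLs (scheme://...) are skipped by the
# negative lookahead. Assumes double-quoted href attributes only.
HREF_HTML = re.compile(r'href="(?![a-z]+://)([^"#?]+)\.html([#?][^"]*)?"', re.IGNORECASE)

def strip_html_extensions(markup: str) -> str:
    """Return markup with local '.html' links rewritten to extensionless form."""
    # Group 1 is the path without ".html"; group 2 is an optional #fragment or ?query.
    return HREF_HTML.sub(lambda m: f'href="{m.group(1)}{m.group(2) or ""}"', markup)

sample = '<a href="page.html">P</a> <a href="http://example.com/y.html">Y</a>'
print(strip_html_extensions(sample))
# <a href="page">P</a> <a href="http://example.com/y.html">Y</a>
```

Links to other file types (images, .css, .js) are not touched, because the pattern requires a literal ".html" immediately before the closing quote, fragment, or query.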

Now you need to tell your server to associate those URLs with the real filenames on your server, and those filenames do (and must in most cases) have file extensions (to support correct MIME-type handling).

You don't have to use mod_rewrite to do this. Just turn on Content-Negotiation by adding


Options +MultiViews -Indexes

to your config file or to your .htaccess file in your top-level Web-accessible directory.

Be aware, though, that this feature is not without drawbacks. For example, you won't be able to use "directory indexes" on your site anymore. By that I mean Apache-generated listings of all files in each directory of your site. Most modern sites don't use them, except those that provide a lot of files for users to download directly from one or more directories.

There is also a measurable server performance impact, since you are now asking your server to find a file or directory that matches the request most closely.
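To picture where that extra work comes from, here is a deliberately simplified Python model of the MultiViews lookup. It is not how Apache implements it: real content negotiation ranks the candidate files using the request's Accept* headers and the files' MIME types, whereas this toy version simply takes the first candidate in sorted order. The function name negotiate and the directory-listing input are assumptions made for illustration.

```python
from typing import Iterable, Optional

def negotiate(name: str, listing: Iterable[str]) -> Optional[str]:
    """Toy stand-in for MultiViews (hypothetical helper).

    name    -- requested name without an extension, e.g. "page"
    listing -- filenames present in the target directory
    Returns the exact file if it exists; otherwise the first "name.*"
    candidate in sorted order, or None if there is no candidate.
    """
    files = set(listing)
    if name in files:
        return name
    # This scan of the directory for "name.*" variants is the extra
    # per-request work that an exact filename match would not need.
    candidates = sorted(f for f in files if f.startswith(name + "."))
    return candidates[0] if candidates else None

print(negotiate("page", ["page.html", "other.html"]))  # page.html
print(negotiate("missing", ["page.html"]))             # None
```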

You can use mod_rewrite to do this "manually" if you so desire. This also has a performance impact, but perhaps a bit less than content negotiation. A simple, non-exhaustive example would be:


RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(([^./]+)+)/?$ /$1.php [L]

That will internally rewrite requests for any page URL that contains no period and no internal slash (a single path segment, optionally followed by a trailing slash) and does not exist as either a file or a directory, to the same URL with ".php" appended. You may want to add some exclusions to it, though, as your needs dictate.
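A quick way to see which requests that ruleset affects is to model it outside Apache. This Python sketch is only an approximation: the sets of existing files and directories stand in for the -f and -d tests, and the path is assumed to be the per-directory path that mod_rewrite matches in a top-level .htaccess (no leading slash). The nested ([^./]+)+ in the rule matches exactly the same strings as [^./]+, so the simpler form is used here.

```python
import re
from typing import Set

# Same match set as the rule's ^(([^./]+)+)/?$ without the redundant nesting.
RULE = re.compile(r"^([^./]+)/?$")

def rewrite(path: str, files: Set[str], dirs: Set[str]) -> str:
    """Rough model of the three-line ruleset (hypothetical helper).

    path  -- request path as the rule sees it, e.g. "about" or "about/"
    files -- stand-in for existing files (RewriteCond ... !-f)
    dirs  -- stand-in for existing directories (RewriteCond ... !-d)
    """
    # Both conditions must pass: an existing file or directory is left alone.
    if path in dirs or path in files:
        return path
    m = RULE.match(path)
    if m is None:
        return path  # pattern did not match, so no rewrite
    # Substitution /$1.php
    return "/" + m.group(1) + ".php"

print(rewrite("about", set(), set()))       # /about.php
print(rewrite("docs/about", set(), set()))  # docs/about (not rewritten)
```

The second call shows the limitation of this first version: a path with a directory in it never matches, which is why the rule gets revised later in the thread.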

For more information, see the documents cited in our forum charter [webmasterworld.com] and the tutorials in the Apache forum section of the WebmasterWorld library [webmasterworld.com]. Also, check out the content-negotiation documentation on the Apache site.

Jim

Tastatura

11:17 am on Feb 19, 2006 (gmt 0)

10+ Year Member



Hi Jim,
Thanks for the info and the very comprehensive reply! I had already read the Apache mod_rewrite guide, but I hadn’t read this forum’s charter (there’s a good regular expressions tutorial in there) before posting; I guess I should have. During my troubleshooting I did realize that I could turn content negotiation on and be done with it, but I wanted to do it via mod_rewrite because it was giving me a hard time and I wanted to figure out what I was doing wrong. My original rewrite rule looked similar to your example, although checking for “.” is clever and I hadn’t thought of that. The /$1.php [L] part was the same (well, I was using .html), and it turned out that the problem was in this part of the rule. After spending a couple of hours re-reading documentation and testing, I got it to work, and the rewrite rule now looks like this:

RewriteRule ^(([^./]+)+)/?$ $1.html [L]

Note that the forward slash before $1.html is missing. With the slash in front of it I was getting a 404 message. I am still not sure why this is; my best guess is that it has to do with parsing, but I’ll have to think about it and figure out why it was ‘mis-parsing’ (if that was the reason anyway).

Speaking of ‘extension-less’ links, I am under the impression that it might be beneficial to set up the site and its links that way, especially if in the future I want to change how my site delivers content (currently it’s a bunch of include.php files). Is this true? What are the drawbacks (besides the additional load on the server)?
I already disabled dir indexing prior to tackling this current issue, as I don’t want that feature.

I am still building the site, so changing the links to “extension-less” will not be an issue; actually, none of the real links are up, just a few test links (I disallowed all robots, and there aren’t any links on the home page, so nothing should be spidered until I am ready to go live). I am finalizing the organizational and structural aspects of the site, and afterwards I’ll add content.

Thanks again.

jdMorgan

3:12 pm on Feb 19, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, actually, I messed up the original rule, which should have been:

RewriteRule ^([^/]+/)*([^./]+)/?$ /$2.html [L]

The intent is to remove the path info preceding the 'filename', and then add an extension to the filename. But that shouldn't make too much difference, since you'll want to adjust that pattern to suit your needs anyway - especially if your pages are actually static html pages. For example,

RewriteRule ^(([^/]+/)*[^./]+)/?$ /$1.html [L]

would retain the full path.
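The difference between the two corrected patterns can be checked with Python's re module, whose syntax for these particular expressions matches Apache's. This is a test harness, not part of the server configuration; the helper name substitute and the sample paths are made up for illustration.

```python
import re
from typing import Optional

STRIP_PATH = re.compile(r"^([^/]+/)*([^./]+)/?$")  # used with /$2.html
KEEP_PATH = re.compile(r"^(([^/]+/)*[^./]+)/?$")   # used with /$1.html

def substitute(pattern: "re.Pattern[str]", group: int, path: str) -> Optional[str]:
    """Apply '/' + $group + '.html' if the pattern matches, else return None."""
    m = pattern.match(path)
    if m is None:
        return None
    return "/" + m.group(group) + ".html"

for p in ["page", "sub/dir/page", "sub/dir/page/", "page.html"]:
    print(p, substitute(STRIP_PATH, 2, p), substitute(KEEP_PATH, 1, p))
# page /page.html /page.html
# sub/dir/page /page.html /sub/dir/page.html
# sub/dir/page/ /page.html /sub/dir/page.html
# page.html None None
```

The first pattern throws the directory part away, while the second keeps it, which is what you want if your static pages live in matching subdirectories.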

The leading slash on the substitution 'roots' it to the DocumentRoot of the domain. If your files are actually in a subdirectory below DocumentRoot, then you'd have to take this into account if using a leading "/".
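This also likely explains the 404 with the leading slash. In a .htaccess file, mod_rewrite strips the directory's own prefix before matching, so a rule in /subdomain/.htaccess sees "page", not "subdomain/page". A relative substitution gets that prefix added back; a substitution starting with "/" does not. The Python sketch below models only that behaviour. It assumes the .htaccess file and page.html both live in a hypothetical /subdomain/ directory, and it ignores RewriteBase and Apache's other path handling.

```python
def per_dir_path(request_path: str, htaccess_dir: str) -> str:
    """Strip the .htaccess directory prefix, as per-directory matching does."""
    if request_path.startswith(htaccess_dir):
        return request_path[len(htaccess_dir):]
    return request_path

def resolve(substitution: str, htaccess_dir: str) -> str:
    """Simplified: a leading '/' roots the result at DocumentRoot;
    otherwise the .htaccess directory prefix is added back."""
    if substitution.startswith("/"):
        return substitution
    return htaccess_dir + substitution

seen = per_dir_path("/subdomain/page", "/subdomain/")  # "page"
print(resolve("/" + seen + ".html", "/subdomain/"))    # /page.html (file not there)
print(resolve(seen + ".html", "/subdomain/"))          # /subdomain/page.html
```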

Jim

jdMorgan

3:15 pm on Feb 19, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The main problem with extensionless URLs is practical -- It's hard to tell whether you are looking at a "filename" URL or a malformed "directory" URL. This is true in large part because some search engines often change any extensionless URL they find by adding a trailing slash. It's annoying because it can cause unexpected results if you're not aware of it -- Note that both my patterns end in "/?$" for this reason.
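To see what the optional trailing slash buys you, compare the path-keeping pattern with and without "/?" using Python's re module. The variable names are illustrative only.

```python
import re

WITH_OPTIONAL_SLASH = re.compile(r"^(([^/]+/)*[^./]+)/?$")
WITHOUT_SLASH = re.compile(r"^(([^/]+/)*[^./]+)$")

for path in ["sub/page", "sub/page/"]:
    a = WITH_OPTIONAL_SLASH.match(path)
    b = WITHOUT_SLASH.match(path)
    print(path, a.group(1) if a else None, b.group(1) if b else None)
# sub/page sub/page sub/page
# sub/page/ sub/page None
```

Without "/?", a request that a search engine has "fixed" by appending a slash would fall through the rule entirely; with it, both forms map to the same captured path.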

Jim

Tastatura

7:54 pm on Feb 19, 2006 (gmt 0)

10+ Year Member



Thanks a lot. Really appreciate all the help and info.