

HTML Forum

    
Project File Structures - Smart URIs
Server-Side Technology Independence and Ease of Maintainability
Fotiman

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 3404461 posted 7:00 pm on Jul 25, 2007 (gmt 0)

When creating a new web project/site/app, deciding how you'll manage the files can have a long-lasting effect on the maintainability and portability of your code. For example, storing all of your files in a single folder with no partitioning might make it more difficult to know which files are being shared by multiple pages, or which files are no longer used at all. Directory structure can help.

Another benefit that can be utilized with a well-partitioned project is something I like to call Project URIs Independent of Server-Side Technology, or PURIST (yes, I just made that up). :) What this means is that we can avoid directly binding our project to a specific server-side language (PHP, ASP, etc.) by placing each "page" in its own directory, and treating the directory as our page name. In our application we would only create links to the directories, instead of linking to individual files. For example, suppose we have the following directory structure:

http://example.com/aboutus/
http://example.com/products/
http://example.com/directions/

If our application links to the URIs shown above, then we haven't specified that we're using PHP, ASP, or even static HTML pages. This means, for example, that if our project is originally designed as static HTML pages and we later decide to convert it to PHP or ASP, we don't have to modify any of the links. Any external sites that link to our pages should continue to work as well, and any page rank in search engines should not be affected either. We don't need to worry about setting up 301 redirects on the server, and we've saved ourselves a ton of work just by using URIs that don't point to a specific file, but instead point to a directory (the web server will be configured to serve up a certain file as the default).

That's a good starting point. But let's not forget about all of the other client-side files that get included in a page. The basics are:

  • CSS
  • JavaScript
  • images

Since we're creating our pages as directories in the root of our web application, we'll want to put our common/shared files in a location that's unlikely to conflict with a potential page URI. So perhaps we might add the following directories:

http://example.com/inc/css/
http://example.com/inc/img/
http://example.com/inc/js/
http://example.com/inc/yui/

We've only created 1 directory at the root level (inc) and it's not likely to conflict with any of our page names. Also, we kept the name short but still understandable (inc = includes). Likewise, our subdirectories are also short. Note also that I created a directory for the Yahoo UI Library [developer.yahoo.com] (YUI). You might create additional directories for other 3rd party libraries, but the point here is that we've placed it in our inc directory to keep it from conflicting with our own page names.

If you're really into efficiency and performance but want to keep your CSS and JavaScript files modularized, you might consider some sort of "build process" to combine your files. For example, suppose you want to have 3 main CSS files like so:

http://example.com/inc/css/color.css
http://example.com/inc/css/layout.css
http://example.com/inc/css/typography.css

Your build process might concatenate these files into a single file. Maybe something like this:

http://example.com/inc/css/lib/main.css

Your application would then only need to include this one file.
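
As a rough illustration of such a build step, here is a minimal Python sketch that joins the three stylesheets into one file. The function name concatenate_css and the throwaway demo directory are assumptions made for the example, not part of any particular build tool; the file names are the ones listed above.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def concatenate_css(sources: Iterable[Path], target: Path) -> None:
    """Join the source stylesheets, in the order given, into one target file."""
    target.parent.mkdir(parents=True, exist_ok=True)
    chunks = []
    for source in sources:
        css = source.read_text(encoding="utf-8").rstrip()
        # Label each chunk so the combined file can be traced back to its source.
        chunks.append(f"/* {source.name} */\n{css}\n")
    target.write_text("\n".join(chunks), encoding="utf-8")


if __name__ == "__main__":
    # Throwaway directory standing in for /inc/css/ (the file contents are made up).
    with tempfile.TemporaryDirectory() as tmp:
        css_dir = Path(tmp)
        names = ["color.css", "layout.css", "typography.css"]
        for name in names:
            (css_dir / name).write_text(f".{name[:-4]} {{ }}\n", encoding="utf-8")
        concatenate_css([css_dir / n for n in names], css_dir / "lib" / "main.css")
        print((css_dir / "lib" / "main.css").read_text(encoding="utf-8"))
```

The order of the source list matters, because later rules override earlier ones in the combined stylesheet.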

Likewise, you could do something similar with your JavaScript files. There are tools available (JSMin, for example) that will "minify" your script files, trimming out extra whitespace and comments. That way, you could have your fully commented source JavaScript file where you do your development, and a nice slim minified version that gets served up in your application.

This is just one example of how you might better manage your projects with careful planning of directories. Do you have your own method?

 

restless

5+ Year Member



 
Msg#: 3404461 posted 1:07 pm on Jul 26, 2007 (gmt 0)

You make a good point, but if you want to talk scalability, I don't really want to be creating a directory for every single file on my website. For a small website without many pages, this is OK.

For those that use .html files and think they might upgrade to .php then they should use .php anyway.

Rarely does a site change from one server-side technology to another, and you can actually configure Apache to serve up PHP files when an .asp file is requested, so it still looks like an .asp file in the browser's location bar, but behind the scenes a PHP script was executed to return the HTML.

whoisgregg

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3404461 posted 1:36 pm on Jul 26, 2007 (gmt 0)

> For those that use .html files and think they might upgrade to .php then they should use .php anyway.

I prefer to think of the file extension that the user receives as relating to what that file contains, not how it was made. Do GIFs/JPEGs/PNGs use .photoshop, .fireworks, .paint, or .whatever depending on which application was used to make them?

Better, I think, to preserve the extra meaning that .html has for users rather than using that space to advertise your choice of server-side platform.

You can just use .html and turn on PHP parsing for .html files [webmasterworld.com]. (That link also includes an approach to handling things if you only need a few PHP-enabled .html pages.)

jdMorgan

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3404461 posted 2:07 pm on Jul 26, 2007 (gmt 0)

Fotiman is making a recommendation here that echoes that of Sir Tim Berners-Lee, the guy who invented the World Wide Web.

I'd add that if you have the server-side technology, such as mod_rewrite on Apache or ISAPI Rewrite on IIS, you can dispense with the trailing slash (which indicates a directory), and instead use extensionless URLs which indicate files.

So that would make the first list:

http://example.com/aboutus
http://example.com/products
http://example.com/directions

The server need only check for whether those resources exist when .html, .php, or .asp (for example) are appended to the requested URL. So, the URLs used on the Web will have no extension, whereas the files on the server will, and you are then free to change those extensions as you please.
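
To make that lookup concrete, here is a minimal Python sketch of the check described above. It is not mod_rewrite or ISAPI Rewrite configuration, just the same logic written out; the function name find_backing_file and the extension order are assumptions for illustration.

```python
import tempfile
from pathlib import Path
from typing import Optional, Sequence


def find_backing_file(
    doc_root: Path,
    url_path: str,
    extensions: Sequence[str] = (".html", ".php", ".asp"),
) -> Optional[Path]:
    """Return the first file that exists when an extension is appended to the URL path."""
    relative = url_path.strip("/")
    # Refuse empty paths and parent-directory references; a real server does far more checking.
    if not relative or ".." in relative.split("/"):
        return None
    for ext in extensions:
        candidate = doc_root / (relative + ext)
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "aboutus.html").write_text("<h1>About us</h1>", encoding="utf-8")
        print(find_backing_file(root, "/aboutus").name)

        # Switch the page to PHP: the file changes, the published URL does not.
        (root / "aboutus.html").unlink()
        (root / "aboutus.php").write_text("<?php echo 'About us'; ?>", encoding="utf-8")
        print(find_backing_file(root, "/aboutus").name)
```

The published URL /aboutus stays the same in both lookups; only the file behind it changes.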

A point of confusion for many Webmasters not accustomed to using this method is that a URL is not a filename, and a filename is not a URL. In fact, as implied by this thread, the two need not be identical -- or even similar. Server-side URL rewriting can make the file directory and name completely independent of the URL published on your pages.

However, and more importantly, Fotiman's recommendation that you organize your files into a defined structure is a very good one. Files can be organized by function, by filetype, by access restrictions, by cache-control characteristics, and by whether you want them spidered or not, in addition to other characteristics you may choose. This level of organization, if done properly, can make a huge difference in the efficiency and long-term maintainability of your site.

Every month, we get tons of threads posted here about "Help! - I changed all my URLs and my rankings dropped." This thread shows a way to avoid those problems.

Ref: Cool URIs don't change [w3.org] by Sir Tim Berners-Lee

Jim

httpwebwitch

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3404461 posted 2:21 pm on Jul 26, 2007 (gmt 0)

It's a drag making a new physical folder for every page in a big site, but there are ways to "fake it" so I don't think it's such a terrible ordeal. URL rewriting is fun.

One regex pattern in your htaccess could turn
www.example.com/abc/def/ghi.html
into
www.example.com/abc/def/ghi/

RedirectMatch 301 ^(.*)\.html$ $1/

Though I dread bringing up canonicalization again, it's useful to note that these 2 URLs will show the same content:
http://example.com/aboutus/
http://example.com/aboutus/index.html

Avoid canonicalization issues by applying a global rule that always removes the default file name from any requested URL. Auto-canonicalization.

RedirectMatch 301 ^(.*)/index\.html$ $1/

You'll need rules in place to handle all the various permutations of default subdomains and file names that may produce uncanonical URLs.
Here's the full set for that example:

http://www.example.com/aboutus
http://www.example.com/aboutus/
http://www.example.com/aboutus/index.html
http://example.com/aboutus
http://example.com/aboutus/
http://example.com/aboutus/index.html

add querystrings and you add an infinity of other variations.
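
To show what those rules have to achieve, here is a small Python sketch that maps all six variants onto one URL. It assumes the www host and the trailing-slash form are the ones being kept, and that index.html is the default file name; those choices, and the function name canonicalize, are illustrative only. Query strings are passed through untouched, so they remain a separate problem.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"  # assumption: the www form is the one we keep
DEFAULT_DOCUMENT = "index.html"     # assumption: the server's default file name


def canonicalize(url: str) -> str:
    """Map the common variants of a page URL onto a single canonical form."""
    scheme, host, path, query, fragment = urlsplit(url)
    host = host.lower()
    # Add the www prefix when the bare domain was requested.
    if host == CANONICAL_HOST[len("www."):]:
        host = CANONICAL_HOST
    # Drop the default file name, keeping the slash before it.
    if path.endswith("/" + DEFAULT_DOCUMENT):
        path = path[: -len(DEFAULT_DOCUMENT)]
    # Treat a final segment with no dot as a page "directory" and add the slash.
    last_segment = path.rsplit("/", 1)[-1]
    if last_segment and "." not in last_segment:
        path += "/"
    if not path:
        path = "/"
    return urlunsplit((scheme, host, path, query, fragment))


if __name__ == "__main__":
    variants = [
        "http://www.example.com/aboutus",
        "http://www.example.com/aboutus/",
        "http://www.example.com/aboutus/index.html",
        "http://example.com/aboutus",
        "http://example.com/aboutus/",
        "http://example.com/aboutus/index.html",
    ]
    for url in variants:
        print(url, "->", canonicalize(url))
```

On a live server each non-canonical request would be answered with a 301 to the canonical form rather than rewritten silently.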

I like to keep my file types segregated. One folder for images, one for JS, another for XSLT. Globally reused images go in a root "/images" or "/img" or "/i". Occasionally I'll nest a pocket of images in a subfolder like "/aboutus/img" when I am 100% certain that they'll never be used by any other pages, for instance in a gallery script or a special app.

I agree that it's tidy to keep your URLs platform-agnostic by mapping the .html extension to your parser of choice, but in heavy stress/load situations it may be advantageous to keep parsed PHP and non-parsed HTML separate, so your simple flat HTML pages that don't require any PHP love don't get the parser revved up for nothing. Why tease your PHP engine with static HTML?

another tip I offer: don't use a known extension for your included server script files or SSI snippets. If they're named "header.php" or "navmenu.txt", they can be requested individually via URL (http://www.example.com/navmenu.txt) and you don't want that... they could be found, linked to, and indexed. Do you care? Instead, name them "header.inc" or "navmenu.ssi" and make sure your server blocks requests for those extensions from the public.

vincevincevince

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3404461 posted 2:36 pm on Jul 26, 2007 (gmt 0)

I've been doing this for some time - using mod_rewrite functionality. All recent sites have even had a configuration option 'file extension'. I've been matching some to brand names, so a widget site might be all '.widget'. If happily served as text/html no modern browser will baulk.

Note that's not for SEO purposes but purely for fun or branding purposes.

The other thing I've been doing is having all modules in one modules directory, but giving them all pseudo-URIs as if they had their own directory structure, and the calls within them (normally ?action=print, etc.) become files of their own (modulename/print.widget) - easily configurable to give a new 'modulename' if someone prefers to call their shop 'store' rather than 'ecom'.

jtara

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 3404461 posted 2:43 pm on Jul 26, 2007 (gmt 0)

Note that URIs really don't have a concept of "directory". That's just an artifact of the default mapping between URIs and the local filesystem on most (or all) web servers.

If you find that confusing or are unconvinced, think about this: a URI "directory" can itself contain content - something impossible in every file system I am familiar with! That is, you can have:

example.com/products
example.com/products/
example.com/products/widgets/
example.com/products/widgets/purple-widget

all returning HTML content.

It's better not to think of it as a "directory" but simply as a "URI part". Here's what the actual specification says:

Generally, the reserved slash "/" character (ASCII 2F hex) denotes a level in a hierarchical structure, the higher level part to the left of the slash.

Now, while filesystem directories are part of a hierarchical structure, the specification doesn't require that the parts to the left of the terminal part should represent *directories*. Just "levels of a hierarchical structure". There's no requirement that the levels be represented physically on a disk somewhere.

So, you won't be needing to go around "creating directories" unless there are actually more parts to the right of a given slash.
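
A tiny Python sketch of that idea, assuming a hypothetical in-memory route table (the ROUTES dict and the respond function are made up for illustration): every one of the four URIs above returns content, and nothing on disk has to look like a directory.

```python
from typing import Dict, Optional

# Hypothetical route table: each URI path maps straight to content.
# The strings stand in for rendered HTML pages.
ROUTES: Dict[str, str] = {
    "/products": "<h1>Products overview</h1>",
    "/products/": "<h1>Product index</h1>",
    "/products/widgets/": "<h1>Widgets</h1>",
    "/products/widgets/purple-widget": "<h1>Purple widget</h1>",
}


def respond(path: str) -> Optional[str]:
    """Return the content for a URI path, or None when nothing is mapped to it."""
    return ROUTES.get(path)


if __name__ == "__main__":
    for path in ROUTES:
        print(path, "->", respond(path))
```

Note that "/products" and "/products/" are distinct keys, so each "level" can carry its own content.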

BTW, dropping the suffix is the default for Rails applications.

ogletree

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3404461 posted 2:54 pm on Jul 26, 2007 (gmt 0)

You don't have to create the directories. This can be done using htaccess, ISAPI, or your server-side language.

Fotiman

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 3404461 posted 3:10 pm on Jul 26, 2007 (gmt 0)

I'm glad to see so much interest.

There has been a lot of talk about mod_rewrite or configuring your server to pass .html pages through your PHP or ASP processor. But there are some drawbacks to that method as well:

  1. In a hosted environment, you may not have access to that type of configuration.
  2. For some developers, configuring the web server may be outside of their capabilities, or just something they don't feel comfortable doing.
  3. Configuring your server to pass .html pages through a PHP or ASP processor will require your server to do more work when it might not need to (if the pages actually contain just static HTML). This could impact performance.

Obviously, there is more than one way to skin a cat, and I'm not trying to say that more server configuration is the wrong way. But with the approach I was suggesting, there would be no special server configuration required.

It was also pointed out that it's not common to switch from one server side technology to another. I generally agree, but there are exceptions. For example, a small business owner had a site developed in ASP and now wants a whole new site redesign. The developer hired is proficient in PHP and wants to use a lot of the free PHP based products available on the web, or maybe add a WordPress blog to the site (WordPress is PHP based for those who don't know). If the site was created with a clean directory structure (as in my first post), then it would be very easy for the developer to copy the entire site from an IIS server to an Apache server and then replace all of the ASP files with PHP files. No additional server tweaking required, and the existing URIs remain intact.

Just some food for thought.

pageoneresults

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3404461 posted 6:26 pm on Jul 26, 2007 (gmt 0)

> I'm glad to see so much interest.

Always! Smart URIs. File Naming Conventions. Always a hot topic of discussion.

Over the years I've found myself refining URI structures as I learn more and more. Reading the RFCs and W3 guidelines gives you a deeper understanding of URI naming conventions and why you should do it one way and not the other.

Planning of Smart URIs begins at the conceptual design phase. At that time, the taxonomy of the site is being addressed at which time the URI naming conventions should also be addressed.

If there are "other parties" involved in the design process, they should be aware of the strict naming conventions in place and you should provide them with guidance on their naming conventions. It's a real pain in the arse to have to change things after the fact. Do it right the first time around and that's one less thing to worry about in the future.

Based on my experience in working with websites, developing architectures, etc. I now treat each and every site the same. No files at the root other than those that are necessary (index, robots, ini, etc). Everything else is in its respective sub-directory, everything.

I'm one of those who feel that there is a place for everything. Finding and remembering that place is imperative. Smart URIs are definitely a key element in that finding and remembering routine. From a site management standpoint, Smart URIs are mandatory.

I have sites that contain over 100 top level sub-directories and then 100's/1,000's more under that. They've grown over the years. The tools that I use on a daily basis allow me to easily maintain a directory structure of that size. I could easily have 1,000 sub-directories. In some instances I do, but those are virtual sub-directories and are generated based on the query. ;)

I treat each category as a site in itself. It will contain most of the same static folders that other categories will contain. You know, stuff like...

/category/css/
/category/images/
/category/js/
/category/nav/

I just think it makes it much easier to segment and transport when everything is where it should be. But, I guess that is all based on the developer's perspective. If everyone followed the RFCs, we wouldn't be having this discussion. ;)

carguy84

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3404461 posted 7:06 pm on Jul 26, 2007 (gmt 0)

That's a lot of folders :|

My website structures look like this (always):

/
/images/
- all images, sometimes sub folders depending on the complexity of the site
/includes/
- any JS/CSS/.master files go in here, no sub folders

and then I will have folders to break up the different sections of the site:

/company/
    aboutus.aspx
    advertising.aspx
    whatever.aspx
/#*$!XX/
/YYYYYY/

For anything dynamic I use ISAPI Rewrite and don't include a file extension. Makes future-proofing very easy.

Chip-

StupidScript

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3404461 posted 8:22 pm on Jul 26, 2007 (gmt 0)

> 3. Configuring your server to pass .html pages through a PHP or ASP processor will require your server to do more work when it might not need to (if the pages actually contain just static HTML). This could impact performance.

This is a negligible issue. If you are using PHP or ASP in your development, then most or all of your pages are being parsed by the pre-processor, anyway. If there is nothing to compile in a file, it exits the pre-processor nearly instantaneously. (I can't speak to ASP, but PHP works like that.)

Even with large, high-traffic sites, including .html files in PHP pre-processing will have no noticeable impact on performance.

Fotiman

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 3404461 posted 8:46 pm on Jul 26, 2007 (gmt 0)



> > 3. Configuring your server to pass .html pages through a PHP or ASP processor will require your server to do more work when it might not need to (if the pages actually contain just static HTML). This could impact performance.

> This is a negligible issue. If you are using PHP or ASP in your development, then most or all of your pages are being parsed by the pre-processor, anyway. If there is nothing to compile in a file, it exits the pre-processor nearly instantaneously. (I can't speak to ASP, but PHP works like that.)

> Even with large, high-traffic sites, including .html files in PHP pre-processing will have no noticeable impact on performance.

I agree, most of the time this would be negligible. That's why I wrote it *could* impact performance. It would probably only affect performance if you configured it for a server that was hosting multiple sites and you made that a global change that applied to all of the sites.

But in any case, it's still less efficient to run it through that pre-processor when you don't need to.

londrum

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3404461 posted 8:52 pm on Jul 26, 2007 (gmt 0)

and using www.example.com/widgets
instead of www.example.com/widgets.html
might be more important in the future once microformats start taking off.

search engines like technorati rank pages by their 'tags' which is basically just the last part of the url (like /widgets). but it doesn't work if it's a file. it has to look like a directory - with no trailing slash and no file extension.

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3404461 posted 10:25 pm on Jul 26, 2007 (gmt 0)

If you have material that has a publish date, then present the web URLs as folders in numerical /year/month/day/ order with a full four-digit year, and a leading zero for months and days 01 to 09 - irrespective of what the internal filepath is.
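
For example, a minimal Python sketch of that format (the function name dated_url_path is just an assumption for illustration):

```python
from datetime import date


def dated_url_path(published: date) -> str:
    """Build a /year/month/day/ path with a four-digit year and zero-padded month and day."""
    return f"/{published.year:04d}/{published.month:02d}/{published.day:02d}/"


if __name__ == "__main__":
    print(dated_url_path(date(2007, 7, 5)))    # /2007/07/05/
    print(dated_url_path(date(2007, 11, 26)))  # /2007/11/26/
```

Because every part has a fixed width, these paths also sort chronologically when sorted as plain text.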

jdMorgan

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3404461 posted 10:41 pm on Jul 26, 2007 (gmt 0)

> but it doesn't work if it's a file. it has to look like a directory - with no trailing slash and no file extension.

Not to further confuse the URL-versus-filename discussion (in which I often take part), but if you look at the HTTP specs, in simple terms a URL with a trailing slash should resolve to a directory or to a directory index page, and anything without a trailing slash indicates a discrete resource; it should resolve to a "file" or to a script that generates a "page".

I'm taking issue specifically with the "look like a directory - with no trailing slash" part above. Any final URL path-part without a trailing slash --whether there is a file extension or not-- is not a "directory-path-part."

We now return you to your regularly-scheduled programming.

Jim

httpwebwitch

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3404461 posted 3:28 am on Jul 27, 2007 (gmt 0)

I (and most people) conceive of URLs like they're file systems... that's why "http://www.example.com/abc/" looks like a folder, and "http://www.example.com/abc" looks like a file with no extension.

as already stated, there's no need to stick to that convention.
I like the unorthodox stuff v.v.v mentions. keyword-stuffed extension? I think I may try that somewhere :)

you need the root "/" after the domain TLD, but to the right of that, you could organize a site using dots:

http://www.example.com/abc
http://www.example.com/abc.bcd.e
http://www.example.com/abc.bcd.f
http://www.example.com/abc.bcd.h
http://www.example.com/abc.ijk
http://www.example.com/abc.ijk.l

or, as evident at JSPON.org, you can name your pages using hashes:
http://www.example.com/#abc
http://www.example.com/#def
http://www.example.com/#ghi
http://www.example.com/#jkl

or you could combine them all willy nilly:

http://www.example.com/#a.b.c
http://www.example.com/?a#b.c
http://www.example.com/a.b?c
http://www.example.com/a?b#c

One site I've worked on put system and SQL DAL commands after the dot, like
http://www.example.com/.delete?id=3
http://www.example.com/.update?id=3&val=abc
http://www.example.com/.insert?val=abc

I was impressed by the novelty the first time I saw del.icio.us; then script.aculo.us... then not so impressed by the hundred more that followed. schoolb.us? gimme a break.

how creative can you get with unusual URI naming? there are limits...

pageoneresults

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3404461 posted 3:32 pm on Jul 27, 2007 (gmt 0)

> or, as evident at JSPON.org, you can name your pages using hashes:
> http://www.example.com/#abc
> http://www.example.com/#def
> http://www.example.com/#ghi
> http://www.example.com/#jkl

Use of the Fragment Identifier (pound sign, hash symbol, crosshatch), is a client side function.

The bot stops here #

httpwebwitch

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3404461 posted 1:01 pm on Jul 31, 2007 (gmt 0)

> Use of the Fragment Identifier (pound sign, hash symbol, crosshatch), is a client side function.

LOL - yes, and when creating an AJAX or Flash app that saves its state, it's a handy little fragment to manipulate. Though you can bounce "#stuff" off the server with AJAX, constructing URLs with it is slippery.

AFAIR the #-part of the URL doesn't prompt a reaction from the server; the server sends the whole pre-# page, then lets the browser handle the #. Which means any URL scheme involving #'s had better be heavily controlled client-side.
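
A short Python sketch makes the split visible, using the standard library's URL parser. The helper names request_target and client_side_fragment are assumptions for illustration: the first returns the part a browser puts on the HTTP request line, the second the part that never leaves the browser.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path and query a browser sends to the server for this URL."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


def client_side_fragment(url: str) -> str:
    """Return the part after the #, which stays in the browser."""
    return urlsplit(url).fragment


if __name__ == "__main__":
    for url in (
        "http://www.example.com/#abc",
        "http://www.example.com/?a#b.c",
        "http://www.example.com/a?b#c",
    ):
        print(url, "| server sees:", request_target(url),
              "| browser keeps:", client_side_fragment(url))
```

The first URL shows the catch: http://www.example.com/#abc and http://www.example.com/#def produce the same request for "/", so the server cannot tell those "pages" apart.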


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved