
Webmaster General Forum

    
Conceptual fog: Example.com/directory/directory/file?
When is it a directory? When is it a file? What is the distinction?
Webwork




msg:353510
 4:47 pm on Mar 31, 2006 (gmt 0)

In structuring a website:

When is example.com/Wisconsin - a directory

When is Example.com/Wisconsin - a file

In Example.com/Wisconsin/Madison - is Madison a new directory? If not then could it be, that is, what would make it "become" a directory?

What if it was Example.com/Madison.htm?

My brain is full of fog on this rather simple issue: When is a file a file (page) and can a "directory file" be a page AND can it be a page AND a directory?

If that doesn't make sense then everything is perfect, because it's not making sense to me.

What makes a directory a directory in a website's file hierarchy?

Is there any conceptual difference between a 'nix system and Windows? Different nomenclature?

It's not that I've never built a website before. It's that I have a large brick inside my skull when it comes to what appears to me to be a simple matter.

 

stapel




msg:353511
 7:28 pm on Mar 31, 2006 (gmt 0)

"Files" are actual files, like HTML pages, PHP scripts, and GIF graphics.

"Directories" are folders, into which one may place files.

It is unlikely that "example.com/Wisconsin" would be a "file", since a computer would have no way of knowing how to handle "Wisconsin" (there being no file extension, such as .htm, .php, or .gif).

Where did you run across the term "directory file"?

Eliz.

LifeinAsia




msg:353512
 7:49 pm on Mar 31, 2006 (gmt 0)

It is unlikely that "example.com/Wisconsin" would be a "file", since a computer would have no way of knowing how to handle "Wisconsin" (there being no file extension, such as .htm, .php, or .gif).

It's very easy to set up mod_rewrite (or similar Windows versions) to "map" example.com/Wisconsin to example.com/Wisconsin.htm or example.com/Wisconsin.gif or example.com/state.php?state=Wisconsin or any number of other "real" files.

BertieB




msg:353513
 7:57 pm on Mar 31, 2006 (gmt 0)

Short answer: they can be either, depending on the setup

Longer answer:

In practice, example.com/foo/bar could return either the directory bar inside the directory foo; or there could be a file bar inside the directory foo.

I think I recall an older discussion on this board from a year or two back about this. As I remember, someone posted indicating that example.com/foo/bar should always refer to a file, and example.com/foo/bar/ should always refer to a directory. It is a workaround or kluge of Apache and others that has become standard behaviour to return a directory if the file can't be found. Perhaps someone else remembers the thread in question?

Apache's mod_dir [httpd.apache.org] documentation hints at this, but I can't find anything conclusive.

A "trailing slash" redirect is issued when the server receives a request for a URL [servername...] where dirname is a directory. Directories require a trailing slash, so mod_dir issues a redirect to [servername...]

--------------------

And as a side note, you can't infer that 'bar' (or anything else) is a directory for lack of a file extension. While it would be rarer on Windows to have a file without an extension, it is common to see on Unix variants. Also, both Windows and Unix can have directories with 'extensions', eg: ~/foo.bar/baz/ or C:\foo.bar\baz\.

encyclo




msg:353514
 8:00 pm on Mar 31, 2006 (gmt 0)

When the URLs are referencing real files, a directory always has a trailing slash:

example.com/Wisconsin/

The underlying server (Apache, IIS), serves the default index file, often index.html, index.htm, default.htm...

When you see an URL without a trailing slash:

example.com/Wisconsin

Then "Wisconsin" is usually an extensionless file or a file served via content negotiation, not a directory.

Of course the URL does not necessarily need to reflect the underlying file structure - both mod_rewrite and content negotiation [httpd.apache.org] are examples where the concept of files and directories lose some relevance.

bedlam




msg:353515
 8:00 pm on Mar 31, 2006 (gmt 0)

When is example.com/Wisconsin - a directory

When is Example.com/Wisconsin - a file

Well, you might see a url like example.com/wisconsin/, but--at least where static files are concerned--that url will, in most cases, actually lead to a file with a name like example.com/wisconsin/index.html.

The situation can become more complex on dynamic (i.e. database-driven) sites where there are no actual directories in the file system but where all the urls take a form like example.com/wisconsin/. This is sometimes done to hide the technology [webmasterworld.com] behind the website. In other words, it's done to make the type of file actually being reached by the url inconsequential--if there's no .php .asp .aspx .foo or .bar in the url, the site may change technologies completely at some point, but the urls will stay the same...

-b

Webwork




msg:353516
 12:07 am on Apr 1, 2006 (gmt 0)

1.
"Directories" are folders, into which one may place files.

Okay, I think I got that.

2.
a computer would have no way of knowing how to handle "Wisconsin" (there being no file extension

Okay, it's starting to get away from me.

3.
As I remember, someone posted indicating that example.com/foo/bar should always refer to a file, and example.com/foo/bar/ should always refer to a directory.

Hmmmm, so no trailing slash = file; trailing slash = directory? Think I got it but it's slippery.

4.
Well, you might see a url like example.com/wisconsin/, but--at least where static files are concerned--that url will, in most cases, actually lead to a file with a name like example.com/wisconsin/index.html.

Argh!

Alrighty then, so

a. A directory holds files
b. In a url, if it ends with a trailing / then it's a directory?
c. But if it ends with a trailing slash the O/S will interpret to be /index.htm?

Am I getting it or getting closer?

:) or :(?

Thank you all very much for trying to break up my mental log jam.

LifeinAsia




msg:353517
 12:19 am on Apr 1, 2006 (gmt 0)

a. A directory holds files
b. In a url, if it ends with a trailing / then it's a directory?
c. But if it ends with a trailing slash the O/S will interpret to be /index.htm?

a. You betcha!
b. Usually, but see my previous post about mod rewrite.
c. Well, it will if the web server has been set up that way. You can define any file you want for the default file. With IIS you can define multiple default documents; not sure about Apache.

2by4




msg:353518
 12:27 am on Apr 1, 2006 (gmt 0)

webwork: in the world of unix, which is the syntax of the web from the beginning, all things are files.

So, technically speaking, a file is a file, a directory is another type of file, and so on. Everything in the box is some type of file.

Conceptually speaking, the directory is a file that contains as its contents a list of other files and/or directories. In unix, all files are contained in directories, even if that directory is just /, but it never is realistically. So you have a file that is of type directory, which contains pointers to files of either type file or type directory, more or less anyway.

This may be why you got confused, this is actually how linux/unix works, and it can be confusing until you realize that everything is in fact a file, but there are different types of files. This is, I believe, whether it's windows or unix type systems, how your data is actually arranged, it's why you can list for example only files of type directory, or all files, and so on. To make it even more confusing, there are also files . and .., which enable you to navigate through the system. For example, when you do this: ../, you are actually telling the file system to move up one level, to the containing directory, to the file ..

And when you list all the contents of any directory in unix/linux, you will in fact see those two extra little files, . and .. [or does the .. in the current directory actually contain the pointer to the parent directory?... mysterious, that might be it though..] - nope, checked, in fact, .. is a file of type directory, not a file of type file... same for . So .. must be a file that is a directory that is the parent directory, while . is a pointer that is a directory that is the current directory.
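For anyone who wants to check the . and .. claim on their own box, here is a minimal sketch in Python. It assumes nothing beyond the standard library; the `describe` helper and the `child` directory name are made up for illustration. It only asks the operating system what kind of entry each name is.

```python
import os
import tempfile


def describe(path):
    """Return 'directory', 'file', or 'other' for a filesystem path."""
    if os.path.isdir(path):
        return "directory"
    if os.path.isfile(path):
        return "file"
    return "other"


# Build a throwaway parent/child pair so the check does not depend on
# whatever directory the script happens to run from.
with tempfile.TemporaryDirectory() as parent:
    child = os.path.join(parent, "child")
    os.mkdir(child)

    dot = os.path.join(child, ".")
    dotdot = os.path.join(child, "..")

    dot_kind = describe(dot)        # what is "." ?
    dotdot_kind = describe(dotdot)  # what is ".." ?

    # "." should be the same directory as child, ".." the same as parent.
    dot_is_child = os.path.samefile(dot, child)
    dotdot_is_parent = os.path.samefile(dotdot, parent)

print(dot_kind, dotdot_kind, dot_is_child, dotdot_is_parent)
```

Both names come back as directories, with . resolving to the directory itself and .. to its parent, which matches what was described above.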

How can you not love unix?

This is how I understand it, more or less, might not be totally right, but that's about how it works.

Webwork




msg:353519
 2:35 pm on Apr 1, 2006 (gmt 0)

Questions around the same topic:

1. This issue arises in the context of me considering a) website taxonomy (organizational structure, topic categorization, page/file names, etc.); b) SEO, or as I prefer, SEA - for search engine assistance, as I think one should optimize for humans and assist search bots.

In the context of SEA, does a search engine "see a directory" any differently than it "sees a page"? In other words, does it see a difference and handle things differently, in the context of processing an ENTIRE website, when the website includes:

  • Example.com/Chicago/Plumbers/
  • Example.com/Chicago/Plumbers.htm
  • Example.com/Chicago/Plumbers

2. Does it make any sense to think of files/directories as elements of (or in) a:

  • Presentational structure (here's a page, here's another page)
  • Organization structure (You don't see it but can infer that all these pages/files reside within XDirectory)

3. I think part of my hang-up arises from the fact that I started out using MS Frontpage, where you have the option of converting a directory into a sub-web - which may have boggled my mind just a bit more. Indeed, I can vaguely trace some of my confusion to a thread where PageOneResults was engaged in a thread/debate where the issue of "make it a directory vs. make it a sub-web" was the topic. The discussion centered around the benefits of structuring a website using directories versus using sub-webs.

I guess the sub-web is simply a web, with its own structure, its own directories, etc.

Well, I'm going to re-read what everyone has posted and look at what I just now typed and sit back and slowly digest it. Hopefully this time the nutritional elements will be able to cross the blood-brain barrier. :)

Thanks again for the education . . I think . . I hope . . I may have learned the lesson . . I dunno, I'll let you know shortly. :0)

pageoneresults




msg:353520
 5:42 pm on Apr 1, 2006 (gmt 0)

Interesting topic and one that confused the heck out of me too way back when. And, working with FP and Sub-Webs even makes it more confusing, eh?

Let's see if I can interject anything here that hasn't already been touched on. I'll use your three examples as a starting point...

1. www.example.com/chicago/plumbers/
2. www.example.com/chicago/plumbers.htm
3. www.example.com/chicago/plumbers

Example 1 is considered a sub-directory (or it could be a sub-web in FP).

Example 2 is a file, which of course is indicated by the presence of the page extension (.htm).

Example 3 is a little tricky! It could be any of the three depending on how the server is configured.

3. www.example.com/chicago/plumbers

If someone were to type the above address in their browser and the page resolves without adding a trailing forward slash, one of two things may be happening. They could be doing Content Negotiation which hides the underlying technology of the site by stripping all page extensions.

Or, they have a malformed rewrite rule that is not appending a trailing forward slash. I've seen this cause problems in the past with search engines indexing both the /plumbers and /plumbers/ and even /plumbers/index.htm. Duplicate content issues may arise.

In the case of Content Negotiation, you can have different content that resides at /plumbers. Then you have other content at /plumbers/. The W3C and Google both use Content Negotiation.

In FP, don't let sub-webs confuse you. They really have nothing to do with the above and are a FP thing. When you set up a sub-web, you are literally isolating that web from the rest of the website. Any and all supporting files for the sub-web must reside in the sub-web itself or they won't work. It's FP's way of allowing you to take a site and break it down into its least common denominator. It also comes into play when using sub-domains.

So, in summary, anything that resides at /plumbers/ is considered a sub-directory. It's that trailing forward slash that is the telltale sign.

dataguy




msg:353521
 1:33 pm on Apr 3, 2006 (gmt 0)

And here I thought I was the only one having an issue with this.

I think we can safely agree that 'www.example.com/chicago/plumbers/' is a directory and 'www.example.com/chicago/plumbers.htm' is a file.

The issue is what to do with 'www.example.com/chicago/plumbers'.

I believe there is at least one organization now recommending that file names of web pages have no extensions at all, so if there is no trailing slash then it references a file, every time. A Google engineer told me this, but I don't remember which organization was quoted as it's not in my notes.

My problem is that I have a directory web site with about a million listings, and for some reason when I created the initial structure years ago I designed it to remove the trailing slash from all URLs that contained one. I haven't seen anyone using the extensionless format yet, at least not that I've noticed, but it would be a real problem for me if it ever starts to get popular.

Kirby




msg:353522
 2:32 pm on Apr 3, 2006 (gmt 0)

www.example.com/chicago/plumbers/

There still has to be an index.htm or whatever extension in the plumbers directory though, correct?

pageoneresults




msg:353523
 2:39 pm on Apr 3, 2006 (gmt 0)

There still has to be an index.htm or whatever extension in the plumbers directory though, correct?

No, not necessarily. In a dynamic environment, that page doesn't exist until the query is generated.

In a static environment, yes, a root level page must be present. It can be named anything with index and default being the two most popular. You can configure your server to recognize any name for a root level page. Heck, it could be your first name if you wanted it to be. ;)

www.example.com/plumbers/firstname.htm

I should point out that you should never link directly to an absolute path for root level pages. For example, this...

www.example.com/index.htm

Should be trimmed back to this...

www.example.com/

There is no need for an absolute URI like that for root level pages. Also, if you ever need to come back and change technology, you don't want those absolute URIs (for root level pages) being indexed.

jdMorgan




msg:353524
 3:02 pm on Apr 3, 2006 (gmt 0)

> I believe there is at least one organization now recommending that file names of web pages have no extensions at all, so if there is no trailing slash then it references a file, every time. A Google engineer told me this, but I don't remember which organization was quoted as it's not in my notes.

This was the W3C. The first recommendation for the use of extensionless URLs that I know of was by Tim Berners-Lee, in his paper Cool URLs don't change [w3.org].

> There still has to be an index.htm or whatever extension in the plumbers directory though, correct?

Yes, at least on most servers; The file extension is needed so that the server can determine what MIME-type header to return to the client when that resource is requested.

One point that may help is to remember that www.example.com/Wisconsin/Madison is neither a directory nor a file -- It is simply a URL. That's why there is no absolute answer to your question. For example, what if the site is dynamic? In that case, there exists neither a file nor a directory with the contents requested by that URL; All content is generated dynamically, and so does not exist before the request is made or after it is completed.

So it's (sometimes) helpful to remember that point, and to realize that URLs and filepaths are two completely-different systems of specifying a resource -- a page or "directory index" on the Web, or the corresponding file or page-generation script inside the server. The main job of a server, taken in this context, is to translate URLs to filepaths.
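To make the "translate URLs to filepaths" idea concrete, here is a rough sketch in Python of the conventional static-site behaviour discussed in this thread. It is only an illustration under stated assumptions, not how Apache or IIS is actually implemented: the `resolve` function, the `FILES` and `DIRS` sets, and the index file names are all hypothetical stand-ins for a real filesystem and server configuration.

```python
# Hypothetical stand-in for what exists on the server's disk.
FILES = {"/Wisconsin/index.html", "/Wisconsin/Madison.htm", "/about"}
DIRS = {"/Wisconsin"}


def resolve(url_path, files=FILES, dirs=DIRS,
            index_names=("index.html", "index.htm")):
    """Map a URL path to ('file', path), ('redirect', url) or ('not_found', None)."""
    if url_path.endswith("/"):
        # Trailing slash: treat as a directory and look for a default index file.
        dir_path = url_path.rstrip("/")
        if dir_path in dirs or url_path == "/":
            for name in index_names:
                candidate = dir_path + "/" + name
                if candidate in files:
                    return ("file", candidate)
        return ("not_found", None)
    if url_path in files:
        # No trailing slash and a matching file (extension or not): serve it.
        return ("file", url_path)
    if url_path in dirs:
        # No trailing slash but it names a directory: redirect to add the slash.
        return ("redirect", url_path + "/")
    return ("not_found", None)


for path in ("/Wisconsin", "/Wisconsin/", "/Wisconsin/Madison.htm", "/about"):
    print(path, "->", resolve(path))
```

The same URL can land in any branch depending purely on what the server has been told to do, which is why the URL by itself does not say "file" or "directory".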

Many folks don't really understand this until they've used mod_rewrite or ISAPI Rewrite, and come to realize that a URL need not share anything in common with a filepath. You can easily set up a server to return the contents of any file or to invoke any script, neither of which are explicitly named in the URL sent by the client.

So, while the conventional and preferred designations of 'file' and 'directory' URLs described here are correct, the real answer is, "You can do it any way you like as far as servers and browsers are concerned."

Note, however, that in order to maintain a local copy of a site with extensionless URLs, you will need to have a local server to translate extensionless URLs to filepaths with extensions; Otherwise, the local operating system won't know what program to use to open your local files. This problem is likely the main reason that most Webmasters don't use extensionless URLs.

Jim

SilverLining




msg:353525
 3:17 pm on Apr 3, 2006 (gmt 0)

So if you have three plumbers, is this correct (for SEO/SEA purposes):

www.example.com/plumbers/johndoe/(index.htm)
www.example.com/plumbers/jimmorrison/
www.example.com/plumbers/johnblack/

and what happens to indexes pointing to

www.example.com/plumbers/jimmorrison.htm and
www.example.com/plumbers/johnblack.htm

in the SERPS?

pageoneresults




msg:353526
 3:30 pm on Apr 3, 2006 (gmt 0)

So if you have three plumbers, is this correct (for SEO/SEA purposes):

I would suggest not looking at it that way. I would look at it more from a usability standpoint and site management.

And what happens to indexes pointing to

www.example.com/plumbers/jimmorrison.htm and
www.example.com/plumbers/johnblack.htm

They will return a 404 unless of course a 301 has been implemented to permanently redirect the old to the new.

www.example.com/plumbers/johndoe/(index.htm)
www.example.com/plumbers/jimmorrison/
www.example.com/plumbers/johnblack/

I'd separate the firstname lastname with a hyphen...

www.example.com/plumbers/john-doe/(index.htm)
www.example.com/plumbers/jim-morrison/
www.example.com/plumbers/john-black/

twist




msg:353527
 3:32 pm on Apr 3, 2006 (gmt 0)

This is the contents of my public html folder,

images/
file.css
file.php
favicon.ico
feed.xml
robots.txt
sitemap.xml.gz

The contents of file.php,

<? include( '...path_below_root/build.php' );?>

I only have one php file and no folders above root (except the images folder).

So when you visit "example.com/widget/info" on my site,

Is it a file or folder? (rhetorical question, it's neither)

[edited by: twist at 3:34 pm (utc) on April 3, 2006]

pageoneresults




msg:353528
 3:33 pm on Apr 3, 2006 (gmt 0)

Otherwise, the local operating system won't know what program to use to open your local files. This problem is likely the main reason that most Webmasters don't use extensionless URLs.

jdMorgan brings up a very important point in the Content Negotiation option. I experimented with this a couple of years ago and realized that it just wasn't efficient for me working in a WYSIWYG environment. You literally have to go through your web and remove all file extensions from the URI paths. It will break everything within that web unless of course you've followed jd's advice above.

pageoneresults




msg:353529
 3:37 pm on Apr 3, 2006 (gmt 0)

So when you visit "example.com/widget/info" on my site,

Is it a file or folder? (rhetorical question, it's neither)

It's a folder (sub-directory) unless your server has been instructed to do otherwise. Most likely your server is going to append a trailing forward slash to that path and treat it like a sub-directory. If you have Content Negotiation in place, it will be treated as you've specified in the configuration.

Also, when using rewrite routines, it can be treated in numerous ways. It gets very tricky and technical when you decide to do something like this. Personally, it's not worth the maintenance and/or technical knowledge needed to make sure everything is perfect. One mistake in this area and you are sure to create an absolute mess that may take months, even over a year to undo.

P.S. If you are not performing Content Negotiation, I would strongly recommend that you double check and make sure the server is appending that trailing forward slash for sub-directories. If it is not, you may have some issues to address immediately. You surely don't want content being indexed as /widget/info and /widget/info/ which will happen if a rewrite is not configured properly.

pageoneresults




msg:353530
 3:53 pm on Apr 3, 2006 (gmt 0)

Scheme = foo://
Authority = example.com
Path = /sub/file.htm
Query =?name=foo
Fragment = #foo

The above is a short explanation of what a URI contains. The above can be translated to...

http://example.com/sub/file.htm?name=foo#foo

More information here...

[gbiv.com...]

Understanding the structure of a URI is first and foremost.
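If it helps to see the pieces pulled apart by a program, here is a small sketch using Python's standard `urllib.parse.urlsplit` on the example URI above. The variable name `parts` is just for illustration.

```python
from urllib.parse import urlsplit

# Split the example URI into its five generic components.
parts = urlsplit("http://example.com/sub/file.htm?name=foo#foo")

print("Scheme    =", parts.scheme)
print("Authority =", parts.netloc)
print("Path      =", parts.path)
print("Query     =", parts.query)
print("Fragment  =", parts.fragment)
```

Note that nothing in the parsed result says whether /sub/file.htm is a file or a directory; the path is just a string that the server interprets.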

[edited by: trillianjedi at 9:08 pm (utc) on April 3, 2006]

Kirby




msg:353531
 4:17 pm on Apr 3, 2006 (gmt 0)

Thanks for the clarification, P1R. I did know that as I'm switching one site from static to .NET and set up everything as example.com/set/subset/ with an index.html page in every directory. Now with the ISAPI and .NET, the index.html will just fade away.

One thing I did learn was that you need to make sure you don't link to the index.html page if you also link to the /directory/ page. If you do, Google nukes you for dupe content.

twist




msg:353532
 4:39 pm on Apr 3, 2006 (gmt 0)

You surely don't want content being indexed as /widget/info and /widget/info/ which will happen if a rewrite is not configured properly

Example URL's

example.com/widget/red/happy
example.com/widget/blue/stupendous
example.com/sprocket/green/sad
example.com/sprocket/green/devastating

mod-rewrite

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*\?.*$
RewriteRule ^.*$ file.php?v1=error [L]
RewriteRule ^(widget|sprocket)/(red|blue|green)/([a-z]{1,11})$ /file.php?v1=$1&v2=$2&v3=$3 [L]
RewriteRule ^(widget|sprocket)/(red|blue|green)$ /file.php?v1=$1&v2=$2 [L]
RewriteRule ^(widget|sprocket)$ /file.php?v1=$1 [L]
RewriteRule ^$ /file.php?v1=default [L]

Using php, check the "v3" variable against the database to make sure it has a match. No match, send them a 404.

Those 6 lines give the possibility of thousands of webpages, with little possibility for mistake. Anything that doesn't match the above criteria will throw out a 404.

All of the following would produce 404's,

example.com/index
example.com/index.htm
example.com/widget/
example.com/widget/index.htm
example.com/widget/red/
example.com/widget/red/happy/
example.com/widget?v1=red
example.com/widget?v1=red&v2=happy
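For readers who don't have Apache handy, the matching logic of the rules above can be approximated in Python. This is only a sketch of the pattern matching, under assumptions: `RULES` and `route` are hypothetical names, the database check on v3 is not modelled, and a request carrying a query string is simply treated as a 404 here (in the real rules it is handed to file.php?v1=error).

```python
import re

# Each pattern mirrors one RewriteRule; the tuple names the captured groups.
RULES = [
    (re.compile(r"(widget|sprocket)/(red|blue|green)/([a-z]{1,11})"), ("v1", "v2", "v3")),
    (re.compile(r"(widget|sprocket)/(red|blue|green)"), ("v1", "v2")),
    (re.compile(r"(widget|sprocket)"), ("v1",)),
]


def route(request_uri):
    """Return the parameters file.php would receive, or None for a 404."""
    if "?" in request_uri:
        # Any client-supplied query string is rejected outright.
        return None
    # In .htaccess the leading slash is not part of the matched path.
    path = request_uri[1:] if request_uri.startswith("/") else request_uri
    if path == "":
        return {"v1": "default"}
    for pattern, names in RULES:
        match = pattern.fullmatch(path)  # fullmatch plays the role of ^...$
        if match:
            return dict(zip(names, match.groups()))
    return None


for uri in ("/widget/red/happy", "/sprocket/green", "/widget/", "/index.htm"):
    print(uri, "->", route(uri))
```

Because every pattern is anchored at both ends, a trailing slash or an unexpected extension falls through all the rules and ends up as a 404, which is the behaviour listed above.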

jdMorgan




msg:353533
 5:09 pm on Apr 3, 2006 (gmt 0)

Touching on SilverLining's and Twist's posts above, if you've replaced the dynamic URLs on your site with static ones, you can 'recover' the PR and traffic from the old dynamic URLs and speed their replacement in the SERPs by permanently redirecting them to the new static URLs. There's no need to 404 or 410 the old dynamic URLs.

The mod_rewrite code needed to do this without creating an 'infinite loop' where the static->dynamic rewrite and the dynamic->static redirect interfere with each other is a bit tricky:

# Three-parameter dynamic URL -> /v1/v2/v3
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /file\.php\?v1=(widget|sprocket)&v2=(red|blue|green)&v3=([a-z]+)\ HTTP/
RewriteRule ^file\.php$ http://www.example.com/%1/%2/%3? [R=301,L]
# Two-parameter dynamic URL -> /v1/v2
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /file\.php\?v1=(widget|sprocket)&v2=(red|blue|green)\ HTTP/
RewriteRule ^file\.php$ http://www.example.com/%1/%2? [R=301,L]
# One-parameter dynamic URL -> /v1
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /file\.php\?v1=(widget|sprocket)\ HTTP/
RewriteRule ^file\.php$ http://www.example.com/%1? [R=301,L]

Use of the %{THE_REQUEST} variable, which contains the original client request, is necessary to prevent the deadloop.

This is for Apache only, coded here for use in .htaccess, and we assume that the old dynamic URL contains all the information needed to build the new static URL. I'm not sure there's a direct IIS/ISAPI Rewrite equivalent, but on either server you could always modify file.php to incorporate this function instead of using mod_rewrite.

I guess this is related but a bit OT for this thread, so if there are any questions, we should split off to a new thread.

Jim

jtara




msg:353534
 12:09 am on Apr 4, 2006 (gmt 0)

Imagine there's no files,
It's easy if you try,
No directories below us,
Above us only sky,
Imagine all the URLs,
Living for today...

Imagine there's no directories,
It isn't hard to do,
Nothing to delete or crash for,
And no OS religion too,
Imagine all the URLs,
Living life in peace...

You may say I'm a dreamer,
But I'm not the only one,
I hope someday you'll join us,
And the URLs will be as one.

Imagine no preconceived hierarchy,
I wonder if you can,
No need to grok the relationships,
A brotherhood in URL land,
Imagine all the URLs,
Sharing all across the world...

You may say that I'm a dreamer,
But I'm not the only one,
I hope someday you'll join us,
And the URLs will live as one.

------
With deep apologies to John Lennon...

Bottom line: Directory and file simply aren't concepts in the world of URLs. A URL is a URL is a URL. It stands alone. (Like the cheese.) Its purpose in life is to locate a resource. (Which resource may be static or dynamically generated.)

Is there a relationship between what's on opposite sides of those slashes? Maybe. Maybe not.

While it's common for webservers to map URLs to directories and files in a hierarchy, this isn't always the case. It's completely a matter of implementation. There are web servers (say, in embedded devices) that serve everything dynamically. Where are the directories and files then? (And of course, this is also true for large portions of many web sites that generate their pages on-the-fly from a database, etc.)

Here's a good illustration as to why it isn't a good idea to even think of files and directories in the same breath with URLs. You can have a webserver treat a given URL fragment as EITHER a "directory" or a "file".

e.g. www.example.com/Products returns a page, as does www.example.com/Products/. And www.example.com/Products/Widgets returns a different page. This is actually pretty common.

So, it's best just not to think of it in terms of directories and files. It'll mess up your mind.

Kirby




msg:353535
 1:26 am on Apr 4, 2006 (gmt 0)

Currently I have example.com/widgets/wisconsin/madison/.

It is static. This is being redeployed using .NET. Content on the pages (500k or so) will be dynamic. My programmer just told me that because of IIS as opposed to Apache, I need an extension.

True or false?


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved