

"Correct file names"

client is upset that directories are visible


D_Blackwell

6:15 pm on Nov 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm managing a site that has about 20 sub-directories, each of which contains a particular product line or subject (5-20 files each).

The home page, in the main directory, is named index.html. I haven't done this with any of the sub-directories because, even if someone wanted to see the directory contents, I wouldn't care.

I've assured the owner that I will be happy to create an index.html for each directory - but he seems intent on being unhappy (though he hasn't given me any specific reason for it! I don't even know how he found out.)

So, for my own benefit, is there a standard I should be following? Are there pros and cons to this? If he's being unreasonable in his demand, he'll have to pay (-me ducking-), but if I'm allowing some sort of breach, which is the implication, then I'll need to do something for him beyond just setting it right.

Marcia

6:27 pm on Nov 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm no expert in servers, but I understand that it can be somewhat of a security breach to have the file indexes showing in subdirectory roots.

My hosting lets me do this automatically, but besides adding a blank index page where necessary, there's a way to configure the server so that subdirectory roots with no index page return a "forbidden" code when someone backs up the URL to view what's in them.

It isn't a good idea to leave them exposed and visible.

D_Blackwell

7:00 pm on Nov 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks! I hadn't thought of just inserting a blank index.html. That will make the fixes a lot easier, and going forward I can specify index.html as the primary file.

I don't understand the breach though. How can someone actually get in and do damage to my files?

---there's a way to configure so that the subdirectory roots, if empty, return a code of "forbidden"---
I'm not familiar with this. Is there a 'standard' method? If it's too complex, I probably won't go with it as a first choice, but am interested in how that would work.

incywincy

7:18 pm on Nov 8, 2003 (gmt 0)

10+ Year Member



If you are running Apache, you can disable directory browsing by setting the directive

<Directory /www/html>
Options -Indexes
</Directory>

in your httpd.conf file (or in .htaccess, if AllowOverride for that directory is set to All or includes Options)

where /www/html is your document root directory

If you want a nicely formatted page, you can set up a custom error page for HTTP 403 errors, which is what the server returns when directory browsing is attempted.
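Here is a minimal sketch of the .htaccess version with a custom 403 page, assuming the file sits in the document root and that AllowOverride permits both Options and FileInfo overrides (ErrorDocument needs FileInfo). The /errors/403.html path is hypothetical; point it at whatever page you create, and make sure that page itself is not blocked.

```apache
# .htaccess in the document root
# (assumes AllowOverride includes Options and FileInfo)

# Turn off automatic directory listings here and in every subdirectory below
Options -Indexes

# Show a friendly page instead of the server's default 403 message.
# /errors/403.html is a hypothetical path relative to the document root.
ErrorDocument 403 /errors/403.html
```

With this in place, a request like /bigwidgets/ for a directory that has no index file gets the custom page instead of a file listing.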

Marcia

7:34 pm on Nov 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There are academic papers out there that indicate a certain relevance to root level pages, even in subdirectories. It's also been mentioned at times that some search engines might give a slight boost to index.htm files - even in subdirectories.

There's nothing concrete, and a lot of people use /widgets-directory/widgets.html as the main page linked to for a section. But it's just as easy to make the section's main page the index file, link to it as /widgets/, and leave no gaps. The only problem with that is in editing pages locally, making sure the right index.htm is uploaded to the right directory online because of several having the same filename - index.htm.

Shannon Moore

9:54 pm on Nov 8, 2003 (gmt 0)

10+ Year Member



Your client probably found it the way other users would find it -- after a couple visits, folks familiar with a site's structure may head right to where they want to go by typing in the URL, and when typing, omitting the index page filename saves some keystrokes. For example:

htt*://www.widgets.com/bigwidgets/
instead of
htt*://www.widgets.com/bigwidgets/index.html

It also makes for fewer characters if someone wants to externally link to that particular page.

For the same reason, I omit "www" when possible if linking to other sites, though many servers aren't set up properly to serve pages when the "www." is omitted.

DaveAtIFG

1:18 am on Nov 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't understand the breach though. How can someone actually get in and do damage to my files?

For this question we need to define breach. Presently, anyone can see exactly what files are in each subdirectory and type in the URL to view a file's contents. IF any of those files contain sensitive info (passwords, credit card numbers, personal names and contact info, etc.), I'd consider that a major security breach.

If you are cloaking, for example, and one or more subdirectories contain files intended only for SE spiders, your competitors can easily see them. That's a less serious breach.

Bottom line, there is a breach if someone can see information they are not intended to see, regardless of whether they can modify or delete the info.

D_Blackwell

1:49 am on Nov 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



DaveAtIFG, That sounds like a succinct explanation. Thanks. No, absolutely no sensitive information is exposed, only pages that are intended for public consumption. Also, no cloaking. I don't believe that a breach exists using this definition.

I'll still have to handle it, because the owner isn't one who discusses a decision that's been made. It looks like my conscience can be clear, but I will probably head off the whole thing going forward, and specify an index.html for new directories.

Rincewind

10:46 pm on Nov 9, 2003 (gmt 0)

10+ Year Member



By default, I used to name the first page in a sub dir index.*. However, about 18 months ago I switched to more keyword-oriented file names, where appropriate, on new sites. Some of my sites also use a blank index.html to prevent snooping, some use .htaccess to prevent it, yet others don't use anything, and one site actually has no index file and uses the automatic index listing to display all the pages (just to save me updating things all the time).
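A mixed setup like this can live in one httpd.conf fragment: listings off site-wide, switched back on for a single directory. This is a sketch, not Rincewind's actual config; the /www/html/catalog path is hypothetical, mod_autoindex must be loaded, and IndexOptions FancyIndexing is optional (it adds icons and sortable columns).

```apache
# Disable automatic listings everywhere under the document root
<Directory /www/html>
    Options -Indexes
</Directory>

# Re-enable the listing for one directory that deliberately has no index file.
# The more specific <Directory> section is applied after the general one.
# (hypothetical path)
<Directory /www/html/catalog>
    Options +Indexes
    IndexOptions FancyIndexing
</Directory>
```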

However, I have found one disadvantage to naming lots of files index.*. I have several thousand index.html files on my hard drive (from temporary internet files), and they make up a very large portion of my saved web pages. Obviously this is a potential nightmare should I have to sort through them. Naming pages after what they are about makes file management easier; thousands of pages called index.html make it quite hard.

pageoneresults

11:48 pm on Nov 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The only problem with that is in editing pages locally, making sure the right index.htm is uploaded to the right directory online because of several having the same filename - index.htm.

This is where discipline in organization plays a key role in maintaining a site with many index pages. It's also a tie-in to the many recent and past topics on using a WYSIWYG tool like FP or DW. Among the features of these tools is the ability to manage hundreds or thousands of sub-directories, each with its own index.htm or whatever the extension may be.

I've read those academic papers that Marcia refers to and am a strong advocate of using a sub-directory structure when building web sites. For example...

10 Products
10 Sub-Directories
10 Index Pages

This allows me to provide focus within the site structure. Within each sub-directory are all supporting pages, navigation, CSS, JavaScript, etc. In FP (FrontPage), I can create what are called subwebs and edit each one as an individual site; it's a very powerful program.

For the same reason, I omit "www" when possible if linking to other sites.

There are many of us here who would suggest that you not omit the www. from your external links. This is due to the fact that www. and non www. are different entities.

Shannon Moore

12:47 am on Nov 11, 2003 (gmt 0)

10+ Year Member



There are many of us here who would suggest that you not omit the www. from your external links. This is due to the fact that www. and non www. are different entities.

Tried to do a quick search on WW for previous discussions on the topic and turned up little. Many (most) sites resolve to the same website whether the "www." is present or not.

Could you explain how this is bad form/improper, or reference a thread I could peruse to educate myself? Thanks!

pageoneresults

1:16 am on Nov 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My understanding is that you can have different content at example.com and www.example.com. Months ago, Google was assigning different PageRank to each version. During those discussions, references to the specification were quoted, and it was stated that they are different entities.
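To illustrate why the two hostnames count as separate entities, here is a sketch of two name-based virtual hosts on one Apache server, each with its own hypothetical document root. Nothing forces them to serve the same pages; it only happens when the host is configured that way.

```apache
# Two name-based virtual hosts on port 80.
# Apache 2.4 syntax; older versions also need "NameVirtualHost *:80".
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /www/html
</VirtualHost>

# Same server, bare hostname, completely different content (hypothetical path)
<VirtualHost *:80>
    ServerName example.com
    DocumentRoot /www/other-site
</VirtualHost>
```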

Therefore, if you wish to provide a correct link you should encourage your partners to link to the full URI of your resource. The same would apply to your own outbound linking practices.

When linking to root level pages, use the shortest URI...

http://www.example.com/

When linking to index pages in a sub-directory, use the shortest URI...

http://www.example.com/sub/

Why not link to the full URI including the index.htm page? Because the underlying technology behind that index page could change in the future (index.htm might become index.php, for example), and the shorter URI would keep working.
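On Apache, what makes the short /sub/ URI survive such a change is the DirectoryIndex directive (mod_dir): a request for a directory is answered with the first listed file that exists. A minimal sketch, assuming you can edit httpd.conf or an .htaccess where AllowOverride includes Indexes; the file order shown is just one reasonable choice.

```apache
# When /sub/ is requested, serve the first of these files that exists.
# If index.htm is later replaced by index.php, links to /sub/ keep working,
# while links to /sub/index.htm would break.
DirectoryIndex index.php index.html index.htm
```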

Here's one reference where jdMorgan states (message #6) that www.example.com is a subdomain of example.com...

www. vs. non-www. [webmasterworld.com]

Note: The non-www. version should be set up with a 301 (permanent) redirect to the www. version so that requests resolve to the correct URI.
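A common way to do this when you only have .htaccess access is a sketch like the one below, assuming mod_rewrite is available, AllowOverride includes FileInfo, and the file sits in the document root; example.com stands in for your own domain.

```apache
# .htaccess in the document root (requires mod_rewrite)
RewriteEngine On

# Only act on requests whose Host header is the bare domain (case-insensitive)
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]

# Permanently redirect to the same path on the www host.
# In .htaccess context the matched path has no leading slash, hence "/$1".
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

If you control the server config instead, a single `Redirect permanent / http://www.example.com/` inside the example.com virtual host does the same job without mod_rewrite.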

Shannon Moore

2:52 am on Nov 11, 2003 (gmt 0)

10+ Year Member



Well, that was educational reading, although I almost well and truly mucked up my .htaccess.

This short thread [webmasterworld.com], after much reading of the thread(s) you pointed me to, has set me on the path to proper URI usage.

I've even used subdomains before, so I don't know why it didn't occur to me that "www." was in fact a subdomain and could be treated as different from the 'pure' URL (e.g. example.com).

Thanks for your guidance, and sorry for hijacking the thread into an .htaccess discussion.