Forum Moderators: phranque


How to Speed Up Directory Reading?

         

HoboTraveler

5:45 pm on Mar 5, 2008 (gmt 0)

10+ Year Member



Hi All,

We have one million files in a directory.

The URL domain.com/ is much slower to resolve (about 1 minute) than domain.com/index.html (about 20 seconds).

I guess this is because Apache has to scan the entire contents of the directory.

Could someone please suggest ways to improve performance so that domain.com/ is served as fast as domain.com/index.html?

We're running Apache v2 on a Linux VPS/VDS.

TIA

gergoe

7:02 pm on Mar 5, 2008 (gmt 0)

10+ Year Member



Create (multiple levels of) subdirectories, each named with a single letter, and place the files into these directories based on the first letters of the filename (for example, test.html would be relocated to /t/e/test.html, and example.html to /e/x/example.html). Then create a set of mod_rewrite directives that internally rewrite each request to the proper file in the proper directory, something like this in a .htaccess file:

Options +FollowSymLinks 
RewriteEngine On
RewriteRule ^([^/.])([^/.])([^/]*)$ /$1/$2/$1$2$3 [L]

This will confuse the default Apache error handling a bit (you could add an extra RewriteCond above the RewriteRule to circumvent that, but I'd only suggest it if the website does not get many hits), but it will still manage. If you want to keep some files out of this directory nesting, you will also need a (set of) RewriteCond(s) above the RewriteRule. See the mod_rewrite documentation [httpd.apache.org] for more information.
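For instance, to leave requests that still map to a real top-level file (and, as a hypothetical exception, robots.txt) untouched, the conditions might look like this; this is a sketch under those assumptions, not a drop-in rule set:

```apache
# Skip the rewrite when the request already maps to a real file
RewriteCond %{REQUEST_FILENAME} !-f
# Hypothetical exception: keep robots.txt at the top level
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteRule ^([^/.])([^/.])([^/]*)$ /$1/$2/$1$2$3 [L]
```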

By the way, if you are running this on a dedicated server, then consider reviewing your Apache configuration: trim down DirectoryIndex, disable .htaccess, and do some further fine-tuning to make your website faster in general.

HoboTraveler

4:38 am on Mar 6, 2008 (gmt 0)

10+ Year Member



Does this mean that domain.com/test.html would resolve to /t/e/test.html?

I really do not want to change the architecture here.

TIA

gergoe

11:01 am on Mar 6, 2008 (gmt 0)

10+ Year Member



If you do not want to do that, then you can still try fine-tuning your Apache, but the result will not be as satisfying as redesigning it, because you are also hitting some operating system/file system limitations. When a train is full, they do not put people on the roof (except in some exotic countries); instead, they start another train.

jdMorgan

3:37 pm on Mar 6, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Let's clarify one point: The solution above changes the file-storage architecture, but the URLs remain unchanged -- the code implements an internal rewrite only.

Also, the given example shows 'splitting' the URLs into two levels of file directories. With a million files, you might want to consider splitting them further -- into three or four levels. At three levels, assuming an even distribution of filenames among the letters of the alphabet, you'd be down to roughly 60 files per directory, which Apache can handle with aplomb. However, if the number of files is expected to grow, then four or even five levels would be recommended.
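As a quick sanity check on those numbers (a sketch assuming one million files spread evenly over 26 letters per level, which real filenames will only approximate):

```python
# Average files per leaf directory for 1,000,000 files when each
# nesting level fans out over the 26 letters of the alphabet.
TOTAL_FILES = 1_000_000

for levels in (1, 2, 3, 4):
    leaf_dirs = 26 ** levels
    avg = TOTAL_FILES / leaf_dirs
    print(f"{levels} level(s): {leaf_dirs:>6} directories, ~{avg:.0f} files each")
```

At two levels that still averages around 1,479 files per directory; at three it drops to roughly 57.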

In each case, it will also be necessary to take the length of the URL-path into consideration: if a one-letter URL were requested with the above code in place, it would not be rewritten at all, since the rule pattern (which requires a minimum of two characters before the first dot) would not match. So you may also want additional rules to support shorter URLs.
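One possible shape for such a fallback (a sketch only; it assumes names with a single character before the first dot, such as a.html, are stored one directory deep as /a/a.html):

```apache
# Two-level rule for normal names, e.g. test.html -> /t/e/test.html
RewriteRule ^([^/.])([^/.])([^/]*)$ /$1/$2/$1$2$3 [L]
# Fallback for one-character names, e.g. a.html -> /a/a.html
# (must come after the rule above, since its pattern is less specific)
RewriteRule ^([^/.])([^/]*)$ /$1/$1$2 [L]
```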

Jim

gergoe

8:20 pm on Mar 6, 2008 (gmt 0)

10+ Year Member



That is, if a one-letter URL were requested with the above code in place, then it would not be rewritten at all, since the rule pattern (which requires a minimum of two letters) would not match.

Like http://www.example.com/a? Don't tell me you've ever seen such an exotic address :-)

jdMorgan

10:40 pm on Mar 6, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I believe in robust coding practices, fault-tolerance, and where possible, redundancy to prevent failure. Also, in a public forum such as this, there are Webmasters with all kinds of URLs, from the longest to the shortest.

And besides, what needs to be avoided is someone looking at this thread, and saying, "Great, I'll split my files into five directory levels using this (one) rule... Hey, why don't these shorter URLs resolve to the proper directory level?"

Jim

[edited by: jdMorgan at 10:42 pm (utc) on Mar. 6, 2008]

gergoe

10:52 pm on Mar 6, 2008 (gmt 0)

10+ Year Member



You have a point (again). Next time I won't take it out (it was there originally).

HoboTraveler

4:56 am on Mar 8, 2008 (gmt 0)

10+ Year Member



Hi Gergoe,

Since the URLs remain the same, I am curious about your method.

I guess it would be possible to store example.html in the directory domain.com/e/ (as domain.com/e/example.html), correct?

Btw, does this mean that example.html can be accessed in two ways?

domain.com/example.html and domain.com/e/example.html?

TIA

gergoe

12:01 pm on Mar 8, 2008 (gmt 0)

10+ Year Member



Yes, that's correct, but if you indeed have millions of files, then take Jim's advice and consider making more levels of directories. With three levels of directories, the file in your example would be stored at domain.com/e/x/a/example.html.
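The flat-name-to-nested-path mapping being discussed can be sketched as a small helper (hypothetical code, not anything Apache provides; it simply mirrors the letter-per-level scheme):

```python
import posixpath

def shard_path(filename: str, levels: int = 3) -> str:
    """Nested storage path for a flat filename, using the first
    `levels` characters before the dot as directory names."""
    stem = filename.split(".", 1)[0]
    depth = min(levels, len(stem))  # shorter names nest less deeply
    return posixpath.join(*stem[:depth], filename)

print(shard_path("example.html"))         # e/x/a/example.html
print(shard_path("test.html", levels=2))  # t/e/test.html
```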

Yes, you could indeed access the files in the 'structured' way, but as long as you do not expose this to the world (i.e. you do not start using the structured paths in your own links), it will remain behind the scenes. Besides, you can easily stop anyone from doing that with mod_rewrite, either by denying such requests or by redirecting them to the proper address.
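The redirect variant could look something like this (a sketch only; the \1 and \2 backreferences inside the pattern rely on Apache's PCRE regex support, and the rule is untested here):

```apache
# A direct request for an internal path like /t/e/test.html is
# redirected back to the public flat URL /test.html
RewriteRule ^([^/.])/([^/.])/(\1\2[^/]*)$ /$3 [R=301,L]
```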

The example I sent earlier is for two levels of directories; if you go for more levels, the rules need to be slightly adjusted.
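For three levels the adjustment might look like this (a sketch; note that a two-level fallback is still needed for names with fewer than three characters before the first dot):

```apache
# Three-level rule: example.html -> /e/x/a/example.html
RewriteRule ^([^/.])([^/.])([^/.])([^/]*)$ /$1/$2/$3/$1$2$3$4 [L]
# Fallback for shorter names: ab.html -> /a/b/ab.html
RewriteRule ^([^/.])([^/.])([^/]*)$ /$1/$2/$1$2$3 [L]
```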