Forum Moderators: bakedjake

Most efficient method for counting files

9:54 am on Jul 4, 2011 (gmt 0)

Junior Member

10+ Year Member

joined:Dec 29, 2005
posts: 73
votes: 0

Hey all.

I have a little system that I have set up, that receives incoming requests from my client sites. The results of these requests are then cached to the file system, to reduce the load on the server the next time the same request is performed.

I use the following method to generate the cache filename:
- I generate an MD5 hash of the requested URL.
- I then create a directory in , where the values are the positions of those letters.
- I then truncate this hash and save the file in the aforementioned directory.

For example, the hash for http://www.google.co.za is 260289fb0e63d27a83fb63a1f5449806. The cache file for this request would be in
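The scheme above can be sketched in shell. Two details aren't spelled out in the thread, so they are assumptions here: the four directory levels come from the first four characters of the hash, and the file keeps the full hash as its name (the actual letter positions and truncation length aren't given).

```shell
#!/bin/sh
# Sketch of the cache-path scheme described above.
# Assumptions (not from the thread): directory levels = first four
# hash characters; stored file name = the full, untruncated hash.
url="http://www.google.co.za"
hash=$(printf '%s' "$url" | md5sum | awk '{ print $1 }')
d1=$(printf '%s' "$hash" | cut -c1)
d2=$(printf '%s' "$hash" | cut -c2)
d3=$(printf '%s' "$hash" | cut -c3)
d4=$(printf '%s' "$hash" | cut -c4)
dir="/tmp/cache/$d1/$d2/$d3/$d4"
mkdir -p "$dir"
printf '%s\n' "$dir/$hash"
```

With this layout every cache file sits exactly four subdirectories below /tmp/cache, which matters for the depth-limited counting discussed later in the thread.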

I need to keep track of the number of these cache files. I don't care which directories they're in; I just need a total count of the files under /tmp/cache/.

Currently, I'm using
find /tmp/cache -type f | wc -l
to generate this count. However, I've noticed that it's taking longer and longer to run (and uses more processing power than I'd like).

At the moment there are about 370,000 files under /tmp/cache, and the file system is ext4.

Does anyone have a better and more efficient method than this for finding the count of files? Please? :)
1:01 pm on Jul 11, 2011 (gmt 0)


WebmasterWorld Administrator phranque

joined:Aug 10, 2004
votes: 84

try this:

ls -R /tmp/cache | egrep -v '^($|/)' | wc -l
10:20 am on Oct 7, 2011 (gmt 0)

New User

5+ Year Member

joined:Oct 6, 2011
votes: 0

You still have to traverse all those directories (a linear-time operation), so it will necessarily take longer and longer as the cache grows.

The most efficient way is still find. Here is how to (vastly) improve the performance of your command:

find /tmp/cache -mindepth 5 -maxdepth 5 -type f | awk 'END { print NR }'

  • you specify the depth to search at: all your files are 4 subdirectories deep, so you set both the lower and upper limit to 5 (4 directory levels plus the file itself), which keeps find from descending further or scanning anything shallower
  • you count with awk, which prints only the final record number (NR), so the output is purely numeric
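A quick way to sanity-check those depth limits is a throwaway tree shaped like the cache layout (the directory and file names here are illustrative):

```shell
#!/bin/sh
# Build a tiny tree mimicking the cache layout, then confirm the
# depth-limited find counts only the leaf file and ignores a stray
# file higher up.
base=$(mktemp -d)
mkdir -p "$base/2/6/0/2"
touch "$base/2/6/0/2/260289fb"   # leaf cache file, 4 subdirs in (depth 5)
touch "$base/stray"              # depth 1: must not be counted
n=$(find "$base" -mindepth 5 -maxdepth 5 -type f | awk 'END { print NR }')
echo "$n"   # -> 1
```

Because find never descends past depth 5 or matches anything above it, the stray file is excluded and only the leaf is counted.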

find is usually fast, especially when run at close intervals (the VFS caches the directory structure), up to 500k files or so. If you exceed that, consider introducing an explicit counting mechanism in your caching layer: +1 when you add a file, -1 when you remove one.
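That explicit counter might look like the sketch below. The `.count` file, its lock file, and the `bump` helper are all illustrative names (none appear in the thread), it relies on flock(1) from util-linux for safe concurrent updates, and a temp directory stands in for /tmp/cache so the example is self-contained.

```shell
#!/bin/sh
# Maintain a running file count next to the cache instead of
# re-walking the tree. Update it under an exclusive lock so
# concurrent cache writers don't lose increments.
cache=$(mktemp -d)        # stands in for /tmp/cache
COUNT="$cache/.count"

bump() {                  # usage: bump 1 | bump -1
    (
        flock 9                                 # exclusive lock on fd 9
        n=$(cat "$COUNT" 2>/dev/null || echo 0) # missing file counts as 0
        echo $((n + $1)) > "$COUNT"
    ) 9>"$COUNT.lock"
}

bump 1    # call after writing a new cache file
bump 1
bump -1   # call after expiring/removing one
cat "$COUNT"   # -> 1
```

Reading the count then becomes a single `cat`, constant time regardless of how many files the cache holds.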