  1. #1
    Join Date
    Jan 2004

    How to organize 500,000 files


    I run a fairly popular file hosting service and am in the process of moving servers.

    At the moment I have a system that automatically creates a new folder each day and stores newly uploaded files in it. The folder name matches the current date.


    I find this system a bit "outdated" and believe there is a smarter way to do this.

    My new file servers each have ~10TB of available disk space.
    Average file size is around 10MB.

    Any suggestions? Back in the day I was told not to put all the files into one folder. Is that still the case?

  2. #2
    Join Date
    Nov 2001
    Older file systems most definitely had significant performance issues even with a limited number of file entries. I don't know how they all compare these days, but in this age of big disks it seems that some file systems have improved in this regard. It is orders of magnitude faster to look up an entry in a B-tree managing 500,000 keys than in a simple list.
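
    To put rough numbers on that claim, here is a minimal sketch that counts lookup steps. It uses binary search over sorted keys as a stand-in for a tree index; real directory indexes (B-trees, hashed trees) have a much higher fan-out and need even fewer steps. The function names are made up for illustration.

```python
def linear_steps(names, target):
    """Count entries examined when scanning an unsorted list for target."""
    for steps, name in enumerate(names, start=1):
        if name == target:
            return steps
    return len(names)  # target absent: every entry was examined


def sorted_steps(n):
    """Worst-case number of probes for a binary search over n sorted keys."""
    return max(1, n.bit_length())  # equals floor(log2(n)) + 1 for n >= 1


if __name__ == "__main__":
    names = [f"file{i:06d}" for i in range(500_000)]
    print("linear scan, last entry:", linear_steps(names, names[-1]))
    print("binary search, worst case:", sorted_steps(len(names)))
```

    A linear scan for the last of 500,000 entries touches all of them, while the sorted lookup needs at most 19 probes.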

    Still, there are other reasons not to put too many files in one directory. For example, some userland tools barf when they encounter a large number of files. You often run into this sort of issue in a Maildir folder hierarchy when a user has been collecting mail for years.

    Thus, even if performance were not an issue, I would still spread the files out over directories. My background is high-volume imaging and large-scale document management, so a million files is not an abstract concept to me either.

    I've used a date-based spread when it makes sense from a business perspective, or a spread based on some other business need or data point; otherwise I use a random spread.
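
    One common way to get a "random" spread is to derive the directory names from a hash of the file's identifier. The sketch below is only an illustration under that assumption, not necessarily the scheme used above; `spread_path`, `levels` and `width` are hypothetical names.

```python
import hashlib
from pathlib import PurePosixPath


def spread_path(file_id: str, levels: int = 2, width: int = 2) -> PurePosixPath:
    """Map a file identifier to a nested relative path such as 'ab/cd/<file_id>'.

    The directory names are taken from the hex SHA-256 digest of the
    identifier, so files spread evenly and the same identifier always
    maps to the same location.
    """
    digest = hashlib.sha256(file_id.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(*parts, file_id)


if __name__ == "__main__":
    print(spread_path("upload-12345.bin"))
```

    With two levels of two hex characters there are 256 * 256 = 65,536 leaf directories, so 500,000 files average out to fewer than ten per directory. A single level (256 directories, roughly 2,000 files each) may already be enough.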

    I wrote about file spreading here on WHT recently, noting the concept of "volumes". I often like to have some notion of a volume high up in the directory tree so I can easily move a large number of files to another location. Copy the files, then a single update (one column in one row of a table) completes the move. Test, and then delete the originals.
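
    The volume idea can be sketched roughly as follows. The schema, paths and function names are assumptions made up for illustration: each file row stores a volume id plus a path relative to the volume root, and the root lives in a single row of a `volumes` table.

```python
import posixpath
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with one volume and one file (demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE volumes (id INTEGER PRIMARY KEY, root TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE files ("
        " name TEXT PRIMARY KEY,"
        " volume_id INTEGER NOT NULL REFERENCES volumes(id),"
        " rel_path TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO volumes VALUES (1, '/mnt/disk1/vol0001')")
    conn.execute("INSERT INTO files VALUES ('report.pdf', 1, 'ab/cd/report.pdf')")
    conn.commit()
    return conn


def resolve(conn: sqlite3.Connection, name: str) -> str:
    """Return the full path of a file: volume root joined with its relative path."""
    row = conn.execute(
        "SELECT v.root, f.rel_path FROM files f"
        " JOIN volumes v ON v.id = f.volume_id WHERE f.name = ?",
        (name,),
    ).fetchone()
    if row is None:
        raise KeyError(name)
    return posixpath.join(row[0], row[1])


def relocate_volume(conn: sqlite3.Connection, volume_id: int, new_root: str) -> None:
    """Point a volume at its new location: one column in one row changes."""
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE volumes SET root = ? WHERE id = ?", (new_root, volume_id))


if __name__ == "__main__":
    conn = make_demo_db()
    print(resolve(conn, "report.pdf"))
    # Copy the files to the new disk first, then switch the root.
    relocate_volume(conn, 1, "/mnt/disk2/vol0001")
    print(resolve(conn, "report.pdf"))
```

    Because no file row stores an absolute path, every file in the volume resolves to the new location as soon as the one `volumes` row is updated.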

  3. #3
    Join Date
    Mar 2004
    Derby, Kansas
    I guess the question I would ask is: Is there a way to categorize your data that makes more sense?

    While I am not sure exactly how many users you have or what types of files are typically stored, I would assume there would be a more logical way of organizing the files.

    What makes business sense? Can you implement a storage system that makes your support easier or more efficient? What other unique aspects do these groups of files have?
    Tyler Thompson

  4. #4
    Join Date
    Jan 2004
    Some of those files are public uploads (not attached to any account); the rest are spread across user accounts.

    I don't think there is any way to categorize the files.
    I guess the best way is to keep using folders, then.
