  1. #1
    Join Date
    Aug 2003
    Posts
    2,067

    PHP file handling

    Hello,
    I am trying to find a fast and efficient method to let PHP work with large files. A particular project I am working on requires a PHP-based access check (each file is served only to authenticated users from a list of users in a database [i.e., think vBulletin's download-attachment control]) for files up to 3GB each. Currently I see several methods available to me, but I do not know which one is the fastest / most memory efficient.

    For starters, the first method is the nice and easy readfile() approach:
    Code:
    // send http header to user
    ...
    readfile($myfile);
    However, reading on php.net, I noticed the following comment:
    herbert dot fischer at NOSPAM dot gmail dot com
    21-Jul-2005 11:01
    readfile and fpassthru are about 55% slower than doing a loop with "feof/echo fread".
    Which leads me to the good old fopen + fread approach:
    Code:
    // send http header to user
    ...
    $fp = fopen($myfile, 'rb');
    while (!feof($fp)) {
        $buffer = fread($fp, $chunksize);
        print $buffer;
    }
    fclose($fp);
    Another alternative I see is to use symlinks to create quick temporary links to the file, with code something like:
    Code:
    symlink($myfile, $somewhereonpub_html);
    header("Location :...");

    Which one of the above is the most memory efficient and fastest to execute? If someone with expertise in this field could please shed some light, it would be greatly appreciated.

    Thanks in advance,
    Andy Huang

  2. #2
    Join Date
    Jul 2003
    Location
    Kuwait
    Posts
    5,099
    Any of the filesystem functions that do "complete" reads, such as file(), file_get_contents(), etc., are very inefficient and can lead to large amounts of memory consumption (since the file being read is first put entirely in memory). This is especially true of file(): not only are you reading the entire contents of the file into memory, you are also creating an array out of it. For small files they are a quick shortcut, but for large files they can cause a lot of headaches.

    Well, symlinking would be the fastest way (no reading and then outputting of the file at all), but you have to make sure that Apache is set up properly to read the files, and that you remove the symlinks after the files have been downloaded/viewed.
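
    For what it's worth, a rough sketch of that approach might look like the following (the directory names, the link-naming scheme and the cleanup strategy are my assumptions, not something prescribed here):
    PHP Code:
    // hypothetical sketch: create a uniquely named symlink inside the web root,
    // send the user there, and remember the link so it can be removed later
    $token  = md5(uniqid(rand(), true));               // hard-to-guess link name
    $target = '/home/site/protected/bigfile.iso';      // real file, outside the web root
    $link   = '/home/site/public_html/tmp/' . $token;  // temporary public path

    if (symlink($target, $link)) {
        // record $link (in the DB or the session) so a cron job or a later
        // request can unlink() it once the download window has passed
        header('Location: /tmp/' . $token);
        exit;
    }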

    If you cannot symlink -- then fopen/fread is the most efficient method. Just ensure that you read in manageable chunks.
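
    Along those lines, a minimal sketch of such a loop might be (the flush calls are an addition of mine, not something mentioned in this thread; depending on your output_buffering settings they keep PHP from holding more than one chunk at a time):
    PHP Code:
    // minimal sketch, assuming $file holds the path to the large file
    $chunksize = 1024 * 1024;          // 1 MB per read
    $fp = fopen($file, 'rb');
    if ($fp === false) {
        exit('could not open file');
    }
    while (!feof($fp)) {
        echo fread($fp, $chunksize);
        if (ob_get_level() > 0) {
            ob_flush();                // flush PHP's output buffer, if one is active
        }
        flush();                       // push the chunk out to the client
    }
    fclose($fp);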
    In order to understand recursion, one must first understand recursion.
    If you feel like it, you can read my blog
    Signal > Noise

  3. #3
    Join Date
    Mar 2004
    Location
    USA
    Posts
    4,342

    Question

    Quote Originally Posted by fyrestrtr
    Any of the filesystem functions that do "complete" reads, such as file(), file_get_contents(), etc., are very inefficient and can lead to large amounts of memory consumption (since the file being read is first put entirely in memory). This is especially true of file(): not only are you reading the entire contents of the file into memory, you are also creating an array out of it. For small files they are a quick shortcut, but for large files they can cause a lot of headaches.

    Well, symlinking would be the fastest way (no reading and then outputting of the file at all), but you have to make sure that Apache is set up properly to read the files, and that you remove the symlinks after the files have been downloaded/viewed.

    If you cannot symlink -- then fopen/fread is the most efficient method. Just ensure that you read in manageable chunks.
    What would you consider a manageable chunk?

    Peace,
    Testing 1.. Testing 1..2.. Testing 1..2..3...

  4. #4
    Join Date
    Nov 2005
    Location
    Buffalo
    Posts
    94
    I had to deal with this when I wrote a script to download monthly logs. Before I used chunks, I was running out of RAM.

    PHP Code:
    class download_class {
        var $filename;
        var $path;
        var $fullname;
        var $content_type;
        var $content_length;
        var $content_disposition;

        function download_class($path, $file = '') {
            $this->path     = (is_dir($path)) ? $path : dirname($path);
            $this->filename = ($file) ? $file : basename($path);
            $this->fullname = $this->path . "/" . $this->filename;

            $this->check_file();

            $this->content_type        = 'application/octet-stream';
            $this->content_disposition = 'attachment';
            $this->content_length      = filesize($this->fullname);
        }

        function pprint() {
            echo "<pre>\n";
            print_r($this);
            echo "</pre>\n";
        }

        function check_file() {
            if (!is_file($this->fullname)) {
                echo "ERROR: File[" . $this->fullname . "] does not exist!\n";
                exit;
            }
        }

        function download_file() {
            $this->check_file();
            header("HTTP/1.1 200 OK");
            header("X-Powered-By: PHP/" . phpversion());
            header("Expires: Thu, 19 Nov 1981 08:52:00 GMT");
            header("Cache-Control: None");
            header("Pragma: no-cache");
            header("Accept-Ranges: bytes");
            header("Content-Disposition: " . $this->content_disposition . "; filename=\"" . $this->filename . "\"");
            header("Content-Type: " . $this->content_type);
            header("Content-Length: " . $this->content_length);
            header("Proxy-Connection: close");

            $this->readfile_chunked();
            return;
        }

        function readfile_chunked() {
            $chunksize = 1 * (1024 * 1024); // how many bytes per chunk
            $buffer = '';
            $handle = fopen($this->fullname, 'rb');
            if ($handle === false) {
                return false;
            }
            while (!feof($handle)) {
                $buffer = fread($handle, $chunksize);
                print $buffer;
            }
            return fclose($handle);
        }
    }

    It's partially based on "class EasyDownload" by Olavo Alexandrino, May 2004.
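
    In case it helps, calling it would look something like this (the log directory and filename here are made up for illustration):
    PHP Code:
    // hypothetical usage of the class above
    $dl = new download_class('/var/logs/monthly', 'access_2005-11.log');
    // $dl->pprint();     // uncomment to dump the object while debugging
    $dl->download_file();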
    Malenski.com - "In Valid Code We Trust"

  5. #5
    Join Date
    Mar 2004
    Location
    USA
    Posts
    4,342
    $chunksize = 1*(1024*1024); // how many bytes per chunk

    So that's around 1 MB per round?

    Peace,
    Testing 1.. Testing 1..2.. Testing 1..2..3...

  6. #6
    Join Date
    Nov 2005
    Location
    Buffalo
    Posts
    94
    Yup: 1024 * 1024 = 1,048,576 bytes, so 1 MB per chunk, according to my trusty old Google calculator.
    Malenski.com - "In Valid Code We Trust"

  7. #7
    Join Date
    Aug 2003
    Posts
    2,067
    With the class above, is there going to be a problem with exceeding the allowed script execution time? (i.e., do I need to set the timeout to 0 before calling it?) Additionally, does it affect the download speed in any way? Also, I noticed that you used fopen for it; would there be any file-locking problem? Because of the nature of the application I'm working on, I may need multiple people accessing the same file at the same time. Would fopen lock the file to only the first user? And finally, would such repeated disk read operations cause a very high load average?
    Last edited by NE-Andy; 12-15-2005 at 09:07 PM.

  8. #8
    Join Date
    Oct 2002
    Location
    Canada
    Posts
    3,100
    I suggest you try keeping PHP out of this. fyrestrtr mentioned creating temporary symlinks and letting Apache deal with the actual download; I like that idea a lot. If you do not want to make and delete symlinks, you could add temporary rewrite rules to .htaccess and, again, let Apache handle the actual download, along the lines of the sketch below.
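
    Purely as a sketch of that idea (every path, token and rule here is made up, and you would still need something to prune stale rules and to keep .htaccess writes from racing each other):
    PHP Code:
    // hypothetical sketch: append a one-off rewrite rule for this download,
    // then send the user to the tokenized URL and let Apache serve the file
    // (assumes "RewriteEngine On" is already present in that .htaccess)
    $token = md5(uniqid(rand(), true));
    $rule  = "RewriteRule ^dl/" . $token . "$ /files/bigfile.iso [L]\n";
    file_put_contents('/home/site/public_html/.htaccess', $rule, FILE_APPEND);

    header('Location: /dl/' . $token);
    exit;
    // a cron job (or a later request) would strip expired rules back out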

  9. #9
    Join Date
    Nov 2005
    Location
    Buffalo
    Posts
    94
    Yes, I set the time limit to 0 before I start a download; otherwise you'll get an error the first time a download runs out of time.

    Opening the file read-only with fopen should take care of the locking issues.

    The repeated file reads are inevitable: if the file is larger than the memory you have installed, trying to hold it in memory would just push it into swap space and still use the disk. If the file is being read but not changed at the same time, the OS (I believe FreeBSD does this) should optimize the reads by caching the file in memory. It's kind of like doing an "ls" on a directory: it might take a while the first time, but if you do it again it will be faster because the OS already has the information in memory.
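
    In practice that just means something like this before kicking off the transfer (reusing the hypothetical call from above; fopen() in 'rb' mode takes no lock by itself, only an explicit flock() would):
    PHP Code:
    set_time_limit(0);   // 0 = no execution-time cap, so a slow 3GB download isn't killed
    $dl = new download_class('/var/logs/monthly', 'access_2005-11.log');
    $dl->download_file();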

    The symlink solution fyrestrtr suggested is interesting; I've never thought of it before, but the next time I have to do something like this I'll give it a shot.
    Malenski.com - "In Valid Code We Trust"

  10. #10
    Join Date
    Aug 2003
    Posts
    2,067
    ok, thanks guys, I'll stick with my symlink code then.

    Cheers.
