  1. #1
    Join Date
    Dec 2005
    Posts
    395

    DB recommended for blobs

    Hey Guys,

    I am about to undertake a project involving a large DB whose primary task will be serving up images. I have been told by many people not to use MySQL for this because it is notoriously slow when it comes to BLOBs. What are some DB solutions that handle BLOBs well?

  2. #2
    Join Date
    Aug 2004
    Location
    Dallas, TX
    Posts
    3,507
    Have you considered storing the images on the filesystem and just referencing the file location and md5 within the database?

    That certainly seems to be the most scalable way to store masses of images.
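
    A rough sketch of that approach (the images table, its columns, and the storage path are placeholders, and sqlite3 only stands in for whatever database you actually use): the bytes go to disk, the database only keeps a reference and a checksum.

        import hashlib
        import os
        import sqlite3

        STORAGE_ROOT = "/var/www/images"   # placeholder path

        def store_image(conn, original_name, data):
            """Write the bytes to disk; keep only the path and MD5 in the DB."""
            digest = hashlib.md5(data).hexdigest()
            path = os.path.join(STORAGE_ROOT, original_name)
            with open(path, "wb") as fh:
                fh.write(data)             # the image lives on the filesystem
            cur = conn.execute(
                "INSERT INTO images (file_path, md5) VALUES (?, ?)",
                (path, digest),            # the DB stores only a reference
            )
            conn.commit()
            return cur.lastrowid

        # e.g. store_image(sqlite3.connect("app.db"), "photo.jpg", raw_bytes)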
    Dallas Colocation by Incero, 8 years and counting!
    e: sales(at)incero(dot)com 855.217.COLO (2656)
    Colocation & Enterprise Servers, SATA/SAS/SSD, secure IPMI/KVM remote control, 100% U.S.A. Based Staff
    SSAE 16, SAS70, Redundant Power & Network, Fully Diverse Fiber

  3. #3
    Join Date
    Nov 2001
    Location
    Vancouver
    Posts
    2,416
    My area of speciality used to be high-volume document imaging, and I can tell you that storing images, or pretty much any binary data of real size, as a blob is a mistake waiting to be recognized. The vendors that did this sort of tomfoolery also sold very expensive software to manage the storage headache such a strategy brings along with it as unwelcome baggage.

    Sure, at first storing binary file data as a blob seems simple, but wait until you have GBs of data as blobs and your DB blows up.
    “Even those who arrange and design shrubberies are under
    considerable economic stress at this period in history.”

  4. #4
    I'll re-iterate what the other posts have already said: DON'T DO IT!

    The filesystem will do a much better job of storing and caching binary data than *any* database system will do. It's also often much less complicated to just deal with the filesystem than with a ton of BLOBs in the database.

    Store files on the filesystem, and store the filepath to them in the database (or, better yet, rename them according to their primary key, e.g. 1234.JPG where 1234 is the primary key of the record the JPEG belongs to).
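
    A minimal sketch of the primary-key naming idea (the photos table and upload directory are made up; sqlite3 stands in for the real database):

        import os
        import shutil
        import sqlite3

        UPLOAD_DIR = "/var/www/uploads"    # placeholder

        def save_upload(conn, tmp_path, title):
            # Insert the record first, then name the file after its primary
            # key (e.g. 1234.jpg), so the path can always be derived from it.
            cur = conn.execute("INSERT INTO photos (title) VALUES (?)", (title,))
            conn.commit()
            pk = cur.lastrowid
            dest = os.path.join(UPLOAD_DIR, "%d.jpg" % pk)
            shutil.move(tmp_path, dest)
            return dest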

  5. #5
    Join Date
    Aug 2001
    Location
    Orange County, CA
    Posts
    532
    Check out MogileFS (from the memcached people):
    http://mogilefs.com
    Jeff Standen, Chief of R&D, WebGroup Media LLC. - LinkedIn
    Cerb is a fast and flexible web-based platform for business collaboration and automation. http://www.cerbweb.com/

  6. #6
    Join Date
    Mar 2007
    Location
    Phoenix, AZ, United States
    Posts
    1,492
    Quote Originally Posted by consolibyte View Post
    The filesystem will do a much better job of storing and caching binary data than *any* database system will do. It's also often much less complicated to just deal with the filesystem than with a ton of BLOBs in the database.

    Store files on the filesystem, and store the filepath to them in the database (or, better yet, rename them according to their primary key, e.g. 1234.JPG where 1234 is the primary key of the record the JPEG belongs to).
    Care to elaborate on this? Especially the part about the file system caching binary data.

    Is seeking to a point in a database file really that bad?

  7. #7
    Join Date
    Aug 2001
    Location
    Orange County, CA
    Posts
    532
    Quote Originally Posted by HostSentry View Post
    Is seeking to a point in a database file really that bad?
    It's not just about seeking to a record in a huge table according to an index. Files in the database complicate distribution, replication, incremental backups; and they tie up database server resources for no good reason on thousands or millions of little bits of content that will never change.

    For a long time we kept the attachments in the DB for our apps, and one of the better refactors we did was moving them to the filesystem in a series of distributed buckets (/storage/[001-100]) that can be easily synchronized and incrementally backed up (only new files). As mentioned above, the files are named by their primary key; however, we ignore extensions (/storage/050/12345 not 12345.jpg).

    Decoupling storage from the database also leaves you the option of abstracted storage back-ends as you grow: disk, MogileFS, FTP, Amazon S3, etc.
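
    For illustration, a sketch of that bucket layout; the post doesn't say how files are assigned to buckets, so the modulo rule below is just one guess:

        import os
        import shutil

        STORAGE_ROOT = "/storage"
        NUM_BUCKETS = 100   # matches the /storage/[001-100] layout above

        def bucket_path(primary_key):
            # Spread files across buckets by primary key; no extension,
            # as described above.
            bucket = (primary_key % NUM_BUCKETS) + 1
            return os.path.join(STORAGE_ROOT, "%03d" % bucket, str(primary_key))

        def store_attachment(primary_key, source_file):
            dest = bucket_path(primary_key)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.copyfile(source_file, dest)
            return dest

    Swapping store_attachment() for a MogileFS, FTP, or S3 back-end later is then just a matter of changing that one function.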
    Jeff Standen, Chief of R&D, WebGroup Media LLC. - LinkedIn
    Cerb is a fast and flexible web-based platform for business collaboration and automation. http://www.cerbweb.com/

  8. #8
    Join Date
    Mar 2007
    Location
    Phoenix, AZ, United States
    Posts
    1,492
    Quote Originally Posted by jstanden View Post
    It's not just about seeking to a record in a huge table according to an index. Files in the database complicate distribution, replication, incremental backups;
    Isn't it easier to move a database than thousands of files in multiple directories? Isn't it quicker to back up one database file as well?

  9. #9
    Join Date
    Nov 2005
    Posts
    352
    Quote Originally Posted by HostSentry View Post
    Isn't it easier to move a database than thousands of files in multiple directories? Isn't it quicker to back up one database file as well?
    No. Especially when the database contains 10GB of data. (Good luck restoring a 10GB mysql dump file...)

    A properly configured backup system only backs up the changes (for example, maybe 20-200 new files a day, depending on popularity). For a database, the entire database has to be copied over, easily resulting in as much as 10-20GB of transfer (again, depending on popularity). Backing up or moving an entire directory structure can be done in a single command, usually an "rsync", "tar", or "scp" command. This way you move a small, easy-to-manage database and either a single directory tree or a single, common, easy-to-use tarball.

    Storing images in a database is moderately feasible (but still not recommended) when there are only a handful of images (less than a few dozen), but it does not scale at all (especially to the size needed to serve large numbers of images). It might actually be slower as well, since the data has to be retrieved from MySQL and parsed by a script before being sent to the browser, as opposed to being served directly from the filesystem by the web server.
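
    A toy version of the "only back up the changes" idea, purely to show the principle (rsync does this far better in practice; the paths and timestamp bookkeeping are hypothetical):

        import os
        import shutil

        def incremental_backup(src_root, dst_root, last_run):
            """Copy only files modified since last_run (a Unix timestamp)."""
            copied = 0
            for dirpath, _dirs, filenames in os.walk(src_root):
                for name in filenames:
                    src = os.path.join(dirpath, name)
                    if os.path.getmtime(src) <= last_run:
                        continue           # unchanged since the last run
                    dst = os.path.join(dst_root, os.path.relpath(src, src_root))
                    os.makedirs(os.path.dirname(dst), exist_ok=True)
                    shutil.copy2(src, dst)
                    copied += 1
            return copied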

  10. #10
    Join Date
    Mar 2007
    Location
    Phoenix, AZ, United States
    Posts
    1,492
    Quote Originally Posted by dexxtreme View Post
    No. Especially when the database contains 10GB of data. (Good luck restoring a 10GB mysql dump file...)

    A properly configured backup system only backs up the changes (for example, maybe 20-200 new files a day, depending on popularity). For a database, the entire database has to be copied over, easily resulting in as much as 10-20GB of transfer (again, depending on popularity). Backing up or moving an entire directory structure can be done in a single command, usually an "rsync", "tar", or "scp" command. This way you move a small, easy-to-manage database and either a single directory tree or a single, common, easy-to-use tarball.

    Storing images in a database is moderately feasible (but still not recommended) when there are only a handful of images (less than a few dozen), but it does not scale at all (especially to the size needed to serve large numbers of images). It might actually be slower as well, since the data has to be retrieved from MySQL and parsed by a script before being sent to the browser, as opposed to being served directly from the filesystem by the web server.
    Generally speaking from an administrative standpoint I suppose that makes sense. But what prevents a programmer from incrementally backing up blob data?

    As far as restoring in full 10GB of one file, compared to 10GB across thousands of files... I don't see how the latter could be quicker.

  11. #11
    Join Date
    Nov 2005
    Posts
    352
    Quote Originally Posted by HostSentry View Post
    Generally speaking from an administrative standpoint I suppose that makes sense. But what prevents a programmer from incrementally backing up blob data?

    As far as restoring in full 10GB of one file, compared to 10GB across thousands of files... I don't see how the latter could be quicker.
    How do you incrementally back up blob data without taking the entire MySQL table file? Many implementations of incremental backups will transfer the entire file if there are any changes at all. Other implementations will only transfer the parts of the file that changed, but since everything is in a single database file, it is theoretically possible for so much of the file to have changed that you effectively have to transfer as much as a third of the table on every backup.

    Also, dealing with a single file that is 10GB in size means that if there is any corruption whatsoever (MySQL crash, memory error, filesystem error, whatever) the entire file (and every single image) could be wiped out all at once. Sometimes MySQL gets really mad when there is corruption in a table file, even to the point where the only fix is to truncate the table. And in a few instances, a MySQL dump file can require manual fixing before it can be re-imported into a database. (Good luck loading that into "vi".) When you have hundreds or thousands of individual image files, you can lose any number of them and not lose the rest of the data, and the smaller database is more likely to be recoverable in the event of a crash.

    The primary issue really isn't about speed (although that can be an issue as well), it is an issue of how difficult the recovery will be when there is a problem.

  12. #12
    Join Date
    Mar 2007
    Location
    Phoenix, AZ, United States
    Posts
    1,492
    Quote Originally Posted by dexxtreme View Post
    How do you incrementally back up blob data without taking the entire MySQL table file?
    With programming, it's not that hard to incrementally back up data in a database, regardless of column type.

    Anyway, I think my point is that fundamentally all we are discussing is two different ways of storing data (take out the HTTP server, CGI, etc.): either in one file, or across many files. In a custom setup, storing data in one file isn't nearly as atrocious as a lot of people make it out to be.

    Generally speaking, I think referencing images on the hard disk is good for the average user, but that doesn't warrant the animosity toward the blob type.

    Oh, and in regards to data corruption, that's why you do backups.

  13. #13
    Join Date
    Aug 2001
    Location
    Orange County, CA
    Posts
    532
    Quote Originally Posted by HostSentry
    Isn't it easier to move a database than thousands of files in multiple directories? Isn't it quicker to back up one database file as well?
    dexxtreme had a great reply to that. It's the same thing I would have said.

    Quote Originally Posted by HostSentry
    But what prevents a programmer from incrementally backing up blob data?
    Technically, nothing, so long as the database has a way to record changes since the last backup (such as a timestamp) and the backup scripts remember where they left off.
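
    As a hypothetical sketch of that (the attachments table, its updated_at column, and the state file are all assumptions; sqlite3 stands in for the real database):

        import sqlite3
        import time

        STATE_FILE = "/var/backups/last_blob_backup"   # "where we left off"

        def last_run_time():
            try:
                with open(STATE_FILE) as fh:
                    return float(fh.read().strip())
            except FileNotFoundError:
                return 0.0

        def backup_changed_blobs(conn, out_dir):
            # Only rows touched since the last run get written out.
            since = last_run_time()
            rows = conn.execute(
                "SELECT id, data FROM attachments WHERE updated_at > ?",
                (since,),
            ).fetchall()
            for row_id, data in rows:
                with open("%s/%s" % (out_dir, row_id), "wb") as fh:
                    fh.write(data)         # one file per changed blob
            with open(STATE_FILE, "w") as fh:
                fh.write(str(time.time()))
            return len(rows)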

    Quote Originally Posted by HostSentry
    As far as restoring in full 10GB of one file, compared to 10GB across thousands of files... I don't see how the latter could be quicker.
    Parallelism.

    Quote Originally Posted by dexxtreme
    How do you incrementally back up blob data without taking the entire MySQL table file?
    You could dump with a 'WHERE' clause, but the app would have to be designed to support it; and it'd always be a kludge.

    Quote Originally Posted by dexxtreme
    Also, dealing with a single file that is 10GB in size means that if there is any corruption whatsoever (MySQL crash, memory error, filesystem error, whatever) the entire file (and every single image) could be wiped out all at once.
    Exactly! And repairing that table to try and recover part of it could possibly take hours.

    The idea with replication on disk-based file stores is that you shouldn't have to 'recover' because each file would be distributed in at least 2 places. Something like MogileFS automates that. It could be overkill for a smaller site/app, but the same principle applies (e.g. toss a second drive in the machine and 'rsync' hourly).

    Quote Originally Posted by HostSentry
    Generally speaking, I think referencing images on the hard disk is good for the average user, but that doesn't warrant the animosity toward the blob type.
    Hey, you're welcome to ignore the warnings of people who have been in the situation of having a ton of BLOBs in the database and bear the scars for it. It's your life.

    If you're only doing it for a few files it's not going to be a big deal. But serving files isn't what databases were made for. There's not a lot of sense in putting lots of data in a database that you will never change, sort on, or join. It's merely a file store, and there are better solutions that take into account the issues you'd face when scaling/replicating that store past a single disk.

    Quote Originally Posted by HostSentry
    Oh, and in regards to data corruption, that's why you do backups.
    In our case, we rsync attachments from disk for thousands of people's email to an off-RAID local disk. Twice a week we rsync that secondary disk to an Amazon EC2 instance, which has an EBS (Elastic Block Store) volume attached. Once a week we snapshot that volume, which saves a highly durable copy in Amazon S3 across multiple locations.

    It only takes a few minutes a night and we're talking about tens of millions of attachments. It would be obnoxious to do that in SQL, especially since we'd have to do the same thing to track differences between the local and remote boxes too (not just the database server and local).
    Jeff Standen, Chief of R&D, WebGroup Media LLC. - LinkedIn
    Cerb is a fast and flexible web-based platform for business collaboration and automation. http://www.cerbweb.com/

  14. #14
    Join Date
    Dec 2005
    Posts
    395
    Some great responses here and a lot of good information. One of the reasons I would like to do a DB-based system is that I have been told by many that doing so allows you to utilize DB clustering solutions. By that I mean that most DB systems have great features for distributing DBs over many machines, essentially doing the work for you and making scalability easier.

    With regards to the "animosity" towards BLOBs, it seems to be because of MySQL. I have been told that if you use a more robust DB like Oracle or DB2, it handles BLOBs much better and doesn't have the issues people seem to be describing, which is why I was looking for a good DB solution.

