Backups
There are many reasons to have good backups: Hard drives fail, system administrators make mistakes, users accidentally delete files, hackers break into systems and cause damage, et cetera. There are hundreds of ways to create backups, and it's often difficult to compare different backup systems without actually trying them out. However, backup systems can generally be summarized according to the following qualitative characteristics:
- Incrementality
- Granularity
- Encryption
- Service
Understanding these characteristics should make it much easier to choose the right backup system.
Incrementality
The incrementality of a backup system is probably its most obvious characteristic, since it has a dramatic effect on how long it takes to create backups, how much space it takes to store them, and how easy they are to manage. In terms of incrementality, there are 4.5 types of backups:
- Full backups
- Incremental backups
- Reverse incremental backups
- Snapshotted backups
- Synchronization
Full backups
The oldest approach to incrementality is full backups. This is exactly what it sounds like: Every time you want to create a backup, you make a copy of everything.
Pros:
- Full backups are very simple -- it's very unlikely that anything will go wrong.
- If you don't want a backup any more, you can simply delete it.
Cons:
- Full backups are slow and take up a lot of space (and bandwidth if you're sending them off-site).
Incremental backups
In order to make backups faster, the concept of incremental backups was invented. The idea behind incremental backups is that once you've got one full backup, you don't need to back up data which hasn't changed thereafter. With an incremental backup system, whenever you create a backup you have a choice between making it a full backup or making it incremental with respect to an earlier backup.
Pros:
- Incremental backups are faster and smaller than full backups.
Cons:
- You can't delete a backup if you want to keep newer backups which are based on it.
- Restoring from incremental backups can be much slower than restoring from full backups, because you have to first restore a full backup and then restore increments one by one.
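The dependency between a full backup and its increments can be sketched in a few lines of Python. This is a toy model rather than how any particular backup tool works: a backup is assumed to be a dictionary mapping file paths to contents, and the names `make_increment` and `restore` are made up for illustration.

```python
def make_increment(previous_state, current_state):
    """Record only what changed between two states of the data."""
    changed = {path: data for path, data in current_state.items()
               if previous_state.get(path) != data}
    deleted = set(previous_state) - set(current_state)
    return {"changed": changed, "deleted": deleted}


def restore(full_backup, increments):
    """Start from the full backup and replay every increment, oldest first."""
    state = dict(full_backup)
    for increment in increments:
        state.update(increment["changed"])
        for path in increment["deleted"]:
            state.pop(path, None)
    return state


day1 = {"a.txt": "alpha", "b.txt": "beta"}
day2 = {"a.txt": "alpha", "b.txt": "beta v2"}
day3 = {"b.txt": "beta v2", "c.txt": "gamma"}

full = dict(day1)                   # the one full backup
inc2 = make_increment(day1, day2)   # stores only b.txt
inc3 = make_increment(day2, day3)   # stores c.txt and the deletion of a.txt

restored = restore(full, [inc2, inc3])
```

Restoring day 3 needs the full backup and both increments in order. Leaving out `inc2` gives a wrong result, which is why a backup can't be deleted while newer backups are based on it.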
Reverse incremental backups
A variant on the idea of incremental backups is reverse incremental backups. Here instead of storing a full backup and the differences between that and later backups, you store a full backup and the differences between that and earlier backups. Whenever you create a new backup, the full backup is updated and a new increment is added.
Pros:
- You can always delete the oldest of a set of backups.
- Restoring the most recent backup is fast.
Cons:
- You can't delete a backup if you want to keep any older backups.
- Reverse incremental backups are often somewhat slower than incremental backups due to the need to update the latest full backup stored.
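A minimal sketch of the reverse incremental idea, under the same toy assumption that a backup is a dictionary mapping file paths to contents. The class name `ReverseIncrementalStore` and its methods are hypothetical, not taken from any real tool.

```python
def diff(from_state, to_state):
    """Describe how to turn from_state into to_state."""
    changed = {path: data for path, data in to_state.items()
               if from_state.get(path) != data}
    deleted = set(from_state) - set(to_state)
    return {"changed": changed, "deleted": deleted}


def apply_delta(state, delta):
    """Return a copy of state with the delta applied."""
    result = dict(state)
    result.update(delta["changed"])
    for path in delta["deleted"]:
        result.pop(path, None)
    return result


class ReverseIncrementalStore:
    def __init__(self, first_state):
        self.latest = dict(first_state)  # always a full copy of the newest backup
        self.reverse = []                # oldest first; each entry steps one backup back

    def add_backup(self, new_state):
        # Store how to get from the new backup back to the previous one,
        # then update the full copy.
        self.reverse.append(diff(new_state, self.latest))
        self.latest = dict(new_state)

    def restore(self, steps_back=0):
        if not 0 <= steps_back <= len(self.reverse):
            raise ValueError("no backup that far back")
        state = dict(self.latest)
        first = len(self.reverse) - steps_back
        for delta in reversed(self.reverse[first:]):
            state = apply_delta(state, delta)
        return state

    def delete_oldest(self):
        if self.reverse:
            self.reverse.pop(0)


day1 = {"a.txt": "alpha", "b.txt": "beta"}
day2 = {"a.txt": "alpha", "b.txt": "beta v2"}
day3 = {"b.txt": "beta v2", "c.txt": "gamma"}

store = ReverseIncrementalStore(day1)
store.add_backup(day2)
store.add_backup(day3)
```

In this model `store.restore()` returns the newest backup without applying any increments, while older backups are reached by walking backwards through the increments. Dropping the oldest backup only removes the first entry of the list.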
Snapshotted backups
Snapshotted backups were introduced to combine the flexibility of full backups with the performance of incremental backups. Snapshotted backups treat each archive as being composed of a number of pieces which can be independently stored or removed; backups are created by storing any new pieces, and backups are deleted by removing any pieces which are no longer needed by any archives.
Pros:
- Snapshotted backups are just as fast as incremental backups.
- If you have a set of snapshotted backups, you can delete any backup while keeping all the rest; for example, you could start by creating a backup every day, but then decide later to only keep one backup per week for the past year and one backup per month for the past five years.
Cons:
- Snapshotted backups are more complex to implement.
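The "pieces" approach can be illustrated with a small content-addressed store. This is only a sketch under simplifying assumptions: each piece is a whole file identified by its SHA-256 digest, everything lives in memory, and the `SnapshotStore` class is hypothetical.

```python
import hashlib


class SnapshotStore:
    def __init__(self):
        self.pieces = {}    # digest -> piece contents
        self.archives = {}  # archive name -> {path: digest}

    def create(self, name, files):
        """Store only the pieces that are not already present."""
        manifest = {}
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            self.pieces.setdefault(digest, data)
            manifest[path] = digest
        self.archives[name] = manifest

    def delete(self, name):
        """Remove an archive, then drop pieces no remaining archive needs."""
        del self.archives[name]
        still_needed = {digest
                        for manifest in self.archives.values()
                        for digest in manifest.values()}
        for digest in list(self.pieces):
            if digest not in still_needed:
                del self.pieces[digest]

    def restore(self, name):
        return {path: self.pieces[digest]
                for path, digest in self.archives[name].items()}


monday = {"a.txt": b"alpha", "b.txt": b"beta"}
tuesday = {"a.txt": b"alpha", "b.txt": b"beta v2"}
wednesday = {"a.txt": b"alpha", "b.txt": b"beta v2", "c.txt": b"gamma"}

store = SnapshotStore()
store.create("mon", monday)
store.create("tue", tuesday)
store.create("wed", wednesday)

store.delete("tue")  # a backup in the middle can go; "mon" and "wed" are unaffected
```

Because every archive is just a list of references to pieces, any archive can be deleted on its own, and a piece disappears only when the last archive referring to it is gone.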
Synchronization
This is the 0.5 out of the 4.5 types: While synchronization is often used in place of a backup system, it isn't really a backup system. The idea behind synchronization is to keep two or more copies of data, either by immediately synchronizing (e.g., RAID) or by mirroring data using a synchronization protocol.
Pros:
- If the original system fails, you have a ready-to-use copy of your data.
Cons:
- If anything happens to damage or delete your data, the damage or deletion will be copied to your "backup".
Granularity
Granularity can be thought of as a sub-category of incrementality, since it deals with how a backup system handles multiple backups of data which is somewhat but not completely different each time. Granularity applies to incremental, reverse incremental, and snapshotted backups only.
File-level granularity
With file-level granularity, an incremental, reverse incremental, or snapshotted backup system only looks at whether a file has changed when deciding whether to create a copy of it: Even if only a small part of the file has changed, the entire file is duplicated.
Pros:
- File-level granularity is very simple.
Cons:
- File-level granularity can result in lots of wasted storage (and bandwidth for off-site backups) if you frequently have small changes to big files (e.g., data appended to large log files or mail spools).
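The log file case is easy to put numbers on. The sketch below assumes a toy model in which files are byte strings held in a dictionary; `file_level_cost` is a hypothetical helper, not part of any real backup tool.

```python
def file_level_cost(previous, current):
    """Bytes that must be stored if every changed file is copied whole."""
    return sum(len(data) for path, data in current.items()
               if previous.get(path) != data)


log_before = b"x" * 1_000_000
log_after = log_before + b"one new line\n"

previous = {"app.log": log_before, "notes.txt": b"unchanged"}
current = {"app.log": log_after, "notes.txt": b"unchanged"}

stored = file_level_cost(previous, current)
appended = len(log_after) - len(log_before)
```

Here `appended` is only a handful of bytes, but `stored` is the size of the whole log file, since the unchanged megabyte is copied again along with the new line.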
Block-level granularity
With block-level granularity, an incremental, reverse incremental, or snapshotted backup system looks inside files and only backs up individual blocks which have changed. Block sizes used typically range from under 1 kB to over 1 MB, although the smallest block sizes are normally only used in synchronization protocols, not in true backups.
Pros:
- Block-level granularity is generally more efficient than file-level granularity.
Cons:
- Block-level granularity is more complex to implement.
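A rough sketch of block-level change detection follows. It assumes fixed-size blocks compared by SHA-256 digest, which is only one of several possible approaches; the 64 kB block size and the function names are chosen purely for illustration.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # illustrative; real systems pick their own sizes


def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut data into consecutive fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def new_blocks(previous, current, block_size=BLOCK_SIZE):
    """Return the blocks of current whose digests were not seen in previous."""
    known = {hashlib.sha256(block).digest()
             for block in split_blocks(previous, block_size)}
    return [block for block in split_blocks(current, block_size)
            if hashlib.sha256(block).digest() not in known]


log_before = b"x" * 1_000_000
log_after = log_before + b"one new line\n"

changed = new_blocks(log_before, log_after)
bytes_to_store = sum(len(block) for block in changed)
```

For data appended to a large file, only the final block differs, so `bytes_to_store` is a small fraction of the file size. Fixed-size blocks as used here cope well with appends and in-place edits, but an insertion near the start of a file shifts every later block boundary, which is part of why real implementations are more complex.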
Encryption
Backups often contain sensitive information, so their security is obviously something which should be considered. The cornerstone of any approach to security is encryption, and there are 4 common ways that encryption is used with backups:
- No encryption
- Encrypted transport
- Symmetric encryption
- Public-key encryption
No encryption
Many backup systems simply don't handle encryption at all.
Pros:
- You never need to worry about losing your decryption keys.
Cons:
- It's up to you to figure out how to make your backups secure.
Encrypted transport
The most dangerous place on the Internet is on the wires -- that is, when data is being transmitted between systems. With encrypted transport (e.g., over SSH or SSL/TLS) data is encrypted "on the wire", but not when the data is stored.
Pros:
- People won't be able to intercept your data in transit.
- Encrypted transport only uses ephemeral (short-term) keys, so as long as you don't lose access to the system storing your backups, you don't need to worry about losing any decryption keys.
Cons:
- If someone breaks into the system where your backups are stored, they will be able to read your backups (bad) and modify them (possibly even worse, since you might restore a backup which they have inserted a rootkit into).
Symmetric encryption
With symmetric encryption, backups are encrypted on the system which generates them, and the system which stores them has no way to read the data.
Pros:
- People won't be able to read your data, either by intercepting it in transit or by breaking into the system which stores it.
Cons:
- If you lose your decryption key, you can't get your backups back.
- If you have a system set up to create backups automatically (i.e., it has its key stored somewhere), then anyone who breaks into the system can read all of your old backups.
- Backup systems using symmetric encryption usually don't have any protection against backups being maliciously modified.
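Tamper detection with a shared secret is possible in principle, using a message authentication code (MAC). The sketch below shows only that integrity check, using HMAC-SHA256 from the Python standard library; it does not encrypt anything, and the `seal` and `open_sealed` helpers and the "tag followed by data" layout are assumptions made for this example. A real system would compute the tag over the encrypted backup.

```python
import hashlib
import hmac
import os

TAG_LENGTH = 32  # size of an HMAC-SHA256 tag in bytes


def seal(key, backup_bytes):
    """Prefix the backup with an authentication tag computed from the key."""
    tag = hmac.new(key, backup_bytes, hashlib.sha256).digest()
    return tag + backup_bytes


def open_sealed(key, sealed):
    """Return the backup if its tag verifies, otherwise raise ValueError."""
    tag, body = sealed[:TAG_LENGTH], sealed[TAG_LENGTH:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("backup failed integrity check")
    return body


key = os.urandom(32)  # must be kept secret and must not be lost
sealed = seal(key, b"contents of a backup archive")

# Simulate someone modifying the stored backup by flipping one bit.
tampered = sealed[:-1] + bytes([sealed[-1] ^ 1])
```

Calling `open_sealed(key, sealed)` returns the original bytes, while `open_sealed(key, tampered)` raises `ValueError`. Anyone who obtains the key can still forge valid tags, so this does not help against an attacker who has broken into the system creating the backups.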
Public-key encryption
With public-key encryption, there are separate keys used to encrypt and decrypt data, and (usually) additional keys used to sign and verify data.
Pros:
- People won't be able to read your data, either by intercepting it in transit or by breaking into the system which stores it.
- If someone tampers with your backups, they'll be recognized as corrupt.
- You can set up a system to create backups automatically while not being able to read old backups.
Cons:
- If you lose your decryption key, you can't get your backups back.
Service
The final characteristic to consider is whether you want to use a backup service or to store your backups yourself.
Backup service
With a backup service, you pay someone else to store your backups. This may involve fixed backup plans (e.g., up to 10 GB of backups for $5/month) or metered charges (e.g., $0.50/GB/month), where your actual usage is measured and charges are computed from it.
Pros:
- Someone else is responsible for making sure that the disks holding your backups don't fail (or that they have extra copies of your backups in case disks do fail).
- If you only have a small amount of data, a backup service is often cheaper, since the cheapest disk space is the disk space you don't have to pay for.
Cons:
- Companies which run backup services are (presumably) trying to make a profit, and any profit they make is money their customers don't have.
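The two pricing models can be compared with simple arithmetic. The figures below are the example prices given above for a fixed plan and a metered service, not quotes from any real provider, and the function names are made up for illustration.

```python
def fixed_plan_cost(stored_gb, plan_limit_gb=10, plan_price=5.00):
    """Monthly cost of a fixed plan: the same price up to the plan limit."""
    if stored_gb > plan_limit_gb:
        raise ValueError("usage exceeds the plan limit")
    return plan_price


def metered_cost(stored_gb, price_per_gb=0.50):
    """Monthly cost of a metered service: pay for what is actually stored."""
    return stored_gb * price_per_gb


small_fixed = fixed_plan_cost(4)    # 4 GB on the fixed plan
small_metered = metered_cost(4)     # 4 GB on the metered service
full_fixed = fixed_plan_cost(10)    # plan used to its limit
full_metered = metered_cost(10)
```

With these example prices the two models cost the same only when the fixed plan is used to its 10 GB limit; below that, metered charges work out cheaper because unused space is not paid for.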
Self-storage
Rather than using a backup service, many people store their backups themselves, either by using spare disk space, by renting "web hosting" space and using it for storage, or by renting a dedicated server.
Pros:
- If you have spare disk space, you might be able to store your backups for free.
- You're not paying for someone else's profits.
Cons:
- If the disk or server where you're storing your backups fails, you won't have your backups any more.
- Shared web hosting companies often have Terms of Service which prohibit the use of space for purposes other than web hosting.
- Keeping track of lots of backups can be time-consuming.
Web Hosting Wiki article text shared under a Creative Commons License.


