Move image backup-related FAQ entries to a new page

author: Milkey Mouse <milkeymouse@meme.institute> 2017-11-23 11:36:05 -0800
committer: Milkey Mouse <milkeymouse@meme.institute> 2017-11-23 12:14:35 -0800
commit: c0edc60ca60f9bed248527ccb6fb35757bd7deaf (patch)
tree: f4413e1a23f5afd845e377f744d7ed14356924db
parent: b0d68036c29207c6d0b0990c44acc84c251e5d9e (diff)
3 files changed, 120 insertions, 119 deletions
diff --git a/docs/deployment.rst b/docs/deployment.rst
index 7b1caf925..c7c3a6ee1 100644
--- a/docs/deployment.rst
+++ b/docs/deployment.rst
@@ -12,3 +12,4 @@ This chapter details deployment strategies for the following scenarios.
    deployment/central-backup-server
    deployment/hosting-repositories
    deployment/automated-local
+   deployment/image-backup
diff --git a/docs/deployment/image-backup.rst b/docs/deployment/image-backup.rst
new file mode 100644
index 000000000..fd684c3f2
--- /dev/null
+++ b/docs/deployment/image-backup.rst
@@ -0,0 +1,119 @@
+.. include:: ../global.rst.inc
+.. highlight:: none
+
+Backing up entire disk images
+=============================
+
+Backing up disk images can still be efficient with Borg because its `deduplication`_
+technique makes sure only the modified parts of the file are stored. Borg also has
+optional simple sparse file support for extract.
+
+Decreasing the size of image backups
+------------------------------------
+
+Disk images are as large as the full disk when uncompressed and might not get much
+smaller post-deduplication after heavy use because virtually all file systems don't
+actually delete file data on disk but instead delete the filesystem entries referencing
+the data. Therefore, if a disk nears capacity and files are deleted again, the change
+will barely decrease the space it takes up when compressed and deduplicated. Depending
+on the filesystem, there are several ways to decrease the size of a disk image:
+
+Using ntfsclone (NTFS, i.e. Windows VMs)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``ntfsclone`` can only operate on filesystems with the journal cleared (i.e. turned-off
+machines), which somewhat limits its utility in the case of VM snapshots. However, when
+it can be used, its special image format is even more efficient than just zeroing and
+deduplicating. For backup, save the disk header and the contents of each partition::
+
+    HEADER_SIZE=$(sfdisk -lo Start $DISK | grep -A1 -P 'Start$' | tail -n1 | xargs echo)
+    PARTITIONS=$(sfdisk -lo Device,Type $DISK | sed -e '1,/Device\s*Type/d')
+    dd if=$DISK count=$HEADER_SIZE | borg create repo::hostname-partinfo -
+    echo "$PARTITIONS" | grep NTFS | cut -d' ' -f1 | while read x; do
+        PARTNUM=$(echo $x | grep -Eo "[0-9]+$")
+        ntfsclone -so - $x | borg create repo::hostname-part$PARTNUM -
+    done
+    # to backup non-NTFS partitions as well:
+    echo "$PARTITIONS" | grep -v NTFS | cut -d' ' -f1 | while read x; do
+        PARTNUM=$(echo $x | grep -Eo "[0-9]+$")
+        borg create --read-special repo::hostname-part$PARTNUM $x
+    done
+
+Restoration is a similar process::
+
+    borg extract --stdout repo::hostname-partinfo | dd of=$DISK && partprobe
+    PARTITIONS=$(sfdisk -lo Device,Type $DISK | sed -e '1,/Device\s*Type/d')
+    borg list --format {archive}{NL} repo | grep 'part[0-9]*$' | while read x; do
+        PARTNUM=$(echo $x | grep -Eo "[0-9]+$")
+        PARTITION=$(echo "$PARTITIONS" | grep -E "$DISKp?$PARTNUM" | head -n1)
+        if echo "$PARTITION" | cut -d' ' -f2- | grep -q NTFS; then
+            borg extract --stdout repo::$x | ntfsclone -rO $(echo "$PARTITION" | cut -d' ' -f1) -
+        else
+            borg extract --stdout repo::$x | dd of=$(echo "$PARTITION" | cut -d' ' -f1)
+        fi
+    done
+
+.. note::
+
+   When backing up a disk image (as opposed to a real block device), mount it as
+   a loopback image to use the above snippets::
+
+       DISK=$(losetup -Pf --show /path/to/disk/image)
+       # do backup as shown above
+       losetup -d $DISK
+
+Using zerofree (ext2, ext3, ext4)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``zerofree`` works similarly to ntfsclone in that it zeros out unused chunks of the FS,
+except it works in place, zeroing the original partition. This makes the backup process
+a bit simpler::
+
+    sfdisk -lo Device,Type $DISK | sed -e '1,/Device\s*Type/d' | grep Linux | cut -d' ' -f1 | xargs -n1 zerofree
+    borg create --read-special repo::hostname-disk $DISK
+
+Because the partitions were zeroed in place, restoration is only one command::
+
+    borg extract --stdout repo::hostname-disk | dd of=$DISK
+
+.. note:: The "traditional" way to zero out space on a partition, especially one already
+          mounted, is to simply ``dd`` from ``/dev/zero`` to a temporary file and delete
+          it. This is ill-advised for the reasons mentioned in the ``zerofree`` man page:
+
+          - it is slow
+          - it makes the disk image (temporarily) grow to its maximal extent
+          - it (temporarily) uses all free space on the disk, so other concurrent write actions may fail.
+
+Virtual machines
+----------------
+
+If you use non-snapshotting backup tools like Borg to back up virtual machines, then
+the VMs should be turned off for the duration of the backup. Backing up live VMs can
+(and will) result in corrupted or inconsistent backup contents: a VM image is just a
+regular file to Borg with the same issues as regular files when it comes to concurrent
+reading and writing from the same file.
+
+For backing up live VMs use filesystem snapshots on the VM host, which establishes
+crash-consistency for the VM images. This means that with most file systems (that
+are journaling) the FS will always be fine in the backup (but may need a journal
+replay to become accessible).
+
+Usually this does not mean that file *contents* on the VM are consistent, since file
+contents are normally not journaled. Notable exceptions are ext4 in data=journal mode,
+ZFS and btrfs (unless nodatacow is used).
+
+Applications designed with crash-consistency in mind (most relational databases like
+PostgreSQL, SQLite etc. but also for example Borg repositories) should always be able
+to recover to a consistent state from a backup created with crash-consistent snapshots
+(even on ext4 with data=writeback or XFS). Other applications may require a lot of work
+to reach application-consistency; it's a broad and complex issue that cannot be explained
+in entirety here.
+
+Hypervisor snapshots capturing most of the VM's state can also be used for backups and
+can be a better alternative to pure file system based snapshots of the VM's disk, since
+no state is lost. Depending on the application this can be the easiest and most reliable
+way to create application-consistent backups.
+
+Borg doesn't intend to address these issues due to their huge complexity and
+platform/software dependency. Combining Borg with the mechanisms provided by the platform
+(snapshots, hypervisor features) will be the best approach to start tackling them.
+\ No newline at end of file
diff --git a/docs/faq.rst b/docs/faq.rst
index 109d6ce63..ef64c39a5 100644
--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -8,125 +8,6 @@ Frequently asked questions
 Usage & Limitations
 ###################
 
-Can I backup VM disk images?
-----------------------------
-
-Yes, the `deduplication`_ technique used by
-Borg makes sure only the modified parts of the file are stored.
-Also, we have optional simple sparse file support for extract.
-
-If you use non-snapshotting backup tools like Borg to back up virtual machines,
-then the VMs should be turned off for the duration of the backup. Backing up live VMs can (and will)
-result in corrupted or inconsistent backup contents: a VM image is just a regular file to
-Borg with the same issues as regular files when it comes to concurrent reading and writing from
-the same file.
-
-For backing up live VMs use file system snapshots on the VM host, which establishes
-crash-consistency for the VM images. This means that with most file systems
-(that are journaling) the FS will always be fine in the backup (but may need a
-journal replay to become accessible).
-
-Usually this does not mean that file *contents* on the VM are consistent, since file
-contents are normally not journaled. Notable exceptions are ext4 in data=journal mode,
-ZFS and btrfs (unless nodatacow is used).
-
-Applications designed with crash-consistency in mind (most relational databases
-like PostgreSQL, SQLite etc. but also for example Borg repositories) should always
-be able to recover to a consistent state from a backup created with
-crash-consistent snapshots (even on ext4 with data=writeback or XFS).
-
-Hypervisor snapshots capturing most of the VM's state can also be used for backups
-and can be a better alternative to pure file system based snapshots of the VM's disk,
-since no state is lost. Depending on the application this can be the easiest and most
-reliable way to create application-consistent backups.
-
-Other applications may require a lot of work to reach application-consistency:
-It's a broad and complex issue that cannot be explained in entirety here.
-
-Borg doesn't intend to address these issues due to their huge complexity
-and platform/software dependency. Combining Borg with the mechanisms provided
-by the platform (snapshots, hypervisor features) will be the best approach
-to start tackling them.
-
-How can I decrease the size of disk image backups?
---------------------------------------------------
-
-Full disk images are as large as the full disk when uncompressed and might not get much
-smaller post-deduplication after heavy use. This is because virtually all file systems
-don't actually delete the data on disk (that is the place of so-called "secure delete")
-but instead delete the filesystem entries referring to the data. This leaves the random
-data on disk until the FS eventually claims it for another file. Therefore, if a hard
-drive nears capacity and files are deleted again, the change will barely decrease the
-space it takes up when compressed and deduplicated. Depending on the filesystem of the
-VM (or physical computer, if for some reason a normal filesystem backup can't be taken),
-there are several ways to decrease the size of a full image:
-
-Using ntfsclone (NTFS, i.e. Windows VMs)
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-ntfsclone can only operate on filesystems with the journal cleared (i.e. turned-off
-machines) which somewhat limits its utility in the case of VM snapshots. However,
-when it can be used, its special image format is even more efficient than just zeroing
-and deduplicating. For backup, save the disk header and the contents of each partition::
-
-    HEADER_SIZE=$(sfdisk -lo Start $DISK | grep -A1 -P 'Start$' | tail -n1 | xargs echo)
-    PARTITIONS=$(sfdisk -lo Device,Type $DISK | sed -e '1,/Device\s*Type/d')
-    dd if=$DISK count=$HEADER_SIZE | borg create repo::hostname-partinfo -
-    echo "$PARTITIONS" | grep NTFS | cut -d' ' -f1 | while read x; do
-        PARTNUM=$(echo $x | grep -Eo "[0-9]+$")
-        ntfsclone -so - $x | borg create repo::hostname-part$PARTNUM -
-    done
-    # to backup non-NTFS partitions as well:
-    echo "$PARTITIONS" | grep -v NTFS | cut -d' ' -f1 | while read x; do
-        PARTNUM=$(echo $x | grep -Eo "[0-9]+$")
-        borg create --read-special repo::hostname-part$PARTNUM $x
-    done
-
-Restoration is similar to the above process, but done in reverse::
-
-    borg extract --stdout repo::hostname-partinfo | dd of=$DISK && partprobe
-    PARTITIONS=$(sfdisk -lo Device,Type $DISK | sed -e '1,/Device\s*Type/d')
-    borg list --format {archive}{NL} repo | grep 'part[0-9]*$' | while read x; do
-        PARTNUM=$(echo $x | grep -Eo "[0-9]+$")
-        PARTITION=$(echo "$PARTITIONS" | grep -E "$DISKp?$PARTNUM" | head -n1)
-        if echo "$PARTITION" | cut -d' ' -f2- | grep -q NTFS; then
-            borg extract --stdout repo::$x | ntfsclone -rO $(echo "$PARTITION" | cut -d' ' -f1) -
-        else
-            borg extract --stdout repo::$x | dd of=$(echo "$PARTITION" | cut -d' ' -f1)
-        fi
-    done
-
-.. note::
-
-   When backing up a disk image (as opposed to a real block device), mount it as
-   a loopback image to use the above snippets::
-
-       DISK=$(losetup -Pf --show /path/to/disk/image)
-       # do backup as shown above
-       losetup -d $DISK
-
-Using zerofree (ext2, ext3, ext4)
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-zerofree works similarly to ntfsclone in that it zeros out unused chunks of the FS, except
-that it works in place, zeroing the original partition. This makes the backup process a bit
-simpler::
-
-    sfdisk -lo Device,Type $DISK | sed -e '1,/Device\s*Type/d' | grep Linux | cut -d' ' -f1 | xargs -n1 zerofree
-    borg create --read-special repo::hostname-disk $DISK
-
-Because the partitions were zeroed in place, restoration is only one command::
-
-    borg extract --stdout repo::hostname-disk | dd of=$DISK
-
-.. note:: The "traditional" way to zero out space on a partition, especially one already
-          mounted, is to simply ``dd`` from ``/dev/zero`` to a temporary file and delete
-          it. This is ill-advised for the reasons mentioned in the ``zerofree`` man page:
-
-          - it is slow
-          - it makes the disk image (temporarily) grow to its maximal extent
-          - it (temporarily) uses all free space on the disk, so other concurrent write actions may fail.
-
 Can I backup from multiple servers into a single repository?
 ------------------------------------------------------------
author	Milkey Mouse <milkeymouse@meme.institute>	2017-11-23 11:36:05 -0800
committer	Milkey Mouse <milkeymouse@meme.institute>	2017-11-23 12:14:35 -0800
commit	c0edc60ca60f9bed248527ccb6fb35757bd7deaf (patch)
tree	f4413e1a23f5afd845e377f744d7ed14356924db
parent	b0d68036c29207c6d0b0990c44acc84c251e5d9e (diff)