KVM live & incremental VM backup with BORG

WORKING DRAFT DOCUMENT - INCOMPLETE

0. Preface

Live incremental VM backups on a stand-alone KVM hypervisor is something I had wanted to get working for quite some time, but never found the time for. Until now. Turns out it isn’t all that difficult. With a combination of libvirt snapshots, borg, rsnapshot and systemd – and optionally a cloud storage provider – one can create a very nice and robust backup workflow. With these few applications you’ve got an extremely powerful toolkit at your disposal. Since I couldn’t find much information regarding the practical implementation of such a setup, I decided to write this one down. Let’s do some high recoverability instead of the usual high availability!

I first looked into this script by michaltrzcinski, a Python script that produces a chain of snapshots: each time the script runs, a new snapshot is created and the previous one gets backed up. Each snapshot builds incrementally on the previous one, only copying changed blocks, creating a chain of backing images. This is a very nice backup solution for virtual machines, providing impressive RPO and RTO values (depending on your intervals and restore procedure), but the fact that the whole chain depends on every previous snapshot does not sit right with me. The snapshot chain also complicates things on the libvirt side. I want my backups to be more robust and preferably simpler.

1. Introduction

Enter borg. Thanks to borg’s robust deduplication and compression, I can get the same storage efficiency as with the aforementioned incremental snapshots – or even better. And each backup will be a fully restorable image, not a snapshot dependent on all previous snapshots: deduplication, rather than snapshot chains, provides the storage efficiency. That sounds good to me. With the added bonus of encrypted repositories, functionality built into borg.

As a result of doing full image backups and using borg, backups will be more read-I/O and CPU intensive, since the whole VM disk image is read (then deduped, compressed and stored). So it is advised to run your backups at larger intervals; run daily backups instead of hourly. In case you want quickly restorable snapshots, just use the internal snapshot feature of virt-manager whenever you think your actions may fuck things up, or before an upgrade, etc., and delete the snapshot when finished. Or run the backup job manually before maintenance.

In this setup I will also be using rsnapshot to pull database dumps from certain VMs, since our block storage backup can still be somewhat inconsistent when it comes to databases. We will get into more detail on that later.

2. Libvirt

Let’s do the fun part first!

2.1 Snapshots

We’ll be using virsh snapshot-create-as to command libvirt to take (external) snapshots. That snapshot will offload all write I/O from our base image, in order to take a consistent backup of the VM disk image. After the backup has completed, we will merge all changes written to the snapshot back into the base image with virsh blockcommit, so that we end up in the same state as we started.

(disk) <==RW [live VM]

(disk) <--R (snapshot) <==RW [live VM]
  |     
  +--COPY-> [backup disk image]

MERGE [(disk), (snapshot)] => (disk)

(disk) <==RW [live VM]

Options for: virsh snapshot-create-as

--name The name of the snapshot; it will be appended to the image base name. --name backup.qcow2: image.qcow2 -> image.backup.qcow2.
--no-metadata No need for snapshot metadata with full image backups. We won’t revert to this snapshot, only restore from it.
--atomic Libvirt will guarantee that the snapshot either succeeds, or fails with no changes.
--quiesce Libvirt will try to use the guest agent to freeze and unfreeze the domain’s mounted file systems. However, if the domain has no guest agent, snapshot creation will fail.
--disk-only Do not include VM state.
--live Libvirt takes the snapshot while the guest is running. This increases the size of the memory image of the external checkpoint. This is currently supported only for external checkpoints.
--diskspec Disk targets to snapshot, and snapshot attributes.

Snapshot command:

virsh snapshot-create-as --domain <vm domain>  \
                         --name backup.qcow2   \
                         --no-metadata         \
                         --atomic              \
                         --quiesce             \
                         --disk-only           \
                         --diskspec vda,snapshot=external

This will generate a snapshot in the same directory as the disk image, named <image base>.backup.qcow2.

In order to merge the snapshot back to base image, we need to run virsh blockcommit on the target disk:

virsh blockcommit <vm domain> vda --active --pivot
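The snapshot, copy and blockcommit steps together form the whole per-disk cycle. A minimal sketch of that cycle as one shell function (the function name and arguments are my own; error handling is deliberately minimal):

```shell
# Sketch: snapshot -> copy -> blockcommit for one domain/disk target.
# Arguments: domain, disk target (e.g. vda), base image path, destination.
backup_disk() {
    local domain="$1" target="$2" image="$3" dest="$4"
    local snapshot="${image%.qcow2}.backup.qcow2"

    # 1. Redirect writes into an external snapshot, quiescing the base image
    virsh snapshot-create-as --domain "$domain" --name backup.qcow2 \
         --no-metadata --atomic --quiesce --disk-only \
         --diskspec "$target,snapshot=external" || return 1

    # 2. Copy the now-quiescent base image to the backup destination
    cp "$image" "$dest"

    # 3. Merge the snapshot back and pivot writes onto the base image
    virsh blockcommit "$domain" "$target" --active --pivot

    # 4. The snapshot file is no longer referenced; remove it
    rm -f "$snapshot"
}
```

For the test VM from the next section this would be called as: backup_disk snapshot vda /home/kvm/templates/snapshot-root.qcow2 /backup/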

2.2 Test snapshot & restore

Set up a test vm, to stage the snapshot workflow. Run the snapshot command, copy the disk image and do a blockcommit to get back to the original disk state. Then try a restore from the backed up image.

I deployed a vm named snapshot for these tests. That makes the domain of this example snapshot.

~ $ virsh list
Id    Name                           State
----------------------------------------------------
30    snapshot                       running

~ $ 

Get a list of all disk targets and their source images for this domain:

~ $ virsh domblklist snapshot
Target     Source
------------------------------------------------
vda        /home/kvm/templates/snapshot-root.qcow2

~ $ 

First dump the domain config XML:

virsh dumpxml --migratable snapshot > /backup/snapshot.xml

Create the snapshot:

~ $ virsh snapshot-create-as --domain snapshot   \
                             --name backup.qcow2 \
                             --no-metadata       \
                             --atomic            \
                             --quiesce           \
                             --disk-only         \
                             --diskspec vda,snapshot=external
Domain snapshot backup.qcow2 created
~ $

Verify. Note vda source image, the vm is now writing to the snapshot:

~ $ virsh domblklist snapshot
Target     Source
------------------------------------------------
vda        /home/kvm/templates/snapshot-root.backup.qcow2

~ $

~ $ ls -lah /home/kvm/templates/
-rw------- 1 nobody kvm  704K Aug 24 12:44 snapshot-root.backup.qcow2
-rw-r--r-- 1 nobody kvm  2.7G Aug 24 12:43 snapshot-root.qcow2
~ $

Copy the base image:

cp /home/kvm/templates/snapshot-root.qcow2 /backup/

When the copy is finished we can merge the changes in the snapshot back to the base image:

~ $ virsh blockcommit snapshot vda --active --pivot

Successfully pivoted
~ $

Verify:

~ $ virsh domblklist snapshot
Target     Source
------------------------------------------------
vda        /home/kvm/templates/snapshot-root.qcow2

~ $

Remove the (now unused) snapshot:

rm -f /home/kvm/templates/snapshot-root.backup.qcow2

/backup now contains:

  • snapshot-root.qcow2
  • snapshot.xml

Test your backup by attempting a restore. You can simply remove the original disk from the snapshot VM and add the backed-up image as a disk to the VM. Then try to boot.

For a more thorough restore, remove the snapshot domain from your KVM hypervisor. Then restore from the backed-up snapshot.xml dump and the backed-up disk image.
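The thorough restore can be sketched as follows, using the paths from the walkthrough above (wrapped in a function here only so the steps read as one unit):

```shell
# Sketch: full restore of the test domain from /backup.
# Paths follow the walkthrough above; adjust to your setup.
restore_vm() {
    # Stop and remove the old domain definition (ignore errors if already gone)
    virsh destroy snapshot 2>/dev/null
    virsh undefine snapshot

    # Put the backed-up image back where the domain XML expects it
    cp /backup/snapshot-root.qcow2 /home/kvm/templates/

    # Re-define the domain from the XML dump and boot it
    virsh define /backup/snapshot.xml
    virsh start snapshot
}
```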

You will need to perform restore tests somewhat regularly anyway, so you better get used to it. Or automate and report it (but that is something for another post).

3. Borg

If we just simply keep on copying our base images to a backup destination, we will soon end up with an expensive waste of storage space.

Let’s replace the cp command of our backup with borg. We’ll still be doing full image backups, but they will be deduplicated, compressed and encrypted, providing the same storage efficiency as incremental snapshots. Now, if we back up all our VMs to the same borg repository we can get really good data reduction, since we profit from deduplication repo-wide.

Once the snapshot has been created we can run borg to back up the disk image. Check the borg quickstart & documentation for more information on how borg works. It comes down to the following steps.

3.1 Setup a new repository

Initialize a new repository with the following command. You will be prompted to set a passphrase; use a strong one.

borg init --encryption=repokey /path/to/repo

3.2 Create backup

Create a backup of our disk image:

borg create -v --stats /path/to/repo::'{now}' /path/to/vm_disk.qcow2

3.3 Prune old backups

Prune older backups like so:

borg prune -v --list /path/to/repo --keep-daily=7   \
                                   --keep-weekly=4  \
                                   --keep-monthly=6
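Note that with several VMs in one repository, a prune like the above applies one retention policy across all archives. Per-VM retention needs distinguishable archive names; one possible convention is prefixing the archive name with the domain name (the prefix scheme and function below are my own sketch, borg 1.x syntax):

```shell
# Sketch: create and prune archives per VM by prefixing the archive
# name with the domain name (assumed convention, not from the article).
backup_and_prune() {
    local vm="$1" disk="$2" repo="$3"
    borg create -v --stats "$repo::${vm}-{now}" "$disk"
    borg prune -v --list --prefix "${vm}-" "$repo" \
         --keep-daily=7 --keep-weekly=4 --keep-monthly=6
}
```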

3.4 Prune deleted VMs

4. Snapshot script

Snapshot script github script

5. Rsnapshot

5.1 Pull vm database dumps

rsnapshot config

5.2 Borg

backup database dumps

6. Systemd

6.1 Automation

We’ll use systemd .service units to run our backup scripts and systemd timers to automate everything.

6.1.1 Service files

rsnapshot

vm-borg-backup@.service

kvm-backup.service

chmod 600 .service files, they contain sensitive information.

6.1.2 Timers

rsnapshot

kvm-backup
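A kvm-backup service and timer pair could be sketched like this (unit contents and paths are my assumptions, not from the draft; the passphrase line is why the chmod 600 above matters):

```ini
# /etc/systemd/system/kvm-backup.service (sketch; paths are assumptions)
[Unit]
Description=KVM VM backup (snapshot + borg)

[Service]
Type=oneshot
# The passphrase here is why these files should be chmod 600
Environment=BORG_PASSPHRASE=changeme
ExecStart=/usr/local/bin/kvm-backup.sh

# /etc/systemd/system/kvm-backup.timer
[Unit]
Description=Daily KVM backup

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
```

Enable the schedule with: systemctl enable --now kvm-backup.timer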

7. Tying it all together

8. Useful resources
