Containers are breaking backups around the world, but there are steps you can take to make sure that the most critical parts of your container infrastructure are protected against the worst things that can happen to your data center.
At first glance it may seem that containers don’t need to be backed up, but on closer inspection, backing them up does make sense, both to protect against catastrophic events and to handle other, less disastrous eventualities.
Containers are another type of virtualization, and Docker is the most popular container platform. A container is a specialized environment in which you can run a particular application. One way to think of containers is as lightweight virtual machines. Where each VM in a hypervisor server contains an entire copy of an operating system, containers share the underlying operating system, and each of them contains only the libraries needed by the application that will run in that container. As a result, many containers on a single node (a physical or virtual machine running an OS and the container runtime environment) take up far fewer resources than the same number of VMs.
Another difference between VMs and containers is that where companies tend to run many applications simultaneously in a single VM, each container is usually designed to serve a single application component that does a single task, such as logging or monitoring. If multiple application components need to interact, each will typically run in its own container and communicate across the network. This allows each component to be scaled individually and provides some fault and security isolation between components.
Where VMs are designed to run inside a particular hypervisor running on a particular set of hardware, containers are much more portable. Containers are designed to run on virtually any Linux system, and can even run on Windows if the appropriate software has been installed. Finally, containers are designed to be much more temporary than VMs. Where a typical VM might run for months or even years, 95% of all containers live for less than a week, according to a recent Sysdig survey.
Running a lot of containers in a production environment requires orchestration, and that’s where Kubernetes (often abbreviated K8s) comes in. It groups containers into pods, which are one or more containers accomplishing a single purpose. Containers in a pod can easily communicate with each other and can share storage by mounting a shared volume.
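To make the idea of a pod sharing storage concrete, here is a minimal sketch in Python that builds a pod manifest in which every container mounts the same volume. The function name, the pod name and the image names are hypothetical, chosen only for illustration; the manifest fields (volumes, emptyDir, volumeMounts) are standard Kubernetes pod fields. Note that an emptyDir volume is scratch space that disappears with the pod, so it is an example of sharing, not of persistence.

```python
import json


def shared_volume_pod(name, images, mount_path="/shared"):
    """Build a pod manifest whose containers all mount one shared volume.

    All names here are illustrative assumptions, not taken from the article.
    """
    volume_name = "shared-data"
    containers = [
        {
            "name": f"{name}-{index}",
            "image": image,
            # Every container mounts the same volume at the same path.
            "volumeMounts": [{"name": volume_name, "mountPath": mount_path}],
        }
        for index, image in enumerate(images)
    ]
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": containers,
            # emptyDir is scratch space that lives only as long as the pod.
            "volumes": [{"name": volume_name, "emptyDir": {}}],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(shared_volume_pod("web", ["nginx", "busybox"]), indent=2))
```

Anything written to the shared path by one container is visible to the other, which is why data placed there only matters for backup if the volume behind it is persistent.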
How containers break backups
Historically, backups were accomplished by placing an agent in each server that needed to be backed up. Virtualization broke that model, so a different model was created where the agent runs at the hypervisor level and backs up the VMs as images. Containers offer neither of these options.
While you could theoretically place an agent inside a container image, that is considered very bad form for many reasons, so no one does that. In addition, there is currently no way to run an agent at the container runtime layer, which is analogous to the hypervisor level. Finally, the idea of backing up containers seems rather foreign to many who use them. Think about it; most containers live for less than a week.
Why containers need backing up
In one sense, a typical container does not need to have its running state backed up; it is not unique enough to warrant such an operation. Furthermore, most containers are stateless – there is no data stored in the container. It’s just another running instance of a given container image that is already saved via some other operation.
Many container advocates are quick to point out that high availability is built into every part of the container infrastructure. Kubernetes is always run in a cluster. Containers are always spawned and killed off as needed. Unfortunately, many confuse this high availability with the ability to recover from a disaster.
To change the conversation, ask someone how they would replicate their entire Kubernetes and Docker environment should something take out their entire cluster, container nodes and associated persistent storage. Yes, there are reasons Kubernetes, Docker and associated applications need to be backed up.
First, to recover from disasters. What do you do if the worst happens? Second, to replicate the environment, such as when moving from a test/dev environment to production, or from production to staging before an upgrade. And third, to migrate a Kubernetes cluster more easily.
What would you need in a disaster?
There are several things you would need to replicate an entire environment in case of disaster:
Container images – A container image is a static file that contains all the executable code necessary for a container to run. Container images do not change; they are what is used to run a given container. If changes need to be made to the libraries and code for a given container, a new image would be created for that container. Container images need to be protected in some way, often using a repository for such things. In turn, that repository should be protected against disasters.
Attached storage, databases – Containers often create data that outlives the life of the container. To accomplish that, a container can mount a volume via NFS, an object store or a similar mechanism, and write data to that volume. It may also make a connection to a database. That volume and that database hold the data that needs protecting.
Persistent volumes – Kubernetes pods are increasingly using persistent storage. That storage should also be backed up if the data on it is valuable to the business.
Deployments – A deployment is a Kubernetes concept of a set of pods accomplishing a particular function. Deployments are stored as YAML files that need to be backed up.
Kubernetes etcd – The Kubernetes central database is etcd, and it needs to be backed up. It’s relatively small, and the etcdctl tool that ships with etcd can save a snapshot of its contents to a file that you can then back up.
Prometheus – Prometheus is often used to monitor K8s and Docker. Its configuration should also be backed up.
Kubernetes resources – As developers create resources in K8s, those resources need to be backed up along with the correct API group and version.
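As a rough illustration of what protecting some of these items could look like with native tools, the sketch below builds (but does not run) the command lines for three of them: saving a container image with docker save, taking an etcd snapshot with etcdctl snapshot save, and exporting deployments with kubectl get. The helper names, the backup directory and the example image are assumptions made for this sketch. A real cluster would usually also need endpoint and certificate options for etcdctl, and the kubectl output would have to be redirected to a file.

```python
from pathlib import PurePosixPath


def image_backup_cmd(image, out_dir):
    """Command that saves one container image to a tar archive."""
    # Turn "registry/app:1.0" into a safe file name (naming scheme is an assumption).
    file_name = image.replace("/", "_").replace(":", "_") + ".tar"
    archive = PurePosixPath(out_dir) / file_name
    return ["docker", "save", "-o", str(archive), image]


def etcd_snapshot_cmd(out_dir):
    """Command that writes an etcd snapshot (connection flags omitted here)."""
    snapshot = PurePosixPath(out_dir) / "etcd-snapshot.db"
    return ["etcdctl", "snapshot", "save", str(snapshot)]


def deployments_export_cmd():
    """Command that prints every deployment as YAML on standard output."""
    return ["kubectl", "get", "deployments", "--all-namespaces", "-o", "yaml"]


def backup_plan(images, out_dir):
    """Ordered list of commands: images first, then etcd, then deployments."""
    plan = [image_backup_cmd(image, out_dir) for image in images]
    plan.append(etcd_snapshot_cmd(out_dir))
    plan.append(deployments_export_cmd())
    return plan


if __name__ == "__main__":
    for command in backup_plan(["registry.example.com/app:1.0"], "/backup"):
        print(" ".join(command))
```

Keeping the plan as data rather than executing it directly makes it easy to review or log before handing each command to something like subprocess.run. Persistent volumes, databases and the Prometheus configuration are left out because how they are copied depends on the storage and tooling in use.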
What shouldn’t need backup?
Not everything needs to be backed up. For example:
Running stateless containers – A running container is temporary. It was spawned from an image – which needs to be backed up – but the running instance of the container does not need to be backed up. Any data it creates should probably be backed up, but if the container itself needs to be backed up, something is wrong. If a container actually contains data, as opposed to storing it on an external volume, then it would need to be backed up – but that should be very rare.
Pods – Since pods are simply groups of running containers, they also do not need to be backed up.
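The rule of thumb from the two lists above can be written down as a small decision function. This is only a sketch of the reasoning in this article: the category names are labels invented for the example, not Kubernetes or Docker terms, and a real inventory would need more nuance.

```python
# Categories the article says should be protected (labels are illustrative).
BACK_UP = {
    "image",
    "attached-storage",
    "database",
    "persistent-volume",
    "deployment",
    "etcd",
    "prometheus-config",
    "resource",
}


def needs_backup(kind, holds_local_data=False):
    """Return True if an item of this kind should be included in backups."""
    if kind in BACK_UP:
        return True
    if kind == "running-container":
        # Normally no, but the rare container that keeps data inside
        # itself instead of on an external volume is the exception.
        return holds_local_data
    if kind == "pod":
        # A pod is just a group of running containers.
        return False
    raise ValueError(f"unknown kind: {kind}")


if __name__ == "__main__":
    for kind in ("image", "etcd", "running-container", "pod"):
        print(kind, needs_backup(kind))
```

A container that returns True only because of holds_local_data is a sign that its data probably belongs on an external volume instead.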
Each entity mentioned above offers a native tool that can be used to back up that entity to local or remote storage. There are also commercial utilities starting to come on the market that run in a variety of ways. The next article will cover these various methods in detail, including how to use them to restore the various parts of your Kubernetes and Docker environment.