
EKS Cluster Disaster Recovery Using Velero: Best Practices

Updated: Jul 12






Introduction

Velero is a robust tool for Kubernetes disaster recovery that lets users back up, migrate, and restore applications and persistent volumes. This article provides guidance on using Velero as a disaster recovery strategy within an Amazon EKS cluster.


Objectives

The primary objectives of implementing Velero for disaster recovery are as follows:

  • Efficient Backup Strategies: Leverage Velero to create periodic backups of your EKS cluster resources, ensuring minimal data loss in case of a disaster.

  • Automated Scheduling: Utilize Velero schedules to automate the backup process, reducing manual intervention and ensuring regular snapshots.

  • Seamless Restore Operations: Develop clear restore strategies using Velero manifests, allowing for a quick and efficient recovery process.


Considerations

  • Backup Frequency: Determine an appropriate backup frequency based on the criticality of your applications and data.

  • Retention Policies: Define retention policies for your backups to manage storage costs effectively.


Backup and restore workflow

Velero consists of two components:

  • A Velero server pod that runs in your Amazon EKS cluster

  • A command-line client (Velero CLI) that runs locally

Whenever we issue a backup against an Amazon EKS cluster, Velero backs up the cluster resources in the following way:

  1. The Velero CLI makes a call to the Kubernetes API server to create a Backup custom resource object.

  2. The backup controller:

     a. Checks the scope of the Backup object, namely whether we set filters.

     b. Queries the API server for the resources that need a backup.

     c. Compresses the retrieved Kubernetes objects into a .tar.gz archive and saves it in Amazon S3.
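
As a rough illustration of step 1, the object the CLI creates is an instance of Velero's Backup custom resource. The sketch below uses a hypothetical backup name (manual-backup), reuses this article's placeholder namespace, and assumes a backup storage location named "default" exists; the filters under spec are what the backup controller evaluates when it scopes the backup.

```yaml
# Hypothetical Backup object, for illustration only.
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: manual-backup          # assumed name
  namespace: velero
spec:
  # Filters that define the scope of the backup.
  includedNamespaces:
  - namespace1
  excludedResources:
  - events
  # Snapshot persistent volumes that are in scope.
  snapshotVolumes: true
  # Assumes a BackupStorageLocation called "default".
  storageLocation: default
```

Applying a manifest like this with kubectl apply -f should have much the same effect as running velero backup create with the matching flags.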


[Figure: backup resource workflow]


Similarly, whenever we issue a restore operation:

  1. The Velero CLI makes a call to the Kubernetes API server to create a Restore custom resource object that will restore from an existing backup.

  2. The restore controller:

     a. Validates the Restore object.

     b. Makes a call to Amazon S3 to retrieve the backup files.

     c. Initiates the restore operation.


[Figure: restore resource workflow]


Velero also performs backup and restore of any persistent volume in scope:

  1. If you are using Amazon Elastic Block Store (Amazon EBS), Velero will create Amazon EBS snapshots of persistent volumes in scope.

  2. For any other volume type (except hostPath), use Velero’s Restic integration to take file-level backups of the contents of your volumes. At the time of writing, Restic is in Beta, and therefore not recommended for production-grade backups.
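
For volumes that take the file-level path, Velero's opt-in approach relies on a pod annotation that names the volumes to back up. A minimal sketch, assuming Velero was installed with its file-level backup component enabled; the pod name, image, and PVC name (app-data) are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sample-app                 # hypothetical pod
  namespace: namespace1
  annotations:
    # Comma-separated names from spec.volumes to back up at file level.
    backup.velero.io/backup-volumes: data
spec:
  containers:
  - name: app
    image: nginx:1.25              # placeholder image
    volumeMounts:
    - name: data
      mountPath: /usr/share/nginx/html
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: app-data          # hypothetical PVC on a non-EBS volume type
```

In practice the annotation usually goes on the pod template of a Deployment or StatefulSet rather than on a bare pod.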





Steps


1. Velero Installation.

You can follow the official guide to complete the Velero installation. The guide also outlines the resources you need to create before configuring Velero.


Alternatively, you can install Velero with Helm; the chart's default values are available at https://github.com/vmware-tanzu/helm-charts/blob/main/charts/velero/values.yaml. Remember to create the required AWS resources before installing.


2. Check resource creation.

After installation and configuration, verify that all resources (IAM role, S3 bucket) were created and that the Velero pod is running correctly.
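
One object worth checking is the BackupStorageLocation, which ties Velero to the S3 bucket. The sketch below shows roughly what it looks like on AWS; the bucket name, prefix, and region are placeholders, not values from this setup.

```yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: my-velero-backups      # placeholder bucket name
    prefix: eks-cluster-1          # optional, placeholder
  config:
    region: us-east-1              # placeholder region
```

If the installation succeeded, kubectl get backupstoragelocations -n velero should report this location with the phase Available.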



[Figure: successful resource creation (two screenshots)]


The following figure lists all the available Velero verbs.



[Figure: available verbs of Velero]


3. Schedule Backups.

Create a Velero schedule manifest (schedule.yaml) to define the backup frequency and included namespaces. Example:

# File: schedule.yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero
spec:
  # "@daily" or a standard cron expression, for example "0 2 * * *"
  schedule: "@daily"
  # Backup settings applied to every backup this schedule creates
  template:
    includedNamespaces:
    - namespace1
    - namespace2
    snapshotVolumes: true
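
To connect this with the retention consideration discussed earlier, the backup template also accepts a ttl field. A sketch, assuming a hypothetical second schedule that backs up one critical namespace every six hours and keeps each backup for seven days:

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: critical-6h-backup         # hypothetical schedule name
  namespace: velero
spec:
  # Cron expression: minute 0 of every sixth hour.
  schedule: "0 */6 * * *"
  template:
    includedNamespaces:
    - namespace1
    snapshotVolumes: true
    # Retention: backups expire 168 hours (7 days) after creation.
    ttl: 168h0m0s
```

When ttl is omitted, Velero falls back to its default retention of 30 days (720h), so setting it explicitly is mainly useful for shorter or longer retention windows.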


4. Restore from Backup.

In the event of a disaster, use a Velero restore manifest (restore.yaml) to initiate the recovery process. Example:

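# File: restore.yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: restore-from-backup
  namespace: velero
spec:
  # Name of an existing backup. Backups created by a schedule are named
  # <schedule-name>-<timestamp>; list them with "velero backup get".
  backupName: daily-backup-20230801020000
  restorePVs: true
  includedNamespaces:
  - namespace1
  - namespace2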


5. Validation.

Regularly validate your disaster recovery strategy by simulating restore operations in a non-production environment.
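
One way to run such a drill without touching the original workloads is to restore into differently named namespaces. A sketch, assuming a placeholder backup name and a hypothetical target namespace (namespace1-drill):

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: dr-drill-restore           # hypothetical restore name
  namespace: velero
spec:
  # Placeholder; use a name reported by "velero backup get".
  backupName: daily-backup-20230801020000
  includedNamespaces:
  - namespace1
  # Restore the contents of namespace1 into a separate namespace.
  namespaceMapping:
    namespace1: namespace1-drill
  restorePVs: true
```

In a separate non-production cluster, a similar manifest should work as long as that cluster's Velero installation points at the same S3 bucket, ideally with read-only access so the drill cannot alter production backups.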





Martín Carletti

Cloud Engineer

Teracloud








Fabricio Blas

Cloud Engineer

Teracloud





