The Art of the Blueprint: Achieving Data Consistency with Kasten K10 and Kanister.io

Michael Courcy

3 years ago

Kasten K10 offers a simple and reliable way to back up your namespaces with a single click: both the data and metadata are captured and protected. How does Kasten K10 accomplish this feat? By leveraging the snapshot capability of the storage provider to create snapshots of your PVCs and making them portable, which enables long-term protection and migration.

Sounds great, right? It is, but in some cases this approach falls short. Let’s examine these scenarios and how our open source project, Kanister, creates a cloud native bridge to solve advanced problems.

Understanding Scenarios that Don’t Support the Storage Snapshot Approach

Sometimes, the data resides only in memory; for instance, a Couchbase database configured as memory-only. In other cases, the database may not support the “storage snapshot” approach; Elasticsearch, for example, supports backup only through its own API.

Occasionally, part of the data lives outside the Kubernetes infrastructure. For instance, your application’s database runs on AWS RDS or Google Cloud SQL, but you need to capture the database backup along with your application backup. Or, what if the data needs to be flushed from memory to storage before you can create the snapshot? For example, the fsyncLock function in MongoDB makes sure all pending writes are flushed to the filesystem and keeps the database read-only until you finish your snapshot and call fsyncUnlock.
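The lock, snapshot, unlock ordering described above can be sketched in Python. This is a minimal illustration, not Kasten's or Kanister's implementation: it assumes a client object shaped like a pymongo MongoClient (exposing admin.command) and sends MongoDB's fsync command with lock=True, then fsyncUnlock. The FakeClient class and the take_snapshot callback are hypothetical stand-ins so the sketch runs without a database.

```python
from contextlib import contextmanager


@contextmanager
def write_lock(client):
    """Flush pending writes and block new ones for the duration of the block."""
    # Assumption: `client` behaves like pymongo.MongoClient (has admin.command).
    client.admin.command("fsync", lock=True)
    try:
        yield
    finally:
        # Always release the lock, even if the snapshot step raised.
        client.admin.command("fsyncUnlock")


def backup_with_quiesce(client, take_snapshot):
    """Run the hypothetical `take_snapshot` callback while writes are blocked."""
    with write_lock(client):
        return take_snapshot()


class _FakeAdmin:
    """Hypothetical stand-in that records the commands it receives."""

    def __init__(self, log):
        self.log = log

    def command(self, name, **kwargs):
        self.log.append(name)
        return {"ok": 1}


class FakeClient:
    """Hypothetical stand-in for a MongoDB client, for demonstration only."""

    def __init__(self):
        self.log = []
        self.admin = _FakeAdmin(self.log)
```

The try/finally is the important design choice: if the snapshot fails, the database still gets unlocked, so a failed backup does not leave the application unable to write.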

There are other instances in which leveraging the snapshot capability of the storage provider to create and export PVC snapshots is not a viable approach. You may want to create a logical dump of the database rather than a storage snapshot to increase portability and control granularity, so you can satisfy backup requirements or make it easier to restore the database to a new instance. Or, you may have operational reasons to run specific actions when backing up your application, such as sending an email, invoking a webhook, updating DNS, or changing a load balancer state.

While this list is not exhaustive, it helps us to understand that the backup and restore process should be extensible.

Enter Kanister

Kanister is an extensible open-source framework for application-level data management on Kubernetes. In other words, it’s a framework that can help you execute your data operations and capture them in a blueprint, so that they can be reused and parameterized.
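To make the idea of a reusable, parameterized blueprint more concrete, here is a rough Python sketch that assembles a Blueprint manifest as a dictionary and prints it as JSON. It assumes the general shape of the Kanister Blueprint custom resource (a backup action made of phases, each calling a Kanister function such as KubeTask). The blueprint name, the image, the phase name, and the command are hypothetical placeholders, not values taken from the whitepaper.

```python
import json


def build_blueprint(name, image, command):
    """Return a dict shaped like a Kanister Blueprint with one backup phase."""
    return {
        "apiVersion": "cr.kanister.io/v1alpha1",
        "kind": "Blueprint",
        "metadata": {"name": name},
        "actions": {
            "backup": {
                "phases": [
                    {
                        # KubeTask runs the command in a temporary pod.
                        "func": "KubeTask",
                        "name": "runBackupCommand",  # hypothetical phase name
                        "args": {
                            # Template parameter, resolved when the action runs.
                            "namespace": "{{ .StatefulSet.Namespace }}",
                            "image": image,
                            "command": command,
                        },
                    }
                ]
            }
        },
    }


# Hypothetical values, for illustration only.
manifest = build_blueprint(
    name="elasticsearch-blueprint",
    image="example.com/backup-tools:latest",
    command=["sh", "-c", "echo 'run the backup here'"],
)
print(json.dumps(manifest, indent=2))
```

The namespace is left as a template expression rather than a fixed value, which is what lets the same blueprint be reused across applications.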

Kasten can be extended with Kanister in many ways, but most of the time, this is accomplished by annotating a Kubernetes object with the name of a blueprint.
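As an illustration of the annotation step, the Python sketch below builds the two usual ways of attaching it: a merge-patch body and the equivalent kubectl command line. It assumes the kanister.kasten.io/blueprint annotation key used by K10; the workload kind, name, namespace, and blueprint name are hypothetical examples.

```python
import json

# Annotation key that links a workload to a blueprint (assumed from K10 usage).
BLUEPRINT_ANNOTATION = "kanister.kasten.io/blueprint"


def blueprint_annotation_patch(blueprint_name):
    """Return a merge-patch body that sets the blueprint annotation."""
    return {"metadata": {"annotations": {BLUEPRINT_ANNOTATION: blueprint_name}}}


def kubectl_annotate_args(kind, name, namespace, blueprint_name):
    """Return the kubectl argument list that applies the same annotation."""
    return [
        "kubectl", "annotate", kind, name,
        f"{BLUEPRINT_ANNOTATION}={blueprint_name}",
        "--namespace", namespace,
    ]


# Hypothetical workload and blueprint names, for illustration only.
patch = blueprint_annotation_patch("elasticsearch-blueprint")
args = kubectl_annotate_args(
    "statefulset", "es-data", "elastic", "elasticsearch-blueprint"
)
print(json.dumps(patch))
print(" ".join(args))
```

Nothing here contacts a cluster; the argument list can be passed to subprocess.run, or the patch body sent through a Kubernetes client, once the names match a real deployment.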

In a recent whitepaper, “Conquering Data Consistency with Kasten K10 by Veeam and Kanister.io Blueprints,” Michael Courcy, Solutions Architect at Kasten by Veeam, takes a deep dive into the art of writing blueprints and how to integrate them with Kasten K10. Elasticsearch is a good candidate for the tutorial, for the following reasons:

Elasticsearch does not support backup by snapshotting the PVCs.
Elasticsearch is a good fit for partial backup, because most of the time, it’s not desirable to back up all the indices.
Elasticsearch supports horizontal scalability and can become huge – up to many terabytes of data – so backup must be incremental.
Elasticsearch can be deployed and maintained through a Kubernetes operator, which requires special consideration when performing a restore operation.

Download the whitepaper to jump into the art of the blueprint, and become a blueprint expert for Kasten K10!

New to Kasten K10? Try it for free today!