Sneak peek at the CSI volume snapshotting Alpha feature

In this blog post I will show you how to create snapshots of persistent volumes in a Kubernetes cluster and restore them again by talking only to the API server. This can be useful for backups, or for scaling stateful applications that need “startup data”.

The snapshot feature was introduced as Alpha in Kubernetes v1.12, so for this to work you need to enable the VolumeSnapshotDataSource feature gate on your cluster's API server:

--feature-gates=VolumeSnapshotDataSource=true
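
If your cluster runs the API server as a static pod (as kubeadm-based clusters typically do), one way to set this is to add the flag to the kube-apiserver manifest. The path and snippet below are only a sketch of the usual kubeadm layout and may differ in your setup:

# /etc/kubernetes/manifests/kube-apiserver.yaml (typical kubeadm path; adjust for your cluster)
spec:
  containers:
  - command:
    - kube-apiserver
    - --feature-gates=VolumeSnapshotDataSource=true
    # ...keep the rest of the existing kube-apiserver flags...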

I will be using Rook to provision my storage, as it ships a CSI driver and supports the layering image feature needed for snapshots.

I assume you have an application up and running in your cluster. In my case, I have Jira Software running in Data Center mode with one active node provisioned with ASK.

In order to scale horizontally, I need a copy of Node0's home folder before I can start Node1. So we start by defining some objects in Kubernetes.

Creating the StorageClass

When you create your StorageClass for Rook, you need to add imageFeatures and set it to layering as shown below:

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
# Change "rook-ceph" provisioner prefix to match the operator namespace if needed
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  # clusterID is the namespace where the Rook cluster is running
  clusterID: rook-ceph
  # Ceph pool into which the RBD image shall be created
  pool: replicapool

  # RBD image format. Defaults to "2".
  imageFormat: "2"

  # RBD image features. Available for imageFormat: "2". CSI RBD currently supports only the `layering` feature.
  imageFeatures: layering

  # The secrets contain Ceph admin credentials.
  csi.storage.k8s.io/provisioner-secret-name: rook-ceph-csi
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-ceph-csi
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph

  # Specify the filesystem type of the volume. If not specified, csi-provisioner
  # will set `ext4` as the default.
  csi.storage.k8s.io/fstype: xfs
# Delete the RBD volume when the PVC is deleted
reclaimPolicy: Delete
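
Assuming you saved both objects to a file (the filename below is just an example), you can apply them and confirm that the StorageClass exists:

kubectl apply -f rook-ceph-block.yaml
kubectl get storageclass rook-ceph-block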

When we deploy Jira with ASK, we simply reference this StorageClass, and Rook will provision the storage when needed.
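
For reference, the PVC names you will see later (jira-persistent-storage-jira-0 and so on) come from a volumeClaimTemplate on the Jira StatefulSet. Here is a minimal sketch of that part of the StatefulSet, assuming it is named jira and the claim template is named jira-persistent-storage (names inferred from the PVCs shown later):

# Sketch only; serviceName, replicas, selector and the pod template are omitted
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: jira
  namespace: jira-production
spec:
  volumeClaimTemplates:
  - metadata:
      name: jira-persistent-storage
    spec:
      storageClassName: rook-ceph-block
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi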

So, now we have a PVC for the home folder and one for the Data Center volume.

The Data Center volume is out of scope for this blog post, as it is not block storage but a shared filesystem (ReadWriteMany) in Rook.

Creating the VolumeSnapshotClass and your first Snapshot

Now we define a VolumeSnapshotClass to handle our snapshots:

apiVersion: snapshot.storage.k8s.io/v1alpha1
kind: VolumeSnapshotClass
metadata:
  name: csi-rbdplugin-snapclass
snapshotter: rook-ceph.rbd.csi.ceph.com
parameters:
  # Specify a string that identifies your cluster. Ceph CSI supports any
  # unique string. When Ceph CSI is deployed by Rook, use the Rook namespace,
  # for example "rook-ceph".
  clusterID: rook-ceph
  csi.storage.k8s.io/snapshotter-secret-name: rook-ceph-csi
  csi.storage.k8s.io/snapshotter-secret-namespace: rook-ceph
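
After applying the class (the filename is just an example), you can check that it is registered; VolumeSnapshotClass is cluster-scoped, so no namespace is needed:

kubectl apply -f volumesnapshotclass.yaml
kubectl get volumesnapshotclass csi-rbdplugin-snapclass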

Then we are ready to create a snapshot of the source PVC, in this case jira-persistent-storage-jira-0:

apiVersion: snapshot.storage.k8s.io/v1alpha1
kind: VolumeSnapshot
metadata:
  name: rbd-pvc-snapshot
spec:
  snapshotClassName: csi-rbdplugin-snapclass
  source:
    name: jira-persistent-storage-jira-0
    kind: PersistentVolumeClaim

This gives us a VolumeSnapshot object, as seen here:

kubectl get volumesnapshots -n jira-production
NAME               AGE
rbd-pvc-snapshot   57m
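
Before restoring from it, it is worth checking that the snapshot is actually ready; in the v1alpha1 API the status carries a readyToUse flag:

kubectl get volumesnapshot rbd-pvc-snapshot -n jira-production -o jsonpath='{.status.readyToUse}'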

Creating a new PVC from our snapshot

Now, if we want to create a new PVC based on this VolumeSnapshot, we define it like this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jira-persistent-storage-jira-1
spec:
  storageClassName: rook-ceph-block
  dataSource:
    name: rbd-pvc-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

Now we have a second PVC called jira-persistent-storage-jira-1, based on jira-persistent-storage-jira-0 and containing all of its data from the point in time the snapshot was taken. So now we can scale our Jira StatefulSet, and the new Jira Node1 will use this PVC, which is a copy of Node0's. Listing the PVCs in the namespace now shows all three:

NAME                             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS          AGE
jira-datacenter-pvc              Bound    pvc-a286df18-f9a3-4c52-b0a0-377a193f04de   5Gi        RWX            atlassian-dc-cephfs   93m
jira-persistent-storage-jira-0   Bound    pvc-b73b96d6-c3f7-4448-9f12-d9956efe2989   5Gi        RWO            rook-ceph-block       93m
jira-persistent-storage-jira-1   Bound    pvc-1984c9de-d13e-435e-b59c-28731d8f30bc   5Gi        RWO            rook-ceph-block       60m
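
With the restored PVC matching the name the StatefulSet expects for ordinal 1, scaling up is a single command (assuming the StatefulSet is named jira, as the pod names jira-0 and jira-1 suggest):

kubectl scale statefulset jira -n jira-production --replicas=2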

Verification

We can verify it by looking at the mount point inside the containers once they have started up. The reason cluster.properties has a different timestamp is that our entrypoint script modifies it before starting Jira.

$ kubectl exec -ti jira-0 -n jira-production -- ls -l /var/atlassian/application-data/jira/
total 12
drwxrws---. 4 jira jira   46 Aug 22 13:43 caches
-rw-rw-r--. 1 jira jira  633 Aug 22 13:42 cluster.properties
-rw-rw----. 1 jira jira 1102 Aug 22 13:30 dbconfig.xml
drwxr-s---. 2 jira jira 4096 Aug 22 13:58 localq
drwxrws---. 2 jira jira  132 Aug 22 14:01 log
drwxrws---. 2 jira jira   76 Aug 22 13:32 monitor
drwxrws---. 6 jira jira  100 Aug 22 13:31 plugins
drwxrws---. 3 jira jira   26 Aug 22 13:24 tmp

$ kubectl exec -ti jira-1 -n jira-production -- ls -l /var/atlassian/application-data/jira/
total 12
drwxrws---. 4 jira jira   46 Aug 22 13:43 caches
-rw-rw-r--. 1 jira jira  633 Aug 22 13:57 cluster.properties
-rw-rw----. 1 jira jira 1102 Aug 22 13:30 dbconfig.xml
drwxr-s---. 2 jira jira 4096 Aug 22 13:58 localq
drwxrws---. 2 jira jira  100 Aug 22 13:32 log
drwxrws---. 2 jira jira   76 Aug 22 13:32 monitor
drwxrws---. 6 jira jira  100 Aug 22 13:31 plugins
drwxrws---. 3 jira jira   26 Aug 22 13:24 tmp

We can also see that we now have a VolumeSnapshotContent object in our cluster:

$ kubectl get VolumeSnapshotContent
NAME                                               AGE
snapcontent-05166c28-cdf9-4504-89c8-29c67ee23c11   73m

$ kubectl describe VolumeSnapshotContent snapcontent-05166c28-cdf9-4504-89c8-29c67ee23c11
Name:         snapcontent-05166c28-cdf9-4504-89c8-29c67ee23c11
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  snapshot.storage.k8s.io/v1alpha1
Kind:         VolumeSnapshotContent
Metadata:
  Creation Timestamp:  2019-08-22T11:56:35Z
  Finalizers:
    snapshot.storage.kubernetes.io/volumesnapshotcontent-protection
  Generation:        1
  Resource Version:  176903
  Self Link:         /apis/snapshot.storage.k8s.io/v1alpha1/volumesnapshotcontents/snapcontent-05166c28-cdf9-4504-89c8-29c67ee23c11
  UID:               0a6afd6d-032d-4bf6-841d-a37146daf799
Spec:
  Csi Volume Snapshot Source:
    Creation Time:    1566474995000000000
    Driver:           rook-ceph.rbd.csi.ceph.com
    Restore Size:     5368709120
    Snapshot Handle:  0001-0009-rook-ceph-0000000000000003-e322e4c4-c4d3-11e9-afc8-0a580a2a0033
  Deletion Policy:  Delete
  Persistent Volume Ref:
    API Version:       v1
    Kind:              PersistentVolume
    Name:              pvc-b73b96d6-c3f7-4448-9f12-d9956efe2989
    Resource Version:  171532
    UID:               b8eed866-4e73-4a6a-bf74-d8fba8c9a8f5
  Snapshot Class Name:  csi-rbdplugin-snapclass
  Volume Snapshot Ref:
    API Version:       snapshot.storage.k8s.io/v1alpha1
    Kind:              VolumeSnapshot
    Name:              rbd-pvc-snapshot
    Namespace:         jira-production
    Resource Version:  176889
    UID:               05166c28-cdf9-4504-89c8-29c67ee23c11
Events:  <none>

Get Kubernetes to do it for you

So, what is all this good for, you ask? Well, until now we had to help Kubernetes each time we scaled our Jira, Confluence or Bitbucket Data Center installation, because we needed to copy the data around ourselves. That could be automated with scripts, but now we can get Kubernetes to do it for us.

This is still in Alpha and, as of writing this blog post, only supported for block storage in Rook, but the developers told us that they are working on supporting the shared filesystem as well.

We can also create snapshots as backups of our running applications. If we want, we can then start a backup pod that mounts a PVC restored from the snapshot and copies the data to some cold backup location outside the cluster.
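
As a rough sketch of that idea, a one-off Job could mount a PVC restored from the snapshot and push its contents somewhere outside the cluster. All names, the image and the rsync target here are made up for illustration:

apiVersion: batch/v1
kind: Job
metadata:
  name: jira-home-backup            # hypothetical name
  namespace: jira-production
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: backup
        # example image that ships rsync and ssh; use whatever tooling fits your backup target
        image: instrumentisto/rsync-ssh
        command: ["rsync", "-az", "/backup/", "backup-user@backup.example.com:/cold-storage/jira/"]
        volumeMounts:
        - name: snapshot-data
          mountPath: /backup
          readOnly: true
      volumes:
      - name: snapshot-data
        persistentVolumeClaim:
          claimName: jira-home-from-snapshot   # a PVC restored from the VolumeSnapshot, as shown earlier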

Published: Oct 7, 2019

Updated: Mar 26, 2024
