If you didn’t have the opportunity to attend KubeCon/CloudNativeCon Europe in Valencia earlier this year, you’ve come to the right place. In this post, I outline the main takeaways and trends:
- Kubernetes is a generic control plane for everything cloud-native,
- Platform engineering is a new trend to improve DevX,
- Chaos engineering is shifting left towards development,
- Linkerd is the service mesh darling,
- OpenTelemetry is almost ready for production,
- And a few more…
I will include links to all relevant talks, should you want to watch them for yourself!
One Kubernetes Control Plane to rule them all
Kubernetes is an awesome container orchestrator, but it turns out you can use the Kubernetes control plane for much more than running containers in clusters!
So what is a control plane? Essentially, it is a set of applications, usually exposed through an API, that we tell what we want rather than how to do it.
Kubernetes is an example of a control plane: we tell Kubernetes what applications we want to run, and Kubernetes takes care of the actual running of the applications. Another example could be public cloud providers: we tell the AWS/Azure API to create a virtual machine with a specific configuration, and the machine is created for us.
Kubernetes works on objects using the Kubernetes Resource Model (KRM), which enables Kubernetes to continuously reconcile the desired state of the objects. This means we can use the same underlying control plane to control any number of interesting things, using the same reconciling loop, by expressing them as Kubernetes resources.
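To make the reconciling loop a bit more concrete, here is a minimal sketch in Python. It is not how Kubernetes itself is implemented; the resource names and the `plan`/`reconcile` helpers are made up for illustration. It only shows the core idea: compare the desired state with the observed state, and apply whatever closes the gap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    verb: str   # "create", "update", or "delete"
    name: str


def plan(desired, actual):
    """Compare desired and observed state and list the actions that close the gap."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(Action("create", name))
        elif actual[name] != spec:
            actions.append(Action("update", name))
    for name in actual:
        if name not in desired:
            actions.append(Action("delete", name))
    return actions


def reconcile(desired, actual):
    """Run one pass of the loop: plan, then apply each action to the actual state."""
    actions = plan(desired, actual)
    for action in actions:
        if action.verb == "delete":
            del actual[action.name]
        else:
            actual[action.name] = dict(desired[action.name])
    return actions


# Hypothetical resources: one bucket has drifted, one database is missing,
# and one VM exists that nobody asked for.
desired = {"bucket-a": {"region": "eu-west-1"}, "db-main": {"size": "small"}}
actual = {"bucket-a": {"region": "us-east-1"}, "vm-old": {"cpu": 2}}

first_pass = reconcile(desired, actual)
second_pass = reconcile(desired, actual)  # nothing left to do once the states match

print([f"{a.verb} {a.name}" for a in first_pass])
print(second_pass)
```

Because the loop only cares about the difference between two states, it does not matter whether the "resource" is a pod, a bucket, or a whole cluster. That is what makes the control plane generic.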
This could be things running in the cluster, like Helm charts, external cloud resources like S3 buckets or managed databases, or even other Kubernetes clusters! Usually, you would deploy a "control cluster", a Kubernetes cluster responsible for controlling other resources and not running the actual workloads.
Several new tools leverage this idea of Kubernetes as a control plane, but the most exciting of these is Crossplane. Crossplane represents external resources as Kubernetes resources. We declare the external resources we need using the Crossplane Custom Resource Definitions (CRDs), and the standard Kubernetes control loop reconciles the state of those resources.
We are only beginning to see what is possible with control planes and with using Kubernetes for things other than orchestrating containers, and I cannot wait to see more out in the wild.
Talks to watch:
- Crossplane Intro and Deep Dive
- Building Digital Twins for DFDS with Crossplane and Kubernetes: a very cool joint presentation by DFDS and Upbound on how they are leveraging Crossplane to create “digital twins” (digital representations of ships/containers/terminals/etc.) in Kubernetes.
Platform Engineering – a way to reduce tech overload
Another hot topic at KubeCon was "platform engineering", which can also be referred to as "DevX", "golden paths", "paved roads", or "capability team". I will call it platform engineering (PE) to keep things clear.
Essentially the idea of platform engineering is to make it easy for developers to utilize all of the available tools and platforms without having to be experts at managing the underlying infrastructure.
For example, we want developers to deploy to Kubernetes, but we don't need all of them to be experts in Kubernetes administration as well. In other words, we want to create a good developer experience, or "DevX".
We do this by creating friendly interfaces for developers to consume resources. It could be a web portal, where a developer could create an environment by inputting a few parameters, like the name, resource allocation, and how long the environment should exist before being garbage collected. This is then translated to actual resources by a control plane managed by the Platform Engineering team.
Think of this as an abstraction similar to how a Kubernetes administrator would create persistent volumes, and then developers would consume those with persistent volume claims.
This way, we get a separation of concerns:
- Operations supplies and manages the foundational infrastructure.
- The Platform Engineering Team creates interfaces to make resources easy to consume.
- Developers utilize interfaces created by platform engineering to experiment and create value.
Crossplane is also interesting in the context of platform engineering. It provides a way of implementing this separation of concerns: the platform engineering team creates composite resources, and developers consume these by creating claims (instances).
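The claim-style interface described above can be sketched in a few lines of Python. This is only an illustration of the abstraction, not Crossplane's actual API: `EnvironmentClaim`, `expand_claim`, and the `CleanupJob` kind are hypothetical names, and the defaults stand in for whatever a platform team would choose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvironmentClaim:
    """What a developer fills in: the few parameters the platform exposes."""
    name: str
    cpu_cores: int
    ttl_hours: int


def expand_claim(claim):
    """Platform-owned template: turn one small claim into the concrete resources behind it.

    The resource kinds and labels below are hypothetical platform choices.
    """
    labels = {"environment": claim.name, "managed-by": "platform-team"}
    return [
        {"kind": "Namespace", "name": claim.name, "labels": labels},
        {"kind": "ResourceQuota", "name": f"{claim.name}-quota",
         "cpu_limit": claim.cpu_cores, "labels": labels},
        {"kind": "CleanupJob", "name": f"{claim.name}-cleanup",
         "delete_after_hours": claim.ttl_hours, "labels": labels},
    ]


# The developer only supplies a name, a resource allocation, and a lifetime.
claim = EnvironmentClaim(name="feature-x", cpu_cores=4, ttl_hours=48)
resources = expand_claim(claim)
for resource in resources:
    print(resource["kind"], resource["name"])
```

The developer never sees the template, and the platform team can change what sits behind a claim without touching the developer-facing interface.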
Looking at all of this from a DevOps perspective, I can’t help but wonder whether we are taking one step forward and two steps back: aren’t we just recreating the silos and walls we just tore down by separating Dev / PE / Ops like this? It will be very interesting to see what impact platform engineering will have. I’d love to hear your thoughts on the topic!
Talks to watch:
- From Kubernetes to PaaS to … Err, What’s Next?
- From Cloud Native to Cloud Native – Avoiding Mistakes Everyone Does
- Lightning Talk: How Intuit Enables GitOps at Scale For All Its Developers
- I also like this video by Viktor Farcic (Developer Advocate for Upbound), in which he explains control planes and Crossplane with an example of a Platform Engineering team creating a composite cloud database resource, and developers then consuming it.
Resiliency & chaos engineering is still on the rise
If resiliency engineering is making systems resilient to failures, then “chaos” engineering is the practice of “breaking things proactively”.
As we decompose our systems into smaller services, it becomes increasingly important that our systems are resilient, especially as services in production will have unpredictable life cycles (updates, scaling, downtime, etc.). We want to shift resiliency left, from operations (chaos monkey) into the development process (chaos testing).
So far, chaos engineering has focused primarily on operations, with tools like chaos monkeys that stop services randomly in an environment to force the team to create resilient applications. This is great for verifying that our resilient systems really are resilient, even after we have deployed them – but we want to make this practice proactive and predictable.
Therefore, we need “chaos testing” to define scenarios and run them against our applications as we develop them. A scenario could look like this: send X requests over a period Y, kill one of the services at interval Z, and then see how many of the requests are served successfully.
This is great because it lets developers test how the application behaves when it loses connection to other services and helps them make changes to the code that actually make a difference to the application's resiliency.
Further, these tests can be run from a Continuous Integration pipeline, providing auditable proof that a given commit increased or decreased the application's resiliency.
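A scenario like the one above can be sketched as a small, deterministic simulation. This is a toy model, not a real chaos tool: `FakeService`, `run_scenario`, and all of the numbers are assumptions chosen for illustration. It sends one request per tick, kills a replica at a fixed interval, and compares the success ratio with and without a client-side retry.

```python
class FakeService:
    """Stand-in for a service with several replicas; a killed replica stays down for a while."""

    def __init__(self, replicas, restart_ticks):
        self.down_until = [0] * replicas   # tick at which each replica is healthy again
        self.restart_ticks = restart_ticks

    def kill(self, replica, now):
        self.down_until[replica] = now + self.restart_ticks

    def handle(self, replica, now):
        return now >= self.down_until[replica]


def run_scenario(requests, kill_every, retries, replicas=3, restart_ticks=5):
    """Send one request per tick, kill a replica every `kill_every` ticks, return the success ratio."""
    service = FakeService(replicas, restart_ticks)
    served = 0
    kills = 0
    for tick in range(requests):
        if tick > 0 and tick % kill_every == 0:
            service.kill(kills % replicas, tick)   # rotate through the replicas
            kills += 1
        # Round-robin load balancing; each retry moves on to the next replica.
        for attempt in range(retries + 1):
            if service.handle((tick + attempt) % replicas, tick):
                served += 1
                break
    return served / requests


without_retry = run_scenario(requests=100, kill_every=10, retries=0)
with_retry = run_scenario(requests=100, kill_every=10, retries=1)
print(f"no retries: {without_retry:.0%}, one retry: {with_retry:.0%}")

# In a CI pipeline, a threshold like this (hypothetical) one would fail the build
# when a commit makes the application less resilient.
assert with_retry >= 0.99
```

Running the same scenario before and after a code change gives a number to compare, which is what makes the result auditable.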
Litmus is an incubating CNCF project that provides a platform for conducting chaos tests.
Talks to watch:
- Case Study: Bringing Chaos Engineering to the Cloud Native Developers
- Reproducing Production Issues in your CI Pipeline Using eBPF
Linkerd is the Cloud Native service mesh darling
There was a lot of buzz around service meshes and especially Linkerd. Service meshes provide advanced networking functionality, such as weight-based routing (for canary deployments), secure communication using mTLS (mutual TLS encryption of traffic between services), and observability of the connections between services.
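As a rough illustration of weight-based routing, the sketch below splits simulated traffic between a stable version and a canary. It does not use any service mesh API; the backend names, the 90/10 split, and the `pick_backend` helper are assumptions made for the example.

```python
import random


def pick_backend(weights, rng):
    """Choose a backend name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical canary split: 90% of traffic to the stable version, 10% to the canary.
weights = {"web-stable": 90, "web-canary": 10}
rng = random.Random(42)  # fixed seed so the run is repeatable

counts = {name: 0 for name in weights}
for _ in range(10_000):
    counts[pick_backend(weights, rng)] += 1

canary_share = counts["web-canary"] / 10_000
print(f"canary received {canary_share:.1%} of requests")
```

In a mesh, the same decision is made by the proxies next to each service, so the split can be changed through configuration without redeploying the application.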
Linkerd is a CNCF graduated service mesh that is fast, simple, and, most importantly, provides you with out-of-the-box mTLS for your services. This is important because it allows your services to communicate securely with TLS encryption internally in the cluster.
The consensus these days seems to be that service meshes are becoming a commodity; at the Linkerd panel discussion, one of the panelists was asked, “What is your advice for people considering a service mesh?” and they simply replied: “Well, just do it!”.
Curiously, I didn’t hear anyone talking about Istio, one of the other big service mesh projects. The exception was a talk from Microsoft on how they secured the Xbox cloud gaming platform with Linkerd (linked below). They chose Linkerd after evaluating Istio and other service meshes.
So the takeaway seems to be: if you are looking for a service mesh, look at Linkerd. If you are not looking for a service mesh, you should probably still look at Linkerd.
Talks to watch:
OpenTelemetry is (mostly) ready for production
Application telemetry is hard. We want good observability of different signals: metrics, logs, and traces. That means we usually end up with several different systems for capturing signals and processing them for useful outputs such as dashboards, alerts, etc.
But all of these different new systems (and their management) add a lot of complexity to our applications.
Therefore, the OpenTelemetry project aims to create a framework for observability of cloud-native applications. In practice, this consists of a unified protocol, “OTLP”, for tying together clients (that emit signals) and collectors (that process signals) with a variety of analysis tools.
OpenTelemetry can also collect signals through existing standards, such as Prometheus exporters and Jaeger. This means your existing observability setup is likely to be compatible with OpenTelemetry.
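The client/collector split can be pictured with a toy model. The sketch below does not use the OpenTelemetry SDK or the OTLP wire format; `Signal`, `Collector`, and the list-based backends are hypothetical stand-ins that only show the shape of the idea: applications send every kind of signal to one place, and the collector decides where each kind goes.

```python
from dataclasses import dataclass, field


@dataclass
class Signal:
    kind: str        # "trace", "metric", or "log"
    service: str
    body: dict


@dataclass
class Collector:
    """Toy collector: one intake for every signal kind, fan-out to per-kind backends."""
    exporters: dict = field(default_factory=dict)   # kind -> list of backend callables

    def register(self, kind, exporter):
        self.exporters.setdefault(kind, []).append(exporter)

    def receive(self, signal):
        # Processing step: enrich every signal the same way before exporting it.
        signal.body.setdefault("cluster", "demo")
        for export in self.exporters.get(signal.kind, []):
            export(signal)


# Hypothetical backends, modeled as plain lists.
tracing_backend, metrics_backend = [], []

collector = Collector()
collector.register("trace", tracing_backend.append)
collector.register("metric", metrics_backend.append)

# The application only needs to know one destination: the collector.
collector.receive(Signal("trace", "checkout", {"span": "GET /cart", "ms": 12}))
collector.receive(Signal("metric", "checkout", {"name": "requests_total", "value": 1}))
collector.receive(Signal("log", "checkout", {"msg": "no exporter registered, so dropped"}))

print(len(tracing_backend), len(metrics_backend))
```

Swapping an analysis tool then means registering a different exporter in the collector, while the application code that emits the signals stays the same.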
OpenTelemetry is a CNCF incubating project and is, at the time of writing, the second most active CNCF project, second only to Kubernetes! Currently, traces are stable. For metrics, the current release candidate is expected to move to stable within the next few weeks. Logs are still considered experimental.
It looks like the cloud-native community has latched on to the project, and I believe that within the next year or two, the project will mature to the point that it is production-ready for all of the signals, quickly becoming the standard.
Talks to watch:
Some new features to check out!
Debugging in Kubernetes with ephemeral containers
While the terminology overload is annoying (aren’t all k8s containers ephemeral?), the new ephemeral containers feature is very exciting. It allows for much easier low-level debugging of applications running in Kubernetes.
Ephemeral containers allow us to create new containers in existing pods. These new containers can run any image we need, for example images with tools for debugging, and can have access to the same Linux namespaces as the other containers in the pod!
This means that we can do much lower-level debugging in running applications and have the right tools available for debugging without having to include them in our application images. While debugging a copy of a pod (“$ kubectl debug” with the “--copy-to” flag) is also great, it creates a new pod, which is not always what we need; for example, if we need access to the processes of the running application, we can reach these with an ephemeral container.
Since we can run any container image we want, this also means that it is much easier to debug single-binary or distroless images, as we can have all of the debug tools (and even a shell!) we need on demand, but without shipping them as part of our application.
Talks to watch:
Krew: kubectl plugin package manager
Kubectl now has a plugin manager: Krew. It allows you to easily install plugins for kubectl, adding new functionality. There are currently about 200 plugins available, and more are being added. I spotted “$ kubectl tree” at KubeCon and have played around with it, and it’s pretty cool for visualizing the hierarchy of Kubernetes resources.
All Krew plugins are open source, so, as with any other open source code, verify that the code is safe before installing and running it.
Published: June 27, 2022