Getting Kubernetes up and running can be challenging, and that’s before we even start to talk about security. That’s why most people go for managed Kubernetes services in the cloud. In this article, we’ll compare the 3 biggest managed Kubernetes solutions: EKS, GKE and AKS.
What can you expect from managed Kubernetes?
Stability, resilience and security are hard to achieve in a Kubernetes cluster. A failing worker node is perfectly fine as long as most of its workload is redundant, but the Kubernetes master has to be rock solid. If it fails, your whole cluster will be down.
This is where managed Kubernetes services step in. Depending on the cloud provider, one can get a more or less highly available Kubernetes master with an SLA.
Another challenge of Kubernetes is that there is no user database included. This problem is also usually solved by the cloud provider by integrating Kubernetes Auth & RBAC with the cloud provider’s IAM.
Will managed Kubernetes solve all my problems?
For all its advantages, managed Kubernetes still doesn’t leave its users completely carefree. Unless you’re running Kubernetes serverless (without worker nodes, in favour of Fargate or Cloud Run), you still have to take care of scaling, monitoring and maintaining your worker nodes. Security also needs to be taken care of. With that in mind, let’s turn our attention to EKS, GKE and AKS, and see how they compare.
Comparing managed Kubernetes
As mentioned before, a rock solid master is the key to a resilient Kubernetes cluster. While Amazon’s EKS is highly available by default, GCP lets you choose whether you want to span the Kubernetes master across different AZs. AKS didn’t offer multi-AZ masters until quite recently. All three providers have SLAs for their masters’ uptime.
While AWS and Azure introduced managed worker groups only recently, GCP has had this feature from its inception. When AKS released managed worker groups last year, our teams found quite a lot of problems with them. For example, it was not possible to mix node pools with different instance sizes, and worker nodes didn’t restart after updating. EKS and especially GKE perform these functions much more smoothly.
Worker IAM integration
To use a Kubernetes cluster with other resources in the cloud, you have to be able to assign permissions to your cluster resources. While GCP has a pretty good integration of IAM into GKE, EKS long struggled with this issue, and community projects such as kube2iam arose to fill the gap. Luckily, Amazon fixed this problem last autumn by introducing fine-grained IAM roles for service accounts. AKS doesn’t have granular permission assignment at all and only allows service principals to be assigned at cluster level.
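To make the EKS side of this more concrete, here is a minimal sketch in Python of how IAM roles for service accounts are typically wired up: a Kubernetes ServiceAccount carries the `eks.amazonaws.com/role-arn` annotation, and pods using that service account receive credentials for the role. The service account name, namespace and role ARN below are hypothetical placeholders, and the sketch assumes the cluster already has an IAM OIDC provider and that the role’s trust policy allows this service account.

```python
import json


def service_account_manifest(name: str, namespace: str, role_arn: str) -> dict:
    """Build a ServiceAccount manifest annotated with an IAM role ARN."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Annotation read by EKS to map this service account to an IAM role.
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }


# Hypothetical values for illustration only.
manifest = service_account_manifest(
    name="s3-reader",
    namespace="default",
    role_arn="arn:aws:iam::111122223333:role/s3-reader",
)

# kubectl accepts JSON as well as YAML, so this output can be saved and applied.
print(json.dumps(manifest, indent=2))
```

Only pods that reference this service account get the role, which is the fine-grained part: other workloads on the same worker node don’t inherit the permissions.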
GKE leads the field for ingress integration. While both AWS and GCP integrate their layer 7 load balancers with Kubernetes ingress, the setup in GCP is much smoother. AKS, on the other hand, doesn’t have a feature like this at all.
If you want a free Kubernetes master, a single-zone GKE cluster or AKS is the way to go. If you want an HA cluster on GCP or AWS, you have to pay €0.10 (GCP) or €0.20 (AWS) per hour of operation. For worker nodes, all cloud providers charge you the normal VM cost.
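Hourly fees are hard to compare at a glance, so here is a rough back-of-the-envelope calculation in Python that turns the hourly master fees quoted above into a monthly figure. The 730 hours per month is an assumption (8,760 hours per year divided by 12), and worker node costs are deliberately left out.

```python
HOURS_PER_MONTH = 730  # assumption: average month (8760 h / 12)

# Hourly master fees in euros, as quoted in the paragraph above.
HOURLY_FEE_EUR = {
    "GKE (HA)": 0.10,
    "EKS": 0.20,
    "GKE (single zone)": 0.00,
    "AKS": 0.00,
}


def monthly_fee(hourly_fee_eur: float, hours: int = HOURS_PER_MONTH) -> float:
    """Return the master fee for one month, rounded to cents."""
    return round(hourly_fee_eur * hours, 2)


for name, fee in HOURLY_FEE_EUR.items():
    print(f"{name}: {monthly_fee(fee):.2f} EUR/month")
```

Under these assumptions an HA master comes to roughly 73 € per month on GCP and 146 € per month on AWS, before any worker nodes are added.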
Ease of use
When evaluating general ease of use, GKE is again the clear victor. Fewer than 100 lines of Terraform code get a whole cluster up and running, and a lot of the maintenance work can be done automatically. AKS, since it introduced managed worker nodes, is quite easy to set up. Nonetheless, it’s difficult to run and maintain (worker nodes need a manual restart after updating, mixed worker groups cause problems, and some users report random system crashes). Those used to working with AWS will probably already have guessed that an EKS cluster requires a lot of knowledge, and that this knowledge isn’t always well covered by the official documentation.
Teams behind new software projects have recently had to decide whether to go serverless or use containers. While serverless has the clear advantage that users don’t need to worry about maintaining virtual servers, it has considerable disadvantages, including lower speeds and vendor lock-in. This has stopped a lot of people from using the new technology. GCP released Cloud Run, a managed Knative service, in November 2019. A month later, AWS took it one step further and integrated Fargate worker groups into EKS. While GKE clusters currently only allow you to run Knative services serverless, Fargate on EKS makes it possible to run any pod without having to worry about worker nodes.
Available in only 3 regions, Fargate for EKS is a relatively new and quite immature feature. In our test run we had a few problems with pods not being scheduled, but we’d expect AWS to fix these problems very soon.
EKS seems to be the clear winner when it comes to serverless Kubernetes. GCP, however, is the leader with regard to traditional worker nodes. Updating traditional worker nodes can be difficult on AKS and even EKS, but GKE lets you automatically update your worker groups without any downtime.
Considering the release timelines as well as the experiences of our consultants, GKE not only offers the most features, but those features have also been available for the longest time and provide the most mature solutions. A lot of issues were reported last year for AKS, the most immature of the three solutions. EKS is somewhere in the middle, but far more stable than AKS.
For me, GKE is the all-round winner. A cluster can be set up in the GCP console in less than 10 minutes (or with about 100 lines of Terraform code), it’s rock solid and it updates itself automatically. Nonetheless, AWS would be a good choice for some users because of its myriad cloud features and its leadership in the serverless movement. Once Fargate for EKS is more stable and more widely available, it could give AWS a competitive advantage.
Google and Amazon are closely matched in the battle for the best Kubernetes solution on the market and I would recommend them for the vast majority of users. For companies that already have a lot of services running on Azure and need Kubernetes with a lot of Windows pods, AKS might also be a reasonable choice.
Providers constantly add and alter the features of their services, and this article can only give a snapshot of the situation in July 2020. Who knows where we will be a year from now?