This article explores some of the many key factors to take into account when devising your Kubernetes clustering strategy.
We suggest several key questions that can help you assess your application’s resilience needs and determine the required workload isolation. Additionally, we provide a brief overview of Multi-Tenancy.
Finally, we draw attention to a critical consideration that is often overlooked: the potential service risks posed by the default single control plane model deployed by Kubernetes, particularly when hosting applications with their own high availability designs.
How many Kubernetes clusters do you need & what topology strategy should you adopt?
To answer this, start by thoroughly assessing your application’s resilience requirements, including its ability to provide reliable services at scale. The assessment should take into account factors such as the level of traffic, the degree of fault tolerance required, and the potential impact of downtime on your business.
The mind map below details some of the many factors to consider when planning and migrating enterprise applications to a Kubernetes platform.
Qualifying Questions
Consider two fundamental questions that should help guide you towards the right cluster strategy for your workloads and requirements:
Q1. Does my application need to span multiple data centres, availability zones, or even regions to support high availability & scale?
Q2. Is my application mission-critical, requiring five-nines (99.999%) availability and millisecond recovery, or will recovery within a few minutes suffice?
Other factors that you may need to take into account before deciding on a cluster strategy include:
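To make Q2 concrete, it can help to translate an availability target into a yearly downtime budget. The sketch below is illustrative arithmetic only: the function name is hypothetical, and it assumes a 365-day year and that availability is measured over the whole year.

```python
# Downtime budget implied by an availability target ("N nines").
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year


def downtime_budget_seconds(availability_percent: float) -> float:
    """Return the maximum downtime per year, in seconds, for a given target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = downtime_budget_seconds(target) / 60
        print(f"{target}% allows about {minutes:.1f} minutes of downtime per year")
```

At five nines the budget is a little over five minutes per year, so a recovery that takes "a few minutes" would consume most of it in a single incident.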
- Cost implications
- Shared Tenancy
- Geographical distribution of users
- Regulatory requirements
- Security considerations
- Application delivery strategies
- Resilience requirements
- Workload isolation levels
High-volume, mission-critical applications such as trading platforms will typically have much greater resilience requirements than, say, a simple web service. These factors, and others, will influence the solutions you adopt and your overall cluster strategy.
Another major question you might like to ask is:
Q3. How much workload isolation do your applications need and why?
With the right cluster strategy, you can minimise the “blast radius” of cluster-level failures resulting from events such as Kubernetes upgrades, configuration changes and security breaches.
Multi-tenancy
Cost savings are often overlooked, particularly when dealing with cloud resources.
Kubernetes can help you here with overcommit mechanisms and vertical autoscaling to manage compute density. With careful planning and the right cost monitoring software, substantial savings can be realised. Another way Kubernetes helps organisations save on costs and complexities is multi-tenancy. This is where a number of different environments or workloads may be deployed on a single Kubernetes cluster where the resources are divided amongst the cluster tenants.
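As a rough illustration of the overcommit idea mentioned above: the scheduler places pods according to their resource requests, while limits cap how far a container may burst, so the sum of limits on a node can exceed what the node actually has. The sketch below computes both figures for one node. The class, the function and the sample numbers are hypothetical, and CPU is expressed in millicores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodResources:
    name: str
    cpu_request_m: int  # millicores reserved for the pod at scheduling time
    cpu_limit_m: int    # millicores the pod may burst up to


def cpu_overcommit(pods: list[PodResources], node_allocatable_m: int) -> dict[str, float]:
    """Compare summed CPU requests and limits against a node's allocatable CPU."""
    requested = sum(p.cpu_request_m for p in pods)
    limited = sum(p.cpu_limit_m for p in pods)
    if requested > node_allocatable_m:
        raise ValueError("requests exceed allocatable CPU; these pods would not all fit")
    return {
        "requested_fraction": requested / node_allocatable_m,
        "limit_overcommit_ratio": limited / node_allocatable_m,
    }


if __name__ == "__main__":
    # Hypothetical tenants sharing one node with 4 allocatable CPU cores.
    pods = [
        PodResources("api", 500, 1500),
        PodResources("worker", 1000, 2000),
        PodResources("batch", 1500, 3000),
    ]
    print(cpu_overcommit(pods, node_allocatable_m=4000))
```

A limit ratio above 1.0 means the node is overcommitted: density is higher, but tenants may contend for CPU if they all burst at once.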
Lastly, it’s good practice to run dev and staging on separate clusters from your prod environment to reduce the risk of users being served beta or non-production code versions. If cost is an issue, a multi-tenancy configuration is again an option.
All these factors need careful consideration when formulating your Kubernetes cluster strategies.
Workload Isolation
High availability systems rely on duplication to mitigate impacts from critical component failure. One way to achieve greater HA service resilience is by increasing workload isolation.
Q4. So how can you isolate workloads in Kubernetes?
Before Kubernetes, the traditional approach was to deploy applications onto separate hardware. Kubernetes offers granular and robust kernel-level multi-tenancy capabilities that allow separate workloads to co-exist on the same cluster. If, however, you need complete control over your clusters for things like maintenance windows and platform upgrades, you might opt for cluster isolation and separate workloads along organisational lines, such as development teams, or along application boundaries.
We’ve mentioned the following isolation options:
- Hardware
- Multi-Tenancy
There is one additional isolation you should consider, and that is:
- Control Plane
The first two can be achieved with any single-cluster topology by using Kubernetes-native objects such as Namespaces and ResourceQuotas.
To implement Control Plane isolation, however, you’ll be stepping into the realm of multi-cluster topologies, which we’ll discuss in the next part of this article.
Note: The Control Plane is the set of components that manages a Kubernetes cluster. All cluster and application-related settings, including the control API for the cluster, typically reside on separate hardware from the cluster’s worker servers. The Control Plane is a non-optional, integral part of every Kubernetes platform. For more information, see the Kubernetes documentation.
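As a minimal sketch of the namespace-and-quota approach, the script below builds a Namespace and a ResourceQuota manifest per tenant and prints them as a JSON list. The tenant names, quota values and the `tenant_manifests` helper are assumptions made for illustration; the manifest fields themselves (`apiVersion: v1`, `kind: ResourceQuota`, `spec.hard`) follow the core Kubernetes API.

```python
import json


def tenant_manifests(tenant: str, cpu: str, memory: str, max_pods: int) -> list[dict]:
    """Build a Namespace plus a ResourceQuota that caps what the tenant can request."""
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": tenant},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{tenant}-quota", "namespace": tenant},
        "spec": {
            "hard": {
                "requests.cpu": cpu,
                "requests.memory": memory,
                "pods": str(max_pods),
            }
        },
    }
    return [namespace, quota]


if __name__ == "__main__":
    # Hypothetical tenants and quota sizes.
    items = []
    for name in ("team-a", "team-b"):
        items.extend(tenant_manifests(name, cpu="4", memory="8Gi", max_pods=20))
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Because kubectl accepts JSON as well as YAML, the output could be piped to `kubectl apply -f -`, although you would normally review it first. Note that quotas only limit resource consumption; they do not by themselves place tenants on separate hardware.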
Q5. Why do you need Control Plane isolation?
Out of the box, Kubernetes supports only a single-cluster model.
Your applications are typically deployed onto, and controlled by, a single cluster, although you can have many clusters. Figure 2 illustrates that while Kubernetes clusters typically have multiple member servers or “nodes” per cluster (the upstream project documents support for up to 5,000), each cluster has a single Control Plane which holds all the configuration for that cluster and the application workloads it runs. Each cluster is managed separately by an operator or process making API calls to its control plane.
All objects and names within a Kubernetes cluster are scoped to the cluster boundary and must be unique within it.
“So what’s wrong with that?”, you might say.
Well, it’s about how resilient you want your clusters to be.
Consider a scenario where you have your production workloads on a single cluster and during a cluster upgrade, you encounter a problem. This can seriously compromise your cluster’s ability to provide your service.
Going with one of the cloud providers’ managed Kubernetes services doesn’t help much with this either. While they’ll take care of making your Control Plane highly available, this won’t make it any more resilient to cluster-level faults. You’ll still need to contend with the fact that a single cluster equates to a single control plane, which may not be enough for your requirements.
There are many other reasons you might want to consider a multi-cluster topology besides the Control Plane limitation and we’ll go over these in Part 2 of Kubernetes Cluster Strategy: Single or Multi?.
IMPORTANT: Control planes offer no inherent domain-level redundancy. This means that a cluster-level fault, such as a misconfiguration, security breach or failed upgrade, could leave your entire cluster in a non-working state.
Service Risks
A cluster level fault has the potential to adversely affect the following Kubernetes functions:
- Pod scheduling
- Pod fault detection & correction
- Application rollouts
- Cluster configuration
- Kubernetes upgrades
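To put a rough number on this risk, the sketch below estimates the share of workloads that sit behind a single failed control plane and would therefore lose the functions listed above. It is a deliberately simple model with hypothetical names and workload counts: it assumes every workload depends on exactly one cluster and ignores any application-level failover.

```python
def blast_radius(workloads_per_cluster: list[int], failed_cluster: int) -> float:
    """Fraction of all workloads managed by the cluster whose control plane failed."""
    total = sum(workloads_per_cluster)
    if total == 0:
        raise ValueError("no workloads to evaluate")
    return workloads_per_cluster[failed_cluster] / total


if __name__ == "__main__":
    single = blast_radius([120], failed_cluster=0)
    multi = blast_radius([40, 40, 40], failed_cluster=1)
    print(f"single cluster: {single:.0%} of workloads affected")
    print(f"three clusters: {multi:.0%} of workloads affected")
```

With the same 120 workloads, one cluster-level fault touches all of them in the single-cluster case but only a third when they are spread evenly across three clusters.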
In Part 2 of Kubernetes Cluster Strategy: Single or Multi?, we’ll take a look at the cluster strategies you can adopt to mitigate these risks. We’ll compare the benefits of single and multi-cluster topologies and introduce a couple of different “application level” clustering architectures. We’ll go over the many benefits of a multi-cluster topology and some of its problems. If you run enterprise applications and need to operate at large scale or require strict isolation between workloads, you don’t want to miss Part 2.
About Devoteam A Cloud
With 500 clients across Europe, Devoteam A Cloud has offered know-how on AWS technologies since 2012. Our team of 550+ AWS experts supports customers with scalable infrastructure and new ways of thinking and operating enabled by AWS, so that they can explore new possibilities, re-invent their business, and evolve into an enterprise platform.
Devoteam A Cloud is an AWS Premier APN Consulting Partner with four competencies: DevOps, Data & Analytics, Security and Migration. In 2021, it was awarded AWS APN Migration Partner France, following 2020, when Devoteam was awarded AWS APN Consulting Partner of the Year.
At Devoteam we specialise in this type of scale complexity. By combining the latest skillsets in Cloud technologies with decades of experience working within enterprise businesses, we can help businesses build out these capabilities internally.
Tony Barganski
Principal Kubernetes Consultant
Devoteam UK