1. Overview

In Kubernetes, effective resource management is critical for ensuring workloads receive fair access to resources while preventing resource starvation or overutilization. CPU requests and limits serve as essential tools to achieve this balance. However, the misuse of CPU limits can be counterproductive, as it causes ineffective resource utilization in the cluster.

In this tutorial, we’ll learn about CPU resources in the context of a Kubernetes cluster. Then, we’ll dive deeper into CPU limits and see how they can be counterproductive in certain scenarios.

2. Kubernetes CPU and Cores

A Central Processing Unit (CPU) is a computer’s primary computational component, responsible for executing instructions. A core, in a general sense, represents an independent processing unit within a CPU, capable of handling computational tasks simultaneously. Modern CPUs typically contain multiple cores, enabling parallel processing and improved performance.

2.1. CPU Measurements in Kubernetes

In Kubernetes, the term core takes on a slightly different meaning: it's a unit of measurement for the cluster's CPU resource, and one core is equivalent to 1000 millicores. This granularity allows a finer specification of CPU resources, letting us slice CPU time more precisely for our workloads.

Importantly, Kubernetes designates CPU resources in terms of time rather than actual computing power. Specifically, when we say that a workload uses at least 250 millicores, we mean that the workload is allowed to run for 25% of the time on a single CPU core within a scheduling period. Similarly, a workload allocated two cores consumes 100% of the execution time of two CPU cores in any given window.
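To make the time-based interpretation concrete, here's a small Python sketch; the helper name is ours for illustration, not a Kubernetes API:

```python
# Millicores express a share of CPU time rather than raw processing power.
# time_share() is a hypothetical helper: it returns the fraction of a single
# core's scheduling period that a given millicore value corresponds to.
def time_share(millicores: int) -> float:
    return millicores / 1000

print(time_share(250))   # 0.25 -> 25% of one core's time per period
print(time_share(2000))  # 2.0  -> the full time of two cores
```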

In essence, the CPU cores measurement in a Kubernetes cluster estimates the share of the cluster’s processing time allocated to our workload, rather than directly representing the actual processing power.

2.2. CPU Requests and Limits

Kubernetes supports specifying the minimum and maximum CPU cores a workload can utilize using the requests and limits keywords. Concretely, we can define the CPU requests and limits for our pods through the pod specification:

apiVersion: v1
kind: Pod
metadata:
  name: banking-service
spec:
  containers:
    - name: banking-service
      image: banking:1.0.0
      resources:
        requests:
          cpu: "250m"
        limits:
          cpu: "500m"

In the example above, we create a pod resource banking-service that requires a minimum of 250 millicores and can use up to a maximum of 500 millicores.

2.3. CPU Requests Affect Pod Scheduling

The kube-scheduler is the Kubernetes process that decides where to place a pod. To make a decision, the kube-scheduler considers the amount of CPU requested by the pod and the CPU availability of the nodes. Concretely, the kube-scheduler only schedules the pod onto nodes with at least the CPU request value available.

For example, consider a cluster with two nodes. The first node has 500 millicores remaining, and the second has two cores remaining. The kube-scheduler will schedule a pod with a CPU request of 1500 millicores onto the second node, because the first node lacks the required 1500 millicores.
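The filtering step above can be sketched as follows. This is an illustrative model, not the actual kube-scheduler code, and the node names and capacities are made up:

```python
# Illustrative model of the scheduler's CPU filter: a node is feasible only
# if its remaining allocatable CPU (in millicores) covers the pod's request.
def feasible_nodes(free_millicores_by_node: dict, request_millicores: int) -> list:
    return [
        node
        for node, free in free_millicores_by_node.items()
        if free >= request_millicores
    ]

# Two nodes: 500 millicores free vs. two full cores (2000 millicores) free.
nodes = {"node-1": 500, "node-2": 2000}
print(feasible_nodes(nodes, 1500))  # ['node-2']
```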

2.4. CPU Is a Compressible Resource

The CPU limit of our pods doesn’t affect pod scheduling, because the CPU, as a computational resource, is compressible. Contrast this to the memory resource, which can’t be compressed.

Workloads in Kubernetes can often tolerate the temporary reduction in CPU availability without complete failure. For example, consider a scenario where our workload, in steady-state, uses 500 millicores of CPU in the cluster. Then, a sudden surge in traffic causes the workload to demand more CPU time from the Kubernetes cluster. If the cluster can’t provide the additional demand, the workload will typically continue to run with worse performance.

On the other hand, running out of an incompressible resource like memory results in the workload being terminated. Hence, exceeding a memory limit gets the container killed, whereas exceeding the CPU limit merely throttles the workload.

3. Kubernetes CPU Requests and Limit Enforcement

When the kube-scheduler assigns a pod to a node, the underlying container runtime, such as containerd, creates the container. Under the hood, the Linux cgroup (control group) mechanism defines two important parameters for controlling the container's CPU access (using their cgroup v1 names): cpu.shares and cpu.cfs_quota_us.

3.1. CPU Requests Through cpu.shares

The cpu.shares parameter corresponds to the CPU requests in our pod specification. Specifically, cpu.shares controls the relative share of CPU time a cgroup gets in a given window. For example, when there are two cgroups, each with a cpu.shares value of 100, each cgroup gets 50% of the total execution time when both are busy.

Critically, the values are relative to the sum of the cpu.shares value of all the cgroups in the system. When there are two cgroups on the system, and they have cpu.shares values of 200 and 100, respectively, the first cgroup will get 66.6% of the total CPU time on the system.
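The relative nature of cpu.shares can be expressed directly; the cgroup names below are illustrative:

```python
# Under contention, each cgroup's CPU time is its cpu.shares value divided
# by the sum of cpu.shares across all runnable cgroups on the system.
shares = {"cgroup-1": 200, "cgroup-2": 100}
total = sum(shares.values())
fractions = {name: value / total for name, value in shares.items()}
print(fractions)  # cgroup-1 -> ~0.667 (66.6%), cgroup-2 -> ~0.333
```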

When the container runtime creates the container, it converts the CPU request value on our pod specification to the equivalent cpu.shares value on that particular node.
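As a sketch of that conversion, modeled on the kubelet's behavior where one core maps to 1024 shares (treat the exact rounding and the minimum value as assumptions):

```python
# Convert a CPU request in millicores to a cgroup cpu.shares value.
# One core (1000m) maps to 1024 shares; a small floor is assumed here,
# mirroring the minimum the kubelet applies.
def millicpu_to_shares(millicores: int) -> int:
    shares = (millicores * 1024) // 1000
    return max(shares, 2)

print(millicpu_to_shares(250))   # 256
print(millicpu_to_shares(1000))  # 1024
```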

3.2. CPU Limits Through cpu.cfs_quota_us

The cpu.cfs_quota_us parameter states the maximum allowable CPU time, in microseconds, that a cgroup gets to execute within a scheduling period. The scheduling period parameter is cpu.cfs_period_us, which has a default value of 100,000 microseconds (100 milliseconds). Notably, the cfs in both parameters refers to the Completely Fair Scheduler (CFS), which was the default Linux kernel scheduling algorithm until it was replaced in version 6.6.

Similarly, the container runtime of the node will convert the CPU limits we define in terms of cores to the equivalent cpu.cfs_quota_us on the node.
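A sketch of that limit-to-quota conversion, assuming the kernel's default 100,000-microsecond period:

```python
# Convert a CPU limit in millicores to a cpu.cfs_quota_us value:
# the fraction of a core times the scheduling period, in microseconds.
def millicpu_to_quota(millicores: int, period_us: int = 100_000) -> int:
    return (millicores * period_us) // 1000

print(millicpu_to_quota(500))   # 50000 -> 50ms of CPU time per 100ms period
print(millicpu_to_quota(2000))  # 200000 -> two full cores per period
```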

4. Potential Inefficiency With Setting CPU Limits

In traditional resource management, setting hard limits might seem like an intuitive way to safeguard cluster stability. However, managing CPU resources requires more careful consideration.

Firstly, it’s important to identify the goal we want to achieve by setting CPU limits. If it’s for governance purposes, such as controlling the amount of share a paying customer can get, then CPU limits need to be in place.

However, for most Kubernetes clusters, the goal is to achieve maximum resource utilization while keeping the cluster stable. In these cases, the most sensible approach is to set CPU requests on the pods and leave out the CPU limits. By setting CPU requests, we guarantee that the pods have at least fair access to the CPU. The pods are then free to use any remaining CPU in times of need, since there's no limit specified.
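Following this approach, the earlier pod specification would keep its CPU request and simply omit the limits block:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: banking-service
spec:
  containers:
    - name: banking-service
      image: banking:1.0.0
      resources:
        requests:
          cpu: "250m"
        # no CPU limit: the pod may use any idle CPU beyond its request
```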

To drive the point home, let’s consider a scenario where we have a Kubernetes cluster with four cores. Then, for the sake of cluster stability, we deploy two workloads, each requesting one core, with a utilization limit of two cores. This looks sensible at first glance, as each pod can, at most, use two cores that add up to the total capacity of the cluster.

Later, the second workload sees a sudden spike in traffic and requires more execution time from the cluster. Although the first workload doesn’t utilize the additional core the cluster can offer, the second workload can’t leverage the free and idle core to speed up its processing to deal with the spike. Ultimately, this leads to a waste of resources.

In contrast, when we don't specify CPU limits on the workloads, each workload is still guaranteed a single core of processing at any given time. If both workloads see a sudden spike, neither can take that one-core capacity away from the other. Hence, the stability of the cluster is preserved.

5. Conclusion

In this article, we first learned that a core refers to an independent processing unit within a CPU. Then, we saw that in the context of Kubernetes, a core is a unit of measurement for the cluster's fundamental computing resource. Afterward, we learned that we can define CPU requests and limits via the requests and limits keywords in the pod specification. Finally, we explored the potential issues of setting CPU limits in a Kubernetes cluster.