As Amazon Elastic Kubernetes Service (EKS) becomes a popular choice for running Kubernetes in the cloud, organizations often overlook an important aspect of cost optimization: node utilization. While EKS offers ease of scaling and management, inefficient node usage can lead to significant cost overruns.
For many EKS clusters, wasted resources result from over-provisioned pods, idle nodes, and poor scheduling, leading to unnecessary charges. By optimizing node utilization, organizations can cut costs by up to 40% while ensuring high availability and scalability for their applications.
In this blog, we will explore best practices to optimize node usage in EKS clusters and reduce cloud costs.
Node utilization in EKS refers to how efficiently the worker nodes in your cluster are used to handle workloads. In Amazon EKS, nodes are EC2 instances that run the Kubernetes workloads (pods). Poor node utilization occurs when pods request more resources than they actually use, when nodes sit idle, or when workloads are scheduled unevenly across the cluster.
Effective node utilization ensures that your applications are running on the minimal number of nodes needed to meet their resource requirements without overprovisioning.
Selecting the appropriate EC2 instance types for your EKS nodes is crucial for achieving cost efficiency. Amazon EKS offers the flexibility to use a variety of EC2 instances for your worker nodes, but improper selection can lead to either underutilized resources or performance bottlenecks. To optimize costs without compromising performance, it is important to analyze resource usage by monitoring CPU and memory utilization using Amazon CloudWatch and EKS metrics. Based on these insights, you should select instance types that align with your workload requirements.
For example, smaller workloads can benefit from cost-effective instances such as t3.micro or t3.medium, while memory-intensive applications may require instances like r5.xlarge to ensure optimal performance.
An eksctl configuration for a small, cost-effective node group might look like this:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-west-2
nodeGroups:
  - name: small-nodegroup
    instanceType: t3.medium
    desiredCapacity: 2
By selecting the right EC2 instance types based on workload characteristics, you can ensure that your nodes are neither over-provisioned nor underutilized.
Amazon EC2 Spot Instances allow you to utilize unused EC2 capacity at up to 90% lower cost than On-Demand Instances. They are ideal for fault-tolerant, stateless, or flexible workloads that can withstand interruptions. Leveraging Spot Instances can significantly reduce compute costs in your EKS cluster.
Before using Spot Instances effectively, ensure your EKS cluster can dynamically scale based on workload demands by enabling and configuring the Cluster Autoscaler. The autoscaler watches for pods that can’t launch due to insufficient resources and adjusts node group sizes accordingly. Proper tuning ensures your cluster is right-sized and avoids overprovisioning, setting a strong foundation for cost-efficient Spot usage.
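As a rough sketch of what this tuning looks like in practice (the image tag and the cluster name my-cluster are placeholders to adapt to your environment), the relevant container arguments in the standard cluster-autoscaler Deployment are along these lines:

```yaml
# Excerpt from a cluster-autoscaler Deployment spec (illustrative only).
spec:
  containers:
    - name: cluster-autoscaler
      image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0
      command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        # Discover node groups via ASG tags instead of listing them explicitly
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
        # Keep similarly sized node groups balanced across availability zones
        - --balance-similar-node-groups
        # Consider scaling down nodes that stay below 50% utilization
        - --scale-down-utilization-threshold=0.5
```

Auto-discovery via ASG tags is generally preferable to hard-coding node group names, since new node groups are picked up automatically as the cluster evolves.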
Important: Spot Instances can be interrupted with a two-minute warning when capacity is reclaimed by AWS. Avoid using Spot for critical workloads that require high availability or cannot tolerate disruptions.
To use Spot Instances in EKS, run fault-tolerant workloads on a Spot-backed node group while keeping critical workloads on On-Demand capacity. Example of a mixed node group setup with eksctl:
managedNodeGroups:
  - name: mixed-nodegroup
    instanceTypes:
      - t3.medium
      - t3a.medium
    desiredCapacity: 4
    minSize: 2
    maxSize: 6
    spot: true
  - name: on-demand-nodegroup
    instanceType: t3.medium
    desiredCapacity: 2
    minSize: 1
    maxSize: 3
By blending On-Demand and Spot Instances strategically, you can drastically reduce compute costs while still ensuring that critical workloads remain stable.
Many Kubernetes workloads in Amazon EKS are overprovisioned due to high resource requests, which leads to inefficient use of resources and increased costs. Properly setting resource requests and limits ensures that workloads use only the resources they need, allowing the cluster to operate more efficiently. Begin by monitoring actual resource usage through Amazon CloudWatch or Prometheus to understand how much CPU and memory pods truly require.
With this data, set resource requests conservatively, starting from lower values and adjusting upward as needed based on real usage trends. Incorporating Horizontal Pod Autoscaler (HPA) allows the number of pod replicas to increase or decrease automatically depending on the workload, which helps in maintaining application performance while controlling resource consumption.
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "1000m"
    memory: "1Gi"
Properly setting resource requests ensures that your EKS nodes are packed efficiently, reducing waste and cost.
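The Horizontal Pod Autoscaler mentioned above can be sketched as follows; the target Deployment my-app, the replica bounds, and the 70% threshold are illustrative assumptions to tune for your workload:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app            # hypothetical Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          # Add replicas when average CPU exceeds 70% of the requested value
          averageUtilization: 70
```

Because HPA utilization targets are computed against resource requests, accurate requests and a sensible threshold work together: inflated requests make the HPA scale out too late, while realistic requests keep scaling responsive.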
Topology spread constraints are Kubernetes scheduling policies that help ensure pods are evenly distributed across specified topology domains, such as availability zones or individual nodes. This even distribution improves fault tolerance and availability while also enhancing cost efficiency. By avoiding pod concentration in a single zone or node, topology spread constraints help prevent situations where some nodes are overburdened and others remain idle.
This leads to better resource utilization across the cluster and reduces the risk of underutilized infrastructure, ultimately lowering operational costs and improving resilience during zonal failures or disruptions.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: "topology.kubernetes.io/zone"
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: my-app
By balancing pod distribution, you can maximize the utilization of all nodes, improving overall cluster efficiency.
Amazon EKS Managed Node Groups (MNGs) simplify the process of provisioning and managing worker nodes by automating the creation, updating, and termination of EC2 instances in your Kubernetes cluster. Instead of manually handling EC2 node setup and maintenance, MNGs allow you to define your node group configuration declaratively, and Amazon EKS takes care of the rest. This significantly reduces operational overhead and ensures that your nodes are correctly configured with appropriate IAM roles, security groups, and networking settings.
Managed Node Groups also integrate natively with the Cluster Autoscaler, allowing your cluster to automatically scale up or down based on workload demand. This ensures that resources are used efficiently and idle infrastructure is minimized, directly contributing to cost savings. Furthermore, MNGs support rolling updates, enabling you to apply security patches or Kubernetes version upgrades without downtime.
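As a minimal sketch (the names, sizes, and instance type are placeholders), a Managed Node Group can be declared in an eksctl ClusterConfig like this:

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-west-2
managedNodeGroups:
  - name: managed-workers
    instanceType: m5.large
    minSize: 2
    maxSize: 5
    desiredCapacity: 3
    # Tags that let the Cluster Autoscaler discover this group automatically
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/my-cluster: "owned"
```

With this declarative definition, EKS provisions the underlying EC2 instances, attaches the required IAM role and security groups, and handles rolling replacement of nodes during upgrades.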
FinPeak Analytics is a fintech startup running real-time analytics and reporting platforms on Amazon EKS. Their workloads include data processing jobs, REST APIs, and batch reports. Their DevOps team observed escalating AWS bills, particularly from underutilized EC2 nodes in their Kubernetes clusters.
FinPeak initially ran an EKS cluster with 12 m5.large On-Demand nodes, each costing $0.096/hour in the US-East-1 region.
A review of CloudWatch metrics confirmed that much of this capacity sat idle, so the team restructured the cluster using the practices described above.
By right-sizing pods, introducing Spot capacity, and using intelligent autoscaling, FinPeak Analytics saved over $600/month, a 71.7% cost reduction, without sacrificing availability or performance.
Optimizing node utilization in Amazon EKS is one of the most effective ways to reduce cloud costs and improve operational efficiency. By selecting the right EC2 instance types, leveraging Spot Instances, tuning Cluster Autoscaler, and implementing best practices like topology spread constraints, you can significantly reduce your cloud waste and cut Kubernetes costs by up to 40%.