auto scale EKS nodes

Question

What's the easiest way to automatically scale EKS nodes up and down, without doing it manually from the console? I thought I could just adjust the ASG, but I guess the node group scaling configuration will take precedence over that?

I'm thinking I could use a PowerShell script with the update-nodegroup-config command. Has anyone tried that before?

Accepted Answer

Found a solution that works for us. Systems Manager has an automation document called "AWS-UpdateEKSManagedNodeGroup". You can run it and set the NodeGroupDesiredSize field to scale up or down during a maintenance window. No need to use Karpenter or Cluster Autoscaler.
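The update-nodegroup-config route mentioned in the question changes the same desired size through the EKS API. Below is a minimal sketch in Python with boto3, not something we run ourselves. The helper names, cluster name, and node group name are hypothetical; `update_nodegroup_config` and its `scalingConfig` keys are the boto3 EKS client call as I understand it, and it assumes AWS credentials and a region are already configured.

```python
def build_scaling_request(cluster_name, nodegroup_name, desired, min_size, max_size):
    """Build the keyword arguments for the EKS UpdateNodegroupConfig call.

    Hypothetical helper: it only checks that the sizes are consistent
    (min <= desired <= max, max at least 1) before anything is sent to AWS.
    """
    if max_size < 1 or not (0 <= min_size <= desired <= max_size):
        raise ValueError(
            f"need 0 <= min ({min_size}) <= desired ({desired}) <= max ({max_size}), max >= 1"
        )
    return {
        "clusterName": cluster_name,
        "nodegroupName": nodegroup_name,
        "scalingConfig": {
            "minSize": min_size,
            "maxSize": max_size,
            "desiredSize": desired,
        },
    }


def scale_nodegroup(cluster_name, nodegroup_name, desired, min_size, max_size):
    """Send the scaling request to EKS (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the request builder works without boto3 installed

    eks = boto3.client("eks")
    request = build_scaling_request(cluster_name, nodegroup_name, desired, min_size, max_size)
    return eks.update_nodegroup_config(**request)


if __name__ == "__main__":
    # Placeholder names; only prints the request, nothing is sent to AWS.
    print(build_scaling_request("demo-cluster", "demo-nodes", desired=3, min_size=1, max_size=5))
```

Calling `scale_nodegroup(...)` from a scheduled job would do the same thing as changing the desired size in the console. Going through the node group rather than the ASG keeps the node group scaling configuration and the ASG in agreement.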

Answer

The AWS documentation describes two ways to handle node scaling in an EKS cluster:
1. Karpenter
2. Cluster Autoscaler

https://docs.aws.amazon.com/eks/latest/userguide/autoscaling.html

I would recommend Karpenter; in my experience node scaling works much better with it.

Answer

Hello,

The problem with using only Auto Scaling groups to scale nodes up and down for EKS clusters is that an Auto Scaling group is not Kubernetes-aware. For example, you could write a rule that increases the desired count of the Auto Scaling group by 1 when the underlying EKS nodes hit 60% of CPU capacity. But suppose you have 3 nodes, each with 4 vCPU and at 50% CPU utilisation (2 vCPU available out of 4), and a Kubernetes deployment spins up a pod with a request of 3 vCPU. Average utilisation is still below the 60% threshold, so the rule never fires, and the pod remains in the Pending state indefinitely because the Kubernetes scheduler has no node with 3 vCPU available.

Hence you need a solution that is Kubernetes-aware and scales your nodes by looking at the pending pods in your cluster.

Similar issues arise when scaling the cluster down. Since the Auto Scaling group is not Kubernetes-aware, it won't respect, for example, [Pod Disruption Budgets](https://kubernetes.io/docs/tasks/run-application/configure-pdb/), so the availability of pods for critical applications will be negatively affected.
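The numbers in the example above can be checked with a small Python sketch. It is a simplified model, not how the scheduler is implemented: the class and function names are made up for illustration, and it treats CPU in use as CPU already requested by pods.

```python
from dataclasses import dataclass


@dataclass
class Node:
    capacity_vcpu: float
    used_vcpu: float  # simplification: treated as vCPU already requested by pods

    @property
    def free_vcpu(self) -> float:
        return self.capacity_vcpu - self.used_vcpu


def average_utilization(nodes):
    """Fleet-wide CPU utilisation, which is what a CPU-based ASG rule looks at."""
    return sum(n.used_vcpu for n in nodes) / sum(n.capacity_vcpu for n in nodes)


def asg_rule_fires(nodes, threshold=0.60):
    """The example rule: add a node once utilisation reaches the threshold."""
    return average_utilization(nodes) >= threshold


def pod_is_schedulable(nodes, request_vcpu):
    """A pod fits only if a single node has enough free vCPU for its request."""
    return any(n.free_vcpu >= request_vcpu for n in nodes)


# 3 nodes, 4 vCPU each, 2 vCPU in use on every node
nodes = [Node(capacity_vcpu=4, used_vcpu=2) for _ in range(3)]

if __name__ == "__main__":
    print("average utilisation:", average_utilization(nodes))
    print("ASG rule fires:", asg_rule_fires(nodes))
    print("3 vCPU pod schedulable:", pod_is_schedulable(nodes, 3))
```

The cluster has 6 vCPU free in total, but no single node has 3 vCPU free, and the 60% rule sees only the 50% average. That gap is what a Kubernetes-aware autoscaler closes by reacting to the pending pod itself.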

Currently, the two solutions that provide compute capacity based on pending pods are:

1. Karpenter
2. Cluster Autoscaler (CAS)

CAS works with node groups, an abstraction that on AWS is backed by Auto Scaling groups. To scale up, it looks at pending pods and increases the desired count of the Auto Scaling group, giving the Kubernetes scheduler capacity to place the pods on.

Karpenter works directly with EC2 Fleet to spin up nodes based on pending pods, and is more flexible and faster than CAS. As of the [0.29.0 release of Karpenter](https://github.com/aws/karpenter/releases/tag/v0.29.0), it supports Windows workloads as well.

Please also have a look at the best practices for CAS: https://aws.github.io/aws-eks-best-practices/cluster-autoscaling/

and the best practices for Karpenter: https://aws.github.io/aws-eks-best-practices/karpenter/

PS: If you are using only Fargate with Kubernetes, you don't need to think about autoscaling your cluster, because Fargate provides the underlying capacity (a microVM) for each pod. Instead, you need to think about autoscaling your pods, for example with the [Horizontal Pod Autoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/). When your pods scale out, Fargate spins up the capacity automatically.
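For a feel of how the Horizontal Pod Autoscaler picks a replica count, the Kubernetes documentation gives the core rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance of 1.0 (0.1 by default). Below is a minimal Python sketch of just that rule. The function name and sample numbers are illustrative, and the real controller also applies min/max replica bounds and stabilization behaviour that are left out here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Core HPA rule only: scale replicas in proportion to metric / target.

    Illustrative helper; min/max replicas and stabilization are not modelled.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count unchanged.
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target
    print(desired_replicas(4, current_metric=90, target_metric=60))
    # 6 pods averaging 30% CPU against a 60% target
    print(desired_replicas(6, current_metric=30, target_metric=60))
```

On Fargate each additional replica gets its own capacity, so this pod-level calculation is the only scaling decision you have to configure.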

Please let me know in case of any queries.

Thanks,
Manish
