How to configure autoscaling to handle load spikes?


I have about 10 different CPU-intensive microservices with HPA and the cluster autoscaler configured, deployed on EKS. How can I configure the cluster to scale nodes smoothly during load spikes? Should I use predictive scaling, or can this be done with HPA only?

asked a year ago · 376 views
2 Answers

You can use predictive auto-scaling, but because the algorithm is data-driven, you would need to evaluate whether PHPA (Predictive Horizontal Pod Autoscaler) is the best fit. PHPA won't kick in until after roughly 24 hours of operation, and for a short period after that its behaviour won't be any different from the standard HPA.

Something else that you can do in the meantime is use Locust to load-test against your HPA configuration and see whether you get the desired results. Here is a blog that goes over that implementation.

AWS
answered a year ago

Hello Evgeny,

When we talk about autoscaling in EKS, we have to consider two things:

  1. Pod autoscaling - achieved with the Horizontal Pod Autoscaler (HPA) or the Vertical Pod Autoscaler (VPA), for example. You are already using HPA, so all good here.
  2. Cluster autoscaling - you need to provide the underlying compute to the Kubernetes scheduler whenever there are pending pods, and this can be done by the Cluster Autoscaler (CAS) or Karpenter. From your question, it seems you are already using CAS, so far so good.
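For the pod-autoscaling side in (1), a typical CPU-based HPA manifest looks like the sketch below (the deployment name, replica bounds, and 70% target are illustrative, not from the question):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service-hpa        # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service          # hypothetical workload
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU exceeds 70%
```

For CPU-intensive services like yours, a utilization target well below 100% leaves headroom so new replicas come up before existing ones saturate.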

Now, if you are using CAS on a relatively big cluster, you may see during load spikes that when HPA spins up one more pod and it goes into the Pending state due to resource unavailability, it can take up to 5 minutes for that pod to be placed on a node. To solve this problem you have two potential solutions:

  1. You could slightly over-provision your cluster by placing some "pause" pods with negative priority. When your workloads go into the Pending state, the scheduler evicts the pause pods and places your pods onto an already running node; CAS then spins up the required nodes for the now-pending pause pods. That can take a few minutes, but your application availability is not affected. This is described in detail in the EKS best practices guide, along with the necessary recommendations around it.
  2. You could also test out Karpenter, which is faster than CAS and in most cases places your pending pods on a new node in around 30-40 seconds. You could try it out in your dev environment by scaling up the replicas of your deployment.
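The over-provisioning approach in (1) can be sketched as a PriorityClass with a negative value plus a deployment of pause containers that reserves headroom (the names, replica count, and resource requests are illustrative; size them to the headroom you want):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1                      # negative priority: evicted first when real pods are pending
globalDefault: false
description: "Priority class for placeholder pause pods"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 2                  # how many headroom slots to reserve
  selector:
    matchLabels:
      run: overprovisioning
  template:
    metadata:
      labels:
        run: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "1"         # each pause pod reserves one node-worth of spike capacity
              memory: 500Mi
```

The pause container does nothing but hold the reserved CPU and memory; when a real pod goes Pending, the scheduler preempts a pause pod immediately and CAS replaces the node capacity in the background.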

Please let me know in case of any queries.

PS: The problem with using Auto Scaling groups directly to scale nodes up and down for EKS clusters is that the Auto Scaling group is not Kubernetes-aware. When scaling down nodes, for example, it won't respect Pod Disruption Budgets, so the availability of your pods will be negatively affected for critical applications.
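For completeness, a Pod Disruption Budget that Kubernetes-aware scalers such as CAS and Karpenter honour during node scale-down might look like this (the app label and threshold are illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-service-pdb        # hypothetical name
spec:
  minAvailable: 2             # never drain below 2 ready replicas of this app
  selector:
    matchLabels:
      app: my-service
```

A raw Auto Scaling group terminating instances would bypass this guarantee, which is exactly why node scale-down should go through a Kubernetes-aware component.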

Thanks, Manish

answered a year ago
