EC2 auto-scaling based on workload-consistent requests

Question

Hello!
I want to create an app that requires a lot of computing power (an API that generates images with Stable Diffusion), so I’ll use EC2 instances to do the calculations. The entry point of my back end will be an Amazon API Gateway, which will only handle a few types of requests (about 3), each with a very consistent (and known) workload. The number of user requests could vary greatly, both up and down, over a relatively short period of time.

What’s the best (and most cost-effective) way to scale this workload? I looked at using a load balancer, but I didn’t find a good way to use it for this purpose. I was thinking about creating an SQS queue to store requests and scaling up my EC2 instances when too many requests stack up. Is that a good idea? If so, what’s the best way to do it?

I’m all ears! Thanks in advance.

Accepted Answer

Yes, SQS is often used in front of a "worker tier" like this, with the instances in an EC2 Auto Scaling group whose scaling policies are driven by a queue depth metric, or possibly by application-specific custom metrics that you publish from the worker nodes if those would give better information for scaling. API Gateway can integrate with SQS directly.
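Since each request has a known, consistent cost, the queue depth translates fairly directly into a target fleet size. Below is a minimal sketch of that arithmetic, assuming a "backlog per instance" style of scaling. The function names, the 30-second job time, the 5-minute wait target, and the instance limits are all illustrative assumptions, not values from the question. In practice the queue depth would come from the queue's `ApproximateNumberOfMessages` attribute, and the result would feed a scaling policy or a custom metric rather than being applied by hand.

```python
import math


def backlog_per_instance(queue_depth: int, running_instances: int) -> float:
    """Messages waiting per running instance (treats zero instances as one)."""
    return queue_depth / max(running_instances, 1)


def desired_capacity(
    queue_depth: int,
    seconds_per_job: float,
    max_wait_seconds: float,
    min_instances: int = 0,
    max_instances: int = 10,
) -> int:
    """Estimate how many workers are needed to drain the queue in time.

    All parameters are hypothetical inputs for illustration:
    - seconds_per_job: the known, consistent processing time per request
    - max_wait_seconds: the longest a request may acceptably wait
    """
    if seconds_per_job <= 0 or max_wait_seconds <= 0:
        raise ValueError("seconds_per_job and max_wait_seconds must be positive")

    # Number of jobs one instance can clear within the wait target
    # (never less than one, otherwise no fleet size would satisfy it).
    acceptable_backlog = max(max_wait_seconds / seconds_per_job, 1.0)

    # Instances needed so that each one holds at most the acceptable backlog.
    needed = math.ceil(queue_depth / acceptable_backlog)

    # Clamp to the group's assumed minimum and maximum size.
    return max(min_instances, min(max_instances, needed))


if __name__ == "__main__":
    # 25 queued requests, 30 s each, 5 min acceptable wait.
    print(desired_capacity(queue_depth=25, seconds_per_job=30, max_wait_seconds=300))
```

Allowing a minimum of zero instances keeps costs down when the queue is empty, at the price of a cold-start delay for the first request after an idle period.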
