EC2 auto-scaling based on workload-consistent requests


Hello! I want to build an app that requires a lot of computing power (an API that generates images with Stable Diffusion), so I’ll use EC2 instances to do the computation. The entry point of my back end will be an Amazon API Gateway, which will handle only a few kinds of requests (about 3), each with a very consistent and known workload. The number of user requests could vary greatly, up and down, over a relatively short period of time.

What’s the best and most cost-effective way to scale this workload? I looked at load balancers, but I didn’t find a good way to use one for this purpose. I was thinking about creating an SQS queue to store requests and scaling up my EC2 instances when too many requests stack up. Is that a good idea? If so, what’s the best way to do it?

I’m all ears! Thanks in advance.

1 Answer
Accepted Answer

Yes. SQS is often used in front of a "worker tier" like this: the instances run in an EC2 Auto Scaling group whose scaling policies are driven by a queue-depth metric, or by application-specific custom metrics that you publish from the worker nodes if those give better information for scaling. API Gateway can send requests to SQS through a service integration.
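To make the queue-depth idea concrete, here is a minimal sketch of the "backlog per instance" arithmetic that such a scaling policy is usually built around. It is only an illustration, not a full setup: the function names, the 300-second latency target, the 30-second job time, and the size limits are all assumptions. In a real deployment the queue depth would come from the SQS `ApproximateNumberOfMessagesVisible` metric, and the Auto Scaling group would act on the result.

```python
import math


def acceptable_backlog_per_instance(max_latency_s: float, seconds_per_job: float) -> int:
    """How many queued jobs one worker can hold while still meeting the latency target.

    Both arguments are assumed values you would measure or choose yourself.
    """
    if max_latency_s <= 0 or seconds_per_job <= 0:
        raise ValueError("latency target and job duration must be positive")
    # At least 1, otherwise a single slow job type would demand infinite workers.
    return max(1, math.floor(max_latency_s / seconds_per_job))


def desired_capacity(queue_depth: int, backlog_per_instance: int,
                     min_size: int = 0, max_size: int = 10) -> int:
    """Number of workers needed for the current queue depth, clamped to group limits."""
    if backlog_per_instance < 1:
        raise ValueError("backlog_per_instance must be at least 1")
    if queue_depth < 0:
        raise ValueError("queue_depth cannot be negative")
    needed = math.ceil(queue_depth / backlog_per_instance)
    return max(min_size, min(max_size, needed))


if __name__ == "__main__":
    # Hypothetical numbers: jobs take about 30 s, users tolerate about 300 s of waiting.
    backlog = acceptable_backlog_per_instance(max_latency_s=300, seconds_per_job=30)
    for depth in (0, 45, 500):
        print(depth, "queued ->", desired_capacity(depth, backlog), "instances")
```

Because each request type has a known, consistent cost, the per-job time is easy to estimate here, which is what makes queue depth a reasonable scaling signal for this workload.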

EXPERT
answered a year ago
