EC2 auto-scaling based on workload-consistent requests


Hello! I want to build an app that requires a lot of computing power (an API that generates images with Stable Diffusion), so I'll use EC2 instances to do the computation. The entry point of my back end will be an Amazon API Gateway, which will only handle a few kinds of requests (about 3), each with a very consistent and known workload. The number of user requests could vary greatly, up and down, over a relatively short period of time.

What's the best and most cost-effective way to scale this workload? I looked at load balancers, but I didn't find a good way to use one for this purpose. I was thinking about creating an SQS queue to store requests and scaling up my EC2 instances when too many requests pile up. Is that a good idea? If so, what's the best way to do it?

I'm all ears! Thanks in advance.

1 Answer
Accepted Answer

Yes, SQS is often used in front of a "worker tier" like this. The instances run in an EC2 Auto Scaling group whose scaling policies are driven by a queue-depth metric, or by application-specific custom metrics published from the worker nodes if those would give better information for scaling. API Gateway can integrate with SQS.
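As a rough illustration of scaling on queue depth, here is a minimal sketch of the "backlog per instance" arithmetic that such a policy usually relies on. It assumes each job takes a known, roughly constant time (as described in the question) and that you can read the queue depth, for example from the SQS `ApproximateNumberOfMessagesVisible` metric. The function names, the latency target, and the instance limits are hypothetical, and the sketch does not call AWS at all: it only shows the calculation.

```python
import math


def acceptable_backlog_per_instance(max_latency_s: float, seconds_per_job: float) -> float:
    """How many queued jobs one worker can hold while still meeting the latency target."""
    if max_latency_s <= 0 or seconds_per_job <= 0:
        raise ValueError("max_latency_s and seconds_per_job must be positive")
    return max_latency_s / seconds_per_job


def desired_instances(
    queue_depth: int,
    seconds_per_job: float,
    max_latency_s: float,
    min_instances: int = 0,   # assumption: the group may scale to zero
    max_instances: int = 10,  # assumption: arbitrary cost ceiling
) -> int:
    """Instance count that keeps the per-instance backlog at or below the target."""
    if queue_depth < 0:
        raise ValueError("queue_depth must not be negative")
    target = acceptable_backlog_per_instance(max_latency_s, seconds_per_job)
    needed = math.ceil(queue_depth / target)
    # Clamp to the limits configured for the group.
    return max(min_instances, min(max_instances, needed))


if __name__ == "__main__":
    # Hypothetical numbers: 120 queued jobs, 10 s per image, 5 min acceptable wait.
    print(desired_instances(queue_depth=120, seconds_per_job=10, max_latency_s=300))
```

In practice you would probably not run this loop yourself. One option is to publish queue depth divided by the number of in-service instances as a custom CloudWatch metric and let a target tracking policy hold it near the acceptable backlog value.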

EXPERT
answered a year ago
