EC2 auto-scaling based on workload-consistent requests


Hello! I want to build an app that requires a lot of computing power (an API that generates images with Stable Diffusion), so I'll use EC2 instances to do the computation. The entry point of my back end will be an Amazon API Gateway that will only handle a few kinds of requests (about 3), each with a very consistent and known workload. The number of user requests could vary greatly, both up and down, over a relatively short period of time.

What's the best and most cost-effective way to scale this workload? I looked at load balancers, but I didn't find a good way to use one for this purpose. I was thinking about creating an SQS queue to store requests and scaling up my EC2 instances when too many requests stack up. Is that a good idea? If so, what's the best way to do it?

I'm all ears! Thanks in advance.

1 Answer
Accepted Answer

Yes, SQS is often used in front of a "worker tier" like this, with instances in an EC2 Auto Scaling group whose scaling policies are driven by a queue depth metric, or possibly by application-specific custom metrics that you generate from the worker nodes if those would provide better information for scaling. API Gateway can integrate with SQS.
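
Since each request has a known, consistent processing time, one way to turn queue depth into a scaling decision is a "backlog per instance" calculation. The sketch below is only an illustration of that arithmetic, not a definitive implementation: the function names, the 120-second latency target, the 10-second job time, and the group size limits are all assumptions made up for the example. In a real setup the queue depth would typically come from the SQS `ApproximateNumberOfMessagesVisible` metric in CloudWatch, and the result would feed a scaling policy on the Auto Scaling group.

```python
import math


def acceptable_backlog_per_instance(max_latency_s: int, seconds_per_job: int) -> int:
    """How many queued jobs one worker can hold while still meeting the latency target.

    Both arguments are assumed values you would measure or choose yourself.
    """
    return max(1, max_latency_s // seconds_per_job)


def desired_capacity(queue_depth: int, backlog_target: int,
                     min_size: int, max_size: int) -> int:
    """Number of instances needed so each one holds at most `backlog_target` jobs,
    clamped to the (hypothetical) group limits min_size..max_size."""
    needed = math.ceil(queue_depth / backlog_target)
    return max(min_size, min(max_size, needed))


if __name__ == "__main__":
    # Assumed numbers: jobs take about 10 s each, and a request may wait up to 120 s.
    target = acceptable_backlog_per_instance(max_latency_s=120, seconds_per_job=10)
    for depth in (0, 100, 1000):
        print(depth, desired_capacity(depth, target, min_size=0, max_size=20))
```

With these assumed numbers, an empty queue lets the group shrink to its minimum size, which helps with cost, while a sudden burst is capped at the maximum size you are willing to pay for.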

Expert
answered a year ago
