EC2 auto-scaling based on workload-consistent requests


Hello! I want to create an app that requires a lot of computing power (an API that generates images with Stable Diffusion), so I'll use EC2 instances to do the computation. The entry point of my back end will be an Amazon API Gateway, which will handle only a few kinds of requests (about 3), each with a very consistent, known workload. The number of user requests could vary greatly, both up and down, over a relatively short period of time.

What's the best and most cost-effective way to scale this workload? I looked at load balancers, but I didn't find a good way to use one for this purpose. I was thinking about creating an SQS queue to store requests and scaling up my EC2 instances when too many requests pile up. Is that a good idea? If so, what's the best way to do it?

I'm all ears! Thanks in advance.

1 Answer
Accepted Answer

Yes, SQS is often used in front of a "worker tier" like this. The instances run in an EC2 Auto Scaling group whose scaling policies are driven by a queue depth metric, or possibly by application-specific custom metrics that you generate from the worker nodes, if those would provide better information for scaling. API Gateway can integrate with SQS.
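Since each request has a known, consistent cost, the queue depth translates fairly directly into a number of workers. Below is a minimal sketch of that arithmetic, assuming a "backlog per instance" style of target: how many queued requests one instance can hold while still meeting a latency goal. The function names, the 15-second processing time, and the 60-second latency goal are hypothetical values chosen for illustration, not figures from the question, and this is only one way to turn queue depth into a capacity number.

```python
import math


def acceptable_backlog_per_instance(max_latency_s: float, seconds_per_request: float) -> int:
    """Queued requests one instance can hold and still finish within max_latency_s.

    Assumes every request takes roughly seconds_per_request to process,
    which matches the "consistent and known workload" in the question.
    """
    return max(1, math.floor(max_latency_s / seconds_per_request))


def desired_instances(queue_depth: int, backlog_target: int,
                      min_instances: int = 0, max_instances: int = 10) -> int:
    """Instance count needed so each instance holds at most backlog_target requests.

    queue_depth would come from a queue metric such as the number of visible
    messages; the result is clamped to the group's minimum and maximum size.
    """
    needed = math.ceil(queue_depth / backlog_target)
    return max(min_instances, min(max_instances, needed))


if __name__ == "__main__":
    # Hypothetical numbers: 15 s per image, results wanted within 60 s.
    target = acceptable_backlog_per_instance(max_latency_s=60, seconds_per_request=15)
    for depth in (0, 9, 100):
        print(depth, "queued ->", desired_instances(depth, target), "instances")
```

In practice the same ratio (queue depth divided by running instances) can be published as a custom metric and used by a scaling policy, so the Auto Scaling group does the adjustment instead of your own code. Allowing the minimum to be 0 lets the group scale in fully when the queue is empty, which helps with cost at the price of a cold start on the next request.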

Expert
answered a year ago
