Hello!
I want to create an app that requires a lot of computing power (an API that generates images with Stable Diffusion), so I'll use EC2 instances to do the calculations. The entry point of my back end will be an Amazon API Gateway, which will only handle a few kinds of requests (around 3), each with a very consistent (and known) workload. The number of user requests could vary greatly, both up and down, over a relatively short period of time.
What's the best (and most cost-effective) way to scale this workload? I looked at load balancers, but I didn't find a good way to use one for this purpose. I was thinking about creating an SQS queue to store the requests, and scaling up my EC2 instances when too many requests stack up. Is that a good idea? If so, what's the best way to do it?
I'm all ears! Thanks in advance.
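For what it's worth, the queue idea in the question can be sketched with the "backlog per instance" calculation that AWS describes for scaling on SQS: decide how many queued jobs one instance may have waiting while still meeting a latency target, then divide the queue depth by that number. This is only a rough sketch under assumptions: the function name, the parameters, and the example numbers (10 seconds per image, 60 seconds of acceptable wait) are made up for illustration, and in practice the queue depth would come from the queue's `ApproximateNumberOfMessages` attribute.

```python
import math


def desired_instance_count(queue_depth, seconds_per_job,
                           acceptable_latency_seconds,
                           min_size=0, max_size=10):
    """Estimate how many workers are needed to drain the queue in time.

    All parameter names are hypothetical; they are not AWS API fields.
    """
    if seconds_per_job <= 0 or acceptable_latency_seconds <= 0:
        raise ValueError("durations must be positive")

    # Jobs a single instance can have waiting and still meet the target.
    acceptable_backlog = max(
        1, math.floor(acceptable_latency_seconds / seconds_per_job)
    )

    # Instances needed so that no instance exceeds that backlog.
    wanted = math.ceil(queue_depth / acceptable_backlog)

    # Clamp to the group's configured bounds.
    return max(min_size, min(max_size, wanted))


if __name__ == "__main__":
    # Assumed numbers: 120 queued requests, 10 s per image, 60 s target.
    print(desired_instance_count(120, 10, 60, min_size=0, max_size=50))
```

Since the workload per request is consistent and known, `seconds_per_job` can be a fixed constant here, which is what makes this kind of calculation a reasonable fit. Setting `min_size` to 0 lets the group scale in completely when the queue is empty, at the cost of a cold start on the first request.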
As a side note, you'll probably want to look into attribute-based instance type selection. It lets you give the ASG a set of instance requirements instead of a single instance type, so it can try to launch any of several matching types. That gives you a better chance of finding available capacity for very large bursts, especially if you're using Spot: https://docs.aws.amazon.com/autoscaling/ec2/userguide/create-asg-instance-type-requirements.html
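To make that a bit more concrete, here is a minimal sketch of what such a policy might look like when built as a plain Python dictionary. The helper name, the launch template name `sd-worker`, and the vCPU and memory minimums are all assumptions for illustration; the dictionary keys follow the `MixedInstancesPolicy` structure of the EC2 Auto Scaling API as I understand it, so check them against the linked documentation before relying on them.

```python
def build_mixed_instances_policy(launch_template_name, min_vcpus,
                                 min_memory_mib, spot_percentage=100):
    """Build a MixedInstancesPolicy that selects instances by attributes.

    The helper and its parameters are hypothetical; only the dictionary
    keys are meant to mirror the Auto Scaling API.
    """
    if not 0 <= spot_percentage <= 100:
        raise ValueError("spot_percentage must be between 0 and 100")

    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            "Overrides": [
                {
                    # Describe what the workload needs rather than
                    # naming instance types one by one.
                    "InstanceRequirements": {
                        "VCpuCount": {"Min": min_vcpus},
                        "MemoryMiB": {"Min": min_memory_mib},
                        "AcceleratorTypes": ["gpu"],
                        "AcceleratorManufacturers": ["nvidia"],
                        "AcceleratorCount": {"Min": 1},
                    }
                }
            ],
        },
        "InstancesDistribution": {
            # Share of capacity above the base that is On-Demand.
            "OnDemandPercentageAboveBaseCapacity": 100 - spot_percentage,
            "SpotAllocationStrategy": "price-capacity-optimized",
        },
    }


if __name__ == "__main__":
    # Assumed values: 4 vCPUs and 16 GiB of memory as minimums.
    policy = build_mixed_instances_policy("sd-worker", 4, 16384)
    print(policy["LaunchTemplate"]["Overrides"][0]["InstanceRequirements"])
```

The resulting dictionary would be passed as the `MixedInstancesPolicy` argument when creating or updating the group, for example through boto3's `create_auto_scaling_group`. Requiring a GPU is an assumption based on the Stable Diffusion workload mentioned in the question.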