Is there any way to limit active requests per ELB target?


I have a small deployment where two instances are usually enough to handle the load. However, requests are very resource intensive (CPU and RAM), with individual requests taking up to 30 seconds to complete while pegging the CPU. Occasionally I see spikes in requests that result in users waiting far too long for the system to catch up. I want to use auto scaling to add instances in these situations. The problem is that by the time new instances have spun up, there are already too many requests in flight, and the original instances have no way to share their work backlog.

Is there a way with the Application Load Balancer to limit the number of active requests to each target, effectively moving the work queue from the instances to the ELB? That way, when new instances come up they can immediately be given a bunch of work to do, and the original instances will never be overloaded.

  • Are you able to change the architecture around to have SQS (or some other queue) in between? Otherwise, load shedding by having the server send a quick reply to the client asking them to retry might be your best bet (a rough sketch of that load-shedding approach is below the comments).

  • That's a good suggestion. A queue might be tricky since it would involve significant changes to our upstream architecture. That seems like the obvious approach if you have requests that are so slow they clearly need to be async, such as video encoding, but do you have any links on typical ways to architect that when handling traditional synchronous HTTP requests?

  • EDIT: This turned into multiple long comments, just going to add an answer to the question
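For the load-shedding idea mentioned in the comments, here is a minimal sketch of what the instance-side handler could look like. It is only an illustration using the Python standard library, with a hypothetical per-instance limit of two in-flight requests; the real service would use whatever framework it already runs on.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

MAX_IN_FLIGHT = 2                                   # hypothetical per-instance limit
in_flight = threading.Semaphore(MAX_IN_FLIGHT)

def do_expensive_work() -> bytes:
    """Stand-in for the real 30-second, CPU-heavy request."""
    return b"done\n"

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Shed load instead of queueing: if this instance is already at its
        # limit, answer immediately and ask the client to retry later.
        if not in_flight.acquire(blocking=False):
            self.send_response(503)
            self.send_header("Retry-After", "5")    # seconds
            self.end_headers()
            return
        try:
            result = do_expensive_work()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(result)
        finally:
            in_flight.release()

if __name__ == "__main__":
    ThreadingHTTPServer(("0.0.0.0", 8080), Handler).serve_forever()
```

The point is that an overloaded instance answers immediately with 503 + Retry-After instead of letting new requests pile up behind 30-second jobs.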

2 Answers

Unfortunately, there is no direct way at the ALB level to put a hard restriction on the number of connections a target will receive.

As you mentioned, Auto Scaling is an option: use an Auto Scaling Group to dynamically adjust the number of instances based on traffic. If you haven't already explored Target Tracking (dynamic) and Predictive scaling policies, I'd suggest seeing which fits best for your use case. I've been in the same situation in the past and chose predictive scaling for my workloads because I could predict some of the traffic patterns.

For more details, refer to the AWS Documentation: Dynamic Scaling and AWS Documentation: Predictive Scaling.
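As a concrete illustration of the target-tracking option, here is a minimal boto3 sketch. The group name and target value are placeholders; average CPU is just the simplest predefined metric to show, and ALBRequestCountPerTarget is another predefined option that maps more directly onto "requests per target".

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Hypothetical group name and target value.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="app-asg",
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,   # add instances when average CPU runs above ~50%
    },
)
```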

Comment here if you have additional questions, happy to help.

Abhishek

answered 11 days ago
  • The main problem with auto scaling is that by the time new instances are up, all the requests have already been handed to the upstream servers, i.e. there's no way for them to offload the work they've been given. The load balancer seems like the correct place to handle a work queue.


ELB doesn't directly support the kind of queuing you're looking for. You'd need to implement your own custom load-balancer layer using something like Nginx or HAProxy to do that (and even with those, it might be complicated to set up).
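For reference, this is roughly what that looks like in HAProxy: a hypothetical backend where maxconn on each server line caps in-flight requests per instance, and excess requests wait in HAProxy's own queue instead of on the instances (names, addresses, and limits are placeholders).

```
backend app_servers
    balance leastconn          # send each request to the least-busy instance
    timeout queue 60s          # how long a request may wait in HAProxy's queue
    server app1 10.0.1.10:8080 check maxconn 2
    server app2 10.0.1.11:8080 check maxconn 2
```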

I think a better option might be to set up a queue-based system where you're able to make things a bit more async.

There are a bunch of Stack Overflow discussions on this, but I've never implemented it, so I'm not going to link to a specific one since I can't vouch for them. In general, the way you set this up depends on how much control you have over the client.

  1. If the client is just a web browser or something else you can't control, you'd send a reply back to the client synchronously saying "we received your request, check back here<link> for results" - and let the end user check back on their own.
  2. If you control the client, you could synchronously pass back a token and have the client automatically poll back periodically using that token (see the sketch after this list).
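Neither pattern is tied to a particular framework. Here is a minimal sketch of the token-and-poll contract using only the Python standard library; the in-memory job dict and background thread are stand-ins for whatever actually does the work (in the two-tier setup described below, that would be an SQS queue plus a results store).

```python
import json
import threading
import uuid
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-memory job store; a real deployment would use a queue
# plus a shared results store instead.
jobs = {}
lock = threading.Lock()

def run_job(job_id: str, payload: bytes) -> None:
    # Stand-in for the slow, CPU-heavy work.
    result = payload.decode(errors="replace").upper()
    with lock:
        jobs[job_id] = {"status": "done", "result": result}

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Accept the work and hand back a token immediately.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        job_id = uuid.uuid4().hex
        with lock:
            jobs[job_id] = {"status": "pending"}
        threading.Thread(target=run_job, args=(job_id, body), daemon=True).start()
        self.send_response(202)                     # Accepted, not finished yet
        self.send_header("Location", f"/status/{job_id}")
        self.end_headers()
        self.wfile.write(job_id.encode())

    def do_GET(self):
        # Client polls GET /status/<token> until the job is done.
        job_id = self.path.rsplit("/", 1)[-1]
        with lock:
            job = jobs.get(job_id, {"status": "unknown"})
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps(job).encode())

if __name__ == "__main__":
    ThreadingHTTPServer(("0.0.0.0", 8080), Handler).serve_forever()
```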

For the ELB/ASG side of things, you could also then split into a two-tier setup. A small frontend tier takes in these requests and responds to the follow-up polling requests from clients, but doesn't actually process the work; it just puts the request info into an SQS (or other) queue. The backend workers then pull jobs from that queue, and you autoscale the worker tier on messages visible in the queue, or on AverageMessagesPerWorker. (Note: you can do something similar to the approach in this doc with metric math on a single scaling policy, without needing to go through all the steps of publishing a custom metric.)
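That metric-math variant might look roughly like the boto3 sketch below, which keeps the queue backlog per in-service worker near a target value. The queue name, ASG name, and target value are placeholders you'd tune to how long one request takes.

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Hypothetical queue and ASG names; the target (~2 messages waiting per
# worker) would be tuned to the per-request processing time.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="worker-asg",
    PolicyName="backlog-per-worker",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "CustomizedMetricSpecification": {
            "Metrics": [
                {   # m1: total messages waiting in the work queue
                    "Id": "m1",
                    "MetricStat": {
                        "Metric": {
                            "Namespace": "AWS/SQS",
                            "MetricName": "ApproximateNumberOfMessagesVisible",
                            "Dimensions": [{"Name": "QueueName", "Value": "work-queue"}],
                        },
                        "Stat": "Sum",
                    },
                    "ReturnData": False,
                },
                {   # m2: workers currently in service in the ASG
                    "Id": "m2",
                    "MetricStat": {
                        "Metric": {
                            "Namespace": "AWS/AutoScaling",
                            "MetricName": "GroupInServiceInstances",
                            "Dimensions": [{"Name": "AutoScalingGroupName", "Value": "worker-asg"}],
                        },
                        "Stat": "Average",
                    },
                    "ReturnData": False,
                },
                {   # e1: backlog per worker, the value target tracking acts on
                    "Id": "e1",
                    "Expression": "m1 / m2",
                    "Label": "Backlog per worker",
                    "ReturnData": True,
                },
            ]
        },
        "TargetValue": 2.0,
    },
)
```

Target tracking then adds workers when the backlog per instance rises above the target and removes them when it falls back, which is effectively the "work queue in front of the instances" behavior the question is asking for.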

AWS
answered 4 days ago
