Weird and uneven Application Load Balancer traffic routing


Hello, I implemented an AWS Step Functions state machine that sends requests to an InDesign Server running on a Windows Server EC2 instance. This server renders documents and uploads them to S3 via Storage Gateway.

Initially, I had one EC2 instance and its performance met my expectations. To handle more requests, I launched a second EC2 instance from an AMI of the first one. I am using an Application Load Balancer to route traffic to these servers (via a target group). However, the distribution of requests across the servers is irregular and looks random.

I expected the traffic to alternate strictly between server 1 and server 2. Instead, the pattern was inconsistent: something like server 1, server 2, then three requests in a row to server 1, several evenly distributed requests, then two in a row to server 2, and so on. This momentarily overloaded one server while leaving the other idle, which made processing times 2 to 3 times longer, so overall throughput ended up lower than with a single EC2 instance. Also, CPU usage was very similar on both servers whenever requests were evenly distributed, so routing to the instance with less load would not explain the pattern.
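For anyone who wants to quantify this kind of pattern, here is a minimal sketch that tallies requests per target from ALB access log lines. It assumes access logging is enabled on the load balancer, that the lines are already in chronological order, and that the target address is the fifth space-separated field (`target:port`) as in the documented ALB access log format. The function names are hypothetical.

```python
from collections import Counter
from itertools import groupby


def target_of(line):
    """Return the target:port field of one ALB access log line, or None."""
    # The first five fields (type, time, elb, client:port, target:port)
    # are unquoted, so a plain whitespace split is enough to reach index 4.
    fields = line.split(maxsplit=5)
    if len(fields) < 5 or fields[4] == "-":
        return None  # malformed line, or request never reached a target
    return fields[4]


def count_requests_per_target(log_lines):
    """Total number of requests each target received."""
    return Counter(t for t in map(target_of, log_lines) if t)


def consecutive_runs(log_lines):
    """Collapse the request sequence into (target, run_length) pairs."""
    targets = [t for t in map(target_of, log_lines) if t]
    return [(t, len(list(group))) for t, group in groupby(targets)]
```

A long run for one target in `consecutive_runs` corresponds to the bursts described above, even when the totals from `count_requests_per_target` look balanced.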

Both EC2 instances are in the same Availability Zone and have an identical configuration and instance type. I have not enabled sticky sessions. The Application Load Balancer receives requests over HTTPS and forwards them to the EC2 servers over HTTP on a custom port.

I am not sure why ALB routes the traffic at random without a pattern or a visible indicator (at least to me), but maybe someone here has an idea.

Thanks, Mario

1 Answer

The behavior isn't expected if the default routing algorithm for a target group, round robin, is selected. The routing appears to be 'least outstanding requests' or 'weighted random'. Can you confirm the target group's routing algorithm?

https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-target-groups.html#modify-routing-algorithm
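One way to check this programmatically is sketched below, assuming boto3 is installed and credentials with permission to describe target group attributes are configured. The attribute key `load_balancing.algorithm.type` is the one the target group documentation uses; the helper names and the ARN you pass in are placeholders.

```python
def routing_algorithm(attributes):
    """Pick the routing algorithm out of a target group attribute list.

    `attributes` is the "Attributes" list returned by
    describe_target_group_attributes: dicts with "Key" and "Value".
    """
    for attr in attributes:
        if attr.get("Key") == "load_balancing.algorithm.type":
            return attr.get("Value")
    return None  # attribute not present in the response


def fetch_routing_algorithm(target_group_arn):
    """Query the ELBv2 API for the target group's routing algorithm."""
    import boto3  # imported lazily so the parser above works without boto3

    client = boto3.client("elbv2")
    response = client.describe_target_group_attributes(
        TargetGroupArn=target_group_arn
    )
    return routing_algorithm(response["Attributes"])
```

Calling `fetch_routing_algorithm` with the target group's ARN should return `round_robin` if the default has not been changed.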

AWS
answered 3 months ago
  • Hello smaw, I can confirm that the algorithm is the default round robin, as I have not changed it. In the meantime, I've tried implementing my own solutions: a custom queue backed by a database with counters for each server, and a solution with SQS and message polling on each server. Both are currently more reliable than the ALB's round robin, from what I have seen in the load balancer logs.
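A rough sketch of the SQS polling idea mentioned in the comment above, not the actual implementation: each server runs a worker that pulls one message at a time, so a server only takes new work when it is free. It assumes a boto3 SQS client is passed in; `poll_once`, `handle`, and the queue URL are hypothetical names.

```python
def poll_once(sqs, queue_url, handle, wait_seconds=20):
    """Receive at most one message, process it, then delete it.

    `sqs` is an SQS client (e.g. boto3.client("sqs")) and `handle` is a
    callable that renders the document described by the message body.
    Returns the number of messages processed (0 or 1).
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,         # one job at a time per server
        WaitTimeSeconds=wait_seconds,  # long polling to avoid busy loops
    )
    messages = response.get("Messages", [])
    for message in messages:
        handle(message["Body"])
        # Delete only after successful handling; if handle() raises, the
        # message becomes visible again after the visibility timeout.
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )
    return len(messages)
```

Running `poll_once` in a loop on each server lets the servers pull work at their own pace instead of having it pushed to them by the load balancer.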
