Application Load Balancer times out randomly

Hi there,

My application runs in ECS (EC2 launch type), with every module running as a container without a public IP. User requests are sent to the ALB, which routes them to the module's private container in the VPC.

Our customers say they are frequently told that the request has timed out, and that just manually clearing the browser cache makes the request work again. That's so weird...

I recently ran into the same issue myself. Chrome's Network panel shows the request stuck in the "Stalled" stage: [screenshot: Chrome Network panel timing for the stalled request]

If I deploy the same application directly on EC2, with the same security group rules in the same VPC but without ECS and the ALB, everything is fine and it never times out...

Does anybody know what's going on?

Update:

Here are the ALB logs:

2024-02-22T02:23:10.189479Z 212.102.40.218 32894 443 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 - "-" - - Failed:UnmappedConnectionError
2024-02-22T02:23:10.265995Z 212.102.40.218 37174 443 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 - "-" - - Failed:UnmappedConnectionError
2024-02-22T02:23:10.306528Z 212.102.40.218 41608 443 TLSv1.2 - - "-" - - Failed:UnmappedConnectionError
2024-02-22T02:23:10.421276Z 212.102.40.218 46214 443 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 - "-" - - Failed:UnmappedConnectionError
2024-02-22T02:23:10.506543Z 212.102.40.218 50576 443 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 - "-" - - Failed:UnmappedConnectionError
2024-02-22T02:23:10.544433Z 212.102.40.218 55504 443 TLSv1.1 - - "-" - - Failed:UnmappedConnectionError
2024-02-22T02:23:10.654052Z 212.102.40.218 59618 443 TLSv1.3 TLS_AES_128_GCM_SHA256 - "-" - - Failed:UnmappedConnectionError
2024-02-22T02:23:10.728567Z 212.102.40.218 63784 443 TLSv1.3 TLS_AES_128_GCM_SHA256 - "-" - - Failed:UnmappedConnectionError
2024-02-22T02:23:10.765106Z 212.102.40.218 6620 443 TLSv1.3 - - "-" - - Failed:UnmappedConnectionError
2024-02-22T02:23:10.880697Z 212.102.40.218 11072 443 TLSv1.3 TLS_AES_128_GCM_SHA256 - "-" - - Failed:UnmappedConnectionError
2024-02-22T02:24:42.110672Z 154.27.68.195 45228 80 - - - "-" - - -
2024-02-22T02:24:42.193183Z 154.27.68.195 39470 443 TLSv1.3 TLS_AES_128_GCM_SHA256 0.028 "-" - - Success
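To see how the failures cluster, the log lines above can be tallied per client. This is a minimal sketch, assuming the entries follow the 11-field ALB connection-log layout (the field names in FIELDS are my assumption based on that layout, not something stated in the thread); the sample lines are copied from the log above.

```python
import shlex
from collections import Counter

# Assumed field order for an ALB connection log entry: 11 space-separated
# fields, with the client-certificate subject wrapped in double quotes.
FIELDS = [
    "timestamp", "client_ip", "client_port", "listener_port",
    "tls_protocol", "tls_cipher", "tls_handshake_latency",
    "leaf_client_cert_subject", "leaf_client_cert_validity",
    "leaf_client_cert_serial_number", "tls_verify_status",
]


def parse_connection_log_line(line: str) -> dict:
    """Split one log line into a dict keyed by FIELDS; shlex handles the quoted field."""
    values = shlex.split(line)
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}: {line!r}")
    return dict(zip(FIELDS, values))


def failures_by_client(lines) -> Counter:
    """Count entries whose status is neither "Success" nor "-", keyed by (client IP, status)."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = parse_connection_log_line(line)
        status = entry["tls_verify_status"]
        # The plain-HTTP entry on port 80 in the log above records "-" here.
        if status not in ("Success", "-"):
            counts[(entry["client_ip"], status)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '2024-02-22T02:23:10.189479Z 212.102.40.218 32894 443 TLSv1.2 '
        'ECDHE-RSA-AES128-GCM-SHA256 - "-" - - Failed:UnmappedConnectionError',
        '2024-02-22T02:23:10.544433Z 212.102.40.218 55504 443 TLSv1.1 - - "-" - - '
        'Failed:UnmappedConnectionError',
        '2024-02-22T02:24:42.110672Z 154.27.68.195 45228 80 - - - "-" - - -',
        '2024-02-22T02:24:42.193183Z 154.27.68.195 39470 443 TLSv1.3 '
        'TLS_AES_128_GCM_SHA256 0.028 "-" - - Success',
    ]
    for (ip, status), n in failures_by_client(sample).items():
        print(ip, status, n)
```

Run against a full day of logs, this makes it easy to check whether the failures come from a few client IPs or are spread across all users.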

New update:

When the GET (AJAX) request stalled, I found it was because the OPTIONS request before it had failed: [screenshot: Chrome Network panel showing the failed OPTIONS request]

In the normal case, the waterfall looks like this: [screenshot: normal request waterfall]

So this new clue is consistent with the ALB log above.

Now, what is going wrong between the browser and the ALB? And why does clearing the browser cache fix the issue?
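Browsers send an OPTIONS request ahead of a cross-origin AJAX call only when the call is not a CORS "simple request", and if that OPTIONS request fails, the GET itself is never sent. The sketch below encodes a simplified subset of the rules from the WHATWG Fetch standard (it ignores header-value length limits, the Range header and request bodies) to show which requests trigger that extra round trip; the URLs and header values are hypothetical.

```python
from urllib.parse import urlsplit

# Simplified CORS "simple request" rules (subset of the WHATWG Fetch standard).
SIMPLE_METHODS = {"GET", "HEAD", "POST"}
SAFELISTED_HEADERS = {"accept", "accept-language", "content-language", "content-type"}
SIMPLE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url: str) -> tuple:
    """Return (scheme, host, port), filling in the scheme's default port."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS.get(parts.scheme))


def needs_preflight(page_url: str, request_url: str, method: str, headers: dict) -> bool:
    """Roughly predict whether a browser sends an OPTIONS preflight first."""
    if origin_of(page_url) == origin_of(request_url):
        return False  # same-origin requests are not subject to CORS
    if method.upper() not in SIMPLE_METHODS:
        return True
    for name, value in headers.items():
        lower = name.lower()
        if lower not in SAFELISTED_HEADERS:
            return True  # e.g. Authorization or any custom X-* header
        if lower == "content-type":
            mime = value.split(";", 1)[0].strip().lower()
            if mime not in SIMPLE_CONTENT_TYPES:
                return True  # e.g. application/json
    return False


if __name__ == "__main__":
    page = "https://app.example.com/"      # hypothetical front-end origin
    api = "https://api.example.com/items"  # hypothetical API behind the ALB
    print(needs_preflight(page, api, "GET", {"Accept": "application/json"}))  # False
    print(needs_preflight(page, api, "GET", {"Authorization": "Bearer <token>"}))  # True
```

A request that needs a preflight opens an extra request to the ALB before the real one, so each failed OPTIONS entry in the browser corresponds to a GET that never left it.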

2 Answers

Hello.

It may be a good idea to check the ALB access logs and the ECS container logs to see whether the requests are reaching the ECS containers.
If they are, you may be able to determine whether the problem lies in application processing or in container load.
https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html
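A minimal sketch of that check, assuming the standard ALB access-log field order (type, time, elb, client:port, target:port, the three processing times, then the ELB and target status codes). As I read the access-log documentation linked above, the ALB writes "-" for target:port and -1 for the processing times when it could not dispatch the request to a target or the target did not respond before the idle timeout; that is the signal this sketch looks for. The log lines in the usage block are hypothetical.

```python
import shlex
from dataclasses import dataclass


@dataclass
class AccessEntry:
    client: str         # client:port
    target: str         # target:port, or "-" if no target was used
    timings: tuple      # request/target/response processing times (seconds)
    elb_status: str
    target_status: str
    request: str        # e.g. 'GET https://host:443/path HTTP/1.1'


def parse_access_log_line(line: str) -> AccessEntry:
    """Parse the leading fields of one ALB access-log entry (assumed field order)."""
    f = shlex.split(line)
    return AccessEntry(
        client=f[3],
        target=f[4],
        timings=tuple(float(x) for x in f[5:8]),
        elb_status=f[8],
        target_status=f[9],
        request=f[12],
    )


def reached_target(entry: AccessEntry) -> bool:
    """False when the ALB logged no target or a -1 processing time."""
    return entry.target != "-" and -1.0 not in entry.timings


if __name__ == "__main__":
    lines = [  # hypothetical entries, trimmed after the ssl_protocol field
        'https 2024-02-22T02:23:10.200000Z app/my-alb/abc123 212.102.40.218:32894 '
        '10.0.1.15:8080 0.001 0.015 0.000 200 200 120 450 '
        '"GET https://app.example.com:443/api/items HTTP/1.1" "Mozilla/5.0" '
        'ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2',
        'https 2024-02-22T02:23:11.200000Z app/my-alb/abc123 212.102.40.218:37174 '
        '- -1 -1 -1 503 - 120 0 '
        '"OPTIONS https://app.example.com:443/api/items HTTP/1.1" "Mozilla/5.0" '
        'ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2',
    ]
    for line in lines:
        entry = parse_access_log_line(line)
        print(entry.request.split()[0], entry.elb_status, reached_target(entry))
```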

EXPERT
answered 20 days ago
  • Hi, I've updated the question with the ALB logs.

  • From the log below, it appears that the TLS connection is not being established properly. Is the correct certificate configured on the ALB? Also, your users' browsers may be outdated and support only older TLS versions, so you could try switching the ALB security policy to one that also allows older protocols: https://docs.aws.amazon.com/elasticloadbalancing/latest/application/create-https-listener.html#tls-security-policies

    2024-02-22T02:23:10.506543Z 212.102.40.218 50576 443 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 - "-" - - Failed:UnmappedConnectionError
    
  • Hi, I did some research on this error code. What I don't understand is why simply clearing the browser cache makes the request work again. If the certificate were incorrect, or the user's browser were outdated, I'd expect them to get the error every time... Am I missing some important knowledge?
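To test the outdated-TLS suggestion from the comments above, one option is to pin a client to a single protocol version at a time and see which versions the HTTPS listener accepts. This is a minimal sketch using Python's standard ssl module; it only shows what the listener negotiates from your own machine, not what your users' browsers offer, and it does not by itself explain the cache-clearing behaviour.

```python
import socket
import ssl


def pinned_context(version: ssl.TLSVersion) -> ssl.SSLContext:
    """Client context that will only negotiate exactly `version`."""
    ctx = ssl.create_default_context()  # certificate and hostname checks stay on
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def probe(host: str, version: ssl.TLSVersion, port: int = 443, timeout: float = 5.0) -> str:
    """Return the negotiated protocol name, or a failure description."""
    ctx = pinned_context(version)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return tls.version() or "unknown"
    except OSError as exc:  # ssl.SSLError and DNS errors are OSError subclasses
        return f"failed: {exc}"
```

For example, probe("your-alb.example.com", ssl.TLSVersion.TLSv1_2), with a placeholder host name, should return "TLSv1.2" if the listener's security policy allows it. Recent Python and OpenSSL builds may refuse TLS 1.0/1.1 on the client side regardless of the ALB policy, so a failure at those versions is not conclusive.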

I'd start investigating timeout issues by enabling ALB access logging to identify slow requests and response times. Also, review container logs for error messages and resource constraints. If containers lack resources like CPU or memory, increase allocations to improve responsiveness and reduce timeouts.
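Once access logging is on, the three processing-time fields can be summed per request to surface slow ones. A minimal sketch, assuming the documented field order of ALB access-log entries (processing times in fields 6 to 8, the quoted request line in field 13); requests with a -1 timing are reported separately because no total can be computed for them. The 1-second default threshold is an arbitrary choice for illustration.

```python
import shlex


def find_slow_requests(lines, threshold_s: float = 1.0):
    """Return (slow, incomplete).

    slow: [(request, total_seconds)] above the threshold, slowest first.
    incomplete: requests that logged a -1 processing time.
    """
    slow, incomplete = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = shlex.split(line)
        request = fields[12]
        timings = [float(x) for x in fields[5:8]]
        if -1.0 in timings:
            incomplete.append(request)
            continue
        total = sum(timings)
        if total > threshold_s:
            slow.append((request, total))
    slow.sort(key=lambda item: item[1], reverse=True)
    return slow, incomplete
```

ALB delivers access logs to S3 as gzip-compressed files, so after downloading one you can pass the open file, e.g. gzip.open(path, "rt"), straight to find_slow_requests.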

Additionally, it is always good to optimize the application code: profile it for bottlenecks, add caching where possible, and refactor inefficient operations.

You can combine insights from logs, resource adjustments, and code optimizations to resolve these timeout issues and also enhance application performance.

AWS
answered 20 days ago
