How do I troubleshoot latency on calls or requests to Amazon ECS tasks?

I want to troubleshoot my Amazon Elastic Container Service (Amazon ECS) application that's slow to respond to requests.

Short description

The following are common causes of high latency on ECS tasks:

  • High CPU or memory (RAM) utilization on tasks.
  • Issues with dependencies that the application relies on.
  • Large network distance between clients or on-premises targets and the ECS task.
  • Network connectivity issues or exceeded network limits.
  • Amazon Elastic Block Store (Amazon EBS) volume throttling.

To investigate and resolve these issues, first try to isolate where the delay occurs, then complete the resolution steps.

Resolution

To troubleshoot high latency on your ECS task, complete the following steps:

  1. Run the following command to measure the time to first byte and check for slow DNS resolution that might cause latency:

    % curl -kso /dev/null -w "\n===============
    | DNS lookup: %{time_namelookup}
    | Connect: %{time_connect}
    | App connect: %{time_appconnect}
    | Pre-transfer: %{time_pretransfer}
    | Start transfer: %{time_starttransfer}
    | Total: %{time_total}
    | HTTP Code: %{http_code}\n===============\n" https://LOAD_BALANCER_DNS_NAME.com
    
    Example output:
    | DNS lookup: 0.035596
    | Connect: 0.063130
    | App connect: 0.159145
    | Pre-transfer: 0.159264
    | Start transfer: 0.190203
    | Total: 0.190722
    | HTTP Code: 200

    Note: The timings in the preceding example output are in seconds, and each value is cumulative from the start of the request. It's a best practice to run the initial tests from inside the VPC to reduce the variables that different network paths introduce.

  2. Next, bypass the load balancer. Run the preceding curl command against the IP address of a task that you know is running. This test helps isolate the component that causes latency.

  3. If there's an Application Load Balancer, then check the average statistic of the Amazon CloudWatch TargetResponseTime metric for excessive values.

    If the value is high, then you have an issue on the tasks, or possibly an application dependency on external connections. For more information, see How do I troubleshoot an increase in the TargetResponseTime metric for an Application Load Balancer?

    If there are a large number of tasks, then activate and review the access log entries of your Application Load Balancer to identify which backend targets respond slowly.

  4. To confirm issues with the Application Load Balancer, review the request_processing_time and response_processing_time fields in the log entries for unusually high values. For more information, see How do I troubleshoot high latency on my Application Load Balancer in Elastic Load Balancing?

    If you curl the task IP address directly and receive a slow response, then review the task's CPU and memory metrics in CloudWatch Container Insights.

  5. If CPU and memory utilization are under 90% on average with no spikes, then check for dependencies of your application tasks that might cause latency. Dependencies include calls to external resources, such as Amazon Simple Storage Service (Amazon S3) buckets, Amazon Relational Database Service (Amazon RDS) databases, or other remote web services.

  6. If external calls are part of the application's expected workflow, then confirm with the application's developers whether the application makes synchronous calls to external dependencies that block it until it receives responses to these calls. For more information, see Managing Asynchronous Calls.

  7. If you host on EC2 container instances, then check any Amazon EBS volumes and network interfaces for signs of over-utilization. For more information, see How do I troubleshoot EBS volume performance issues on my EC2 instance? If you detect signs of EBS throttling, then review the volume type and increase the volume's provisioned IOPS and throughput. Or, use a different option, such as the instance store or Elastic Fabric Adapter (EFA).

    If you detect signs of network interface throttling, then use a larger instance type with more network bandwidth. Or, use a network-optimized instance type that provides a higher baseline bandwidth. For more information, see Why does my Amazon EC2 instance exceed its network limits when average utilization is low?

  8. If you host on AWS Fargate, then check network interface metrics with an Amazon ECS network sidecar container. To add a sidecar container, you must deploy a new task definition revision.
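
For step 1, it can help to turn curl's cumulative timings into a duration for each phase, so that a slow DNS lookup, TLS handshake, or server response stands out. The following is a minimal sketch, not part of the original procedure. The function name phase_breakdown is hypothetical, and it assumes that you pass the six timing values in the order that the curl command prints them.

```shell
# Convert cumulative curl timings (seconds) into per-phase durations (ms).
# Arguments, in order: time_namelookup time_connect time_appconnect
#                      time_pretransfer time_starttransfer time_total
phase_breakdown() {
  awk -v dns="$1" -v conn="$2" -v app="$3" -v pre="$4" -v first="$5" -v total="$6" '
    BEGIN {
      tls = (app > 0) ? app - conn : 0   # time_appconnect is 0 for plain HTTP
      printf "DNS lookup: %.1f ms\n", dns * 1000
      printf "TCP connect: %.1f ms\n", (conn - dns) * 1000
      printf "TLS handshake: %.1f ms\n", tls * 1000
      printf "Server wait: %.1f ms\n", (first - pre) * 1000
      printf "Body transfer: %.1f ms\n", (total - first) * 1000
    }'
}
```

With the example output from step 1, `phase_breakdown 0.035596 0.063130 0.159145 0.159264 0.190203 0.190722` reports about 96 ms for the TLS handshake and about 31 ms of server wait, which suggests that the application isn't the main contributor in that sample.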
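
For step 2, the following sketch shows one way to find a task's private IP address and time a request sent directly to it. It assumes that the task uses the awsvpc network mode, that the container serves plain HTTP, and that you run the command from a host that can reach the task. The function names, cluster name, task ID, port, and path are placeholders, not values from this article.

```shell
# Print the private IPv4 address of a task (awsvpc network mode assumed).
# $1: cluster name, $2: task ID or ARN (both are placeholders you supply).
task_private_ip() {
  aws ecs describe-tasks --cluster "$1" --tasks "$2" \
    --query 'tasks[0].attachments[0].details[?name==`privateIPv4Address`].value' \
    --output text
}

# Time one request sent straight to the task, bypassing the load balancer.
# $1: task IP address, $2: container port, $3: request path (default "/").
time_task_request() {
  curl -so /dev/null \
    -w 'connect=%{time_connect} start_transfer=%{time_starttransfer} total=%{time_total} code=%{http_code}\n' \
    "http://$1:$2${3:-/}"
}
```

For example, `time_task_request "$(task_private_ip my-cluster TASK_ID)" 8080 /health` prints the timings for a single direct request. If the direct request is fast but the request through the load balancer is slow, then the delay is likely in front of the task.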
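
For step 3, you can read the TargetResponseTime metric in the CloudWatch console or with the AWS CLI. The following sketch wraps the aws cloudwatch get-metric-statistics command. The function name is hypothetical, and the LoadBalancer dimension value and the time range are placeholders that you must replace with your own.

```shell
# Print the per-minute average of TargetResponseTime (seconds) for one
# Application Load Balancer, oldest datapoint first.
# $1: LoadBalancer dimension value, for example app/my-alb/0123456789abcdef
# $2: start time, $3: end time (ISO 8601, UTC)
alb_target_response_time() {
  aws cloudwatch get-metric-statistics \
    --namespace AWS/ApplicationELB \
    --metric-name TargetResponseTime \
    --dimensions "Name=LoadBalancer,Value=$1" \
    --statistics Average \
    --period 60 \
    --start-time "$2" \
    --end-time "$3" \
    --query 'sort_by(Datapoints, &Timestamp)[].[Timestamp, Average]' \
    --output text
}
```

For example, `alb_target_response_time app/my-alb/0123456789abcdef 2024-01-01T00:00:00Z 2024-01-01T01:00:00Z` lists one timestamp and average per minute, so you can compare the slow period against a normal one.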
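
For steps 3 and 4, the processing-time fields are the sixth through eighth space-separated fields of each Application Load Balancer access log entry, and the target address is the fifth. The following sketch filters entries where any of those times exceeds a threshold. The function name and the default threshold of 1 second are assumptions for illustration.

```shell
# Read ALB access log entries on stdin and print the entries where any
# processing time exceeds a threshold in seconds (default: 1).
# Field 5 is target:port. Fields 6-8 are request_processing_time,
# target_processing_time, and response_processing_time.
slow_alb_entries() {
  awk -v limit="${1:-1}" '
    $6 > limit || $7 > limit || $8 > limit {
      printf "target=%s request=%s target_time=%s response=%s\n", $5, $6, $7, $8
    }'
}
```

Because the log files are stored compressed, a typical call might be `gunzip -c ACCESS_LOG_FILE.log.gz | slow_alb_entries 0.5`. A high target_time points to the task or its dependencies, while high request or response values point to the load balancer or the client connection.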
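
For step 7, one sign of network interface throttling on an EC2 container instance is a nonzero allowance_exceeded counter in the Elastic Network Adapter (ENA) driver statistics. The following sketch filters `ethtool -S` output for those counters. The function name is hypothetical, and the sketch assumes an ENA driver version that exposes counters such as bw_in_allowance_exceeded. The interface name on your instance might differ from eth0.

```shell
# Read `ethtool -S <interface>` output on stdin and print every
# *_allowance_exceeded counter whose value is greater than zero.
ena_exceeded_counters() {
  awk -F': *' '/_allowance_exceeded/ && $2 + 0 > 0 {
    gsub(/^[ \t]+/, "", $1)   # drop the leading indentation that ethtool prints
    print $1 "=" $2
  }'
}
```

On the container instance, `ethtool -S eth0 | ena_exceeded_counters` prints nothing when no allowance was exceeded. Any output means that traffic was queued or dropped because the instance exceeded one of its network allowances.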

AWS OFFICIAL
Updated 12 days ago