How do I troubleshoot performance issues in my Amazon ECS hosted applications?


My applications that are hosted on Amazon Elastic Container Service (Amazon ECS) experience slow response times.

Resolution

Identify reasons for slow response times

Review Amazon ECS service events to determine the reasons for stopped tasks.

Resource constraints

You might receive one of the following error messages in the application logs:

  • CPU issues: Thread starvation, clock leap, or execution timeouts
  • Memory issues: OutOfMemoryError or GC overhead limit exceeded

When the application experiences an issue with resources, the container might stop. Operations that are in progress are discarded before they complete. Data is lost, and responses to requests fail.

To resolve issues with resource constraints, complete the following steps:

  1. Open the Amazon ECS console.
  2. In the navigation pane, choose Task definitions.
  3. Choose Create a new task definition.
  4. In the Container details section, define the resources for your workload.
    Note: For more information, see What do I need to know about CPU allocation in Amazon ECS?
  5. Choose the Amazon Elastic Compute Cloud (Amazon EC2) container instance types to allocate resources.
  6. Optimize your task launch times.

For more information about CPU and memory utilization, see Service level CPU and memory utilization.
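The following is a minimal sketch of how the CPU and memory settings from step 4 might look as one entry in the containerDefinitions list of a task definition. The function name, container name, and image are hypothetical placeholders. The cpu, memory, and memoryReservation keys are the task definition parameters for CPU units, the hard memory limit, and the soft memory limit.

```python
import json


def build_container_definition(name, image, cpu_units, memory_hard_mib, memory_soft_mib=None):
    """Return one containerDefinitions entry with explicit CPU and memory settings."""
    if cpu_units <= 0 or memory_hard_mib <= 0:
        raise ValueError("cpu_units and memory_hard_mib must be positive")
    definition = {
        "name": name,
        "image": image,
        "cpu": cpu_units,           # CPU units; 1024 units equal one vCPU
        "memory": memory_hard_mib,  # hard limit in MiB; the container is killed if it exceeds this
        "essential": True,
    }
    if memory_soft_mib is not None:
        # When both limits are set, the hard limit must be greater than the soft limit.
        if memory_soft_mib >= memory_hard_mib:
            raise ValueError("memory_soft_mib must be lower than memory_hard_mib")
        definition["memoryReservation"] = memory_soft_mib  # soft limit in MiB
    return definition


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    example = build_container_definition("web", "example/web:latest", 512, 1024, 768)
    print(json.dumps(example, indent=2))
```

Size the values from the utilization that you observe for your workload. The numbers in the example are not recommendations.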

I/O and networking issues

You might receive one of the following error messages because of connection errors or timeouts:

  • Connection timeouts and retries: SocketTimeoutException: connect timed out
  • DNS resolution failures: Could not resolve host: api.example.com
  • Network unreachable errors: Network is unreachable (Host : 'database.example.com'. Port: 5432)
  • Connection refused errors: java.net.ConnectException: Connection refused (Connection refused)
  • SSL/TLS errors: SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure

To troubleshoot your I/O and networking issues, take the following actions:

  • Check your DNS settings.
  • Verify your security group rules.
  • Confirm that you correctly configured SSL/TLS.
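To narrow down which of the preceding errors applies, you can test DNS resolution and TCP connectivity from inside the container. The following is a minimal sketch that uses only the Python standard library. The function name and the returned labels are hypothetical, and the sketch doesn't validate SSL/TLS.

```python
import socket


def check_endpoint(host, port, timeout_seconds=3.0):
    """Classify one TCP connection attempt.

    Returns "ok", "dns_failure", "timeout", "refused", or "network_error".
    """
    try:
        addresses = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The name did not resolve: check the DNS settings.
        return "dns_failure"

    # Only the first resolved address is tried in this sketch.
    family, socktype, proto, _canonname, sockaddr = addresses[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout_seconds)
        try:
            sock.connect(sockaddr)
        except (socket.timeout, TimeoutError):
            # No answer at all: often a security group or routing issue.
            return "timeout"
        except ConnectionRefusedError:
            # The host answered, but nothing is listening on the port.
            return "refused"
        except OSError:
            # For example, "Network is unreachable".
            return "network_error"
    return "ok"


if __name__ == "__main__":
    # Host and port taken from the example error message above.
    print(check_endpoint("database.example.com", 5432))
```

A "timeout" result usually points to security group rules or routing, and a "refused" result usually means that the target is reachable but the service isn't listening on that port.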

Database errors

For database issues, you might receive one of the following error messages in your application logs:

  • Connection pool exhaustion: java.sql.SQLException: Too many connections
  • Query timeouts: ERROR: canceling statement due to statement timeout
  • Deadlocks: Deadlock found when trying to get lock; try restarting transaction
  • Connection issues: Cannot connect to MySQL server on '*' (111)

To resolve database issues, optimize your database queries, and then rebuild your indexes.

Monitor load balancer metrics and health checks

Use Amazon CloudWatch to monitor metrics about load balancers and targets.
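As an illustration, the following sketch builds a request for the TargetResponseTime metric in the AWS/ApplicationELB namespace. It assumes that you use an Application Load Balancer, that boto3 is installed, and that credentials are configured. The function names and the example dimension value are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_response_time_query(load_balancer_dimension, minutes=60, period_seconds=300, now=None):
    """Build keyword arguments for the CloudWatch get_metric_statistics call."""
    now = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "TargetResponseTime",
        # The dimension value has the form app/<load-balancer-name>/<id>.
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer_dimension}],
        "StartTime": now - timedelta(minutes=minutes),
        "EndTime": now,
        "Period": period_seconds,
        "Statistics": ["Average", "Maximum"],
    }


def fetch_response_times(load_balancer_dimension):
    """Return the datapoints for the last hour, oldest first."""
    import boto3  # assumption: boto3 is installed and AWS credentials are configured

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(
        **build_response_time_query(load_balancer_dimension)
    )
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])


if __name__ == "__main__":
    # Hypothetical load balancer dimension value.
    for point in fetch_response_times("app/my-load-balancer/1234567890abcdef"):
        print(point["Timestamp"], point["Average"], point["Maximum"])
```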

Review application responses

You might receive one of the following 5xx status codes:

  • HTTP 500 Internal Server Error
  • HTTP 502 Bad Gateway
  • HTTP 504 Gateway Timeout

To resolve these issues, see Common errors.

Review application logs for error messages

Use CloudWatch to review your Amazon ECS logs for errors such as Connection timeout, Database query exceeded time limit, or Memory limit exceeded. To identify errors and to analyze bottlenecks and response times, use application trace data.
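If you export log lines for offline review, then a small script can group them by the error categories that this article describes. The following is a minimal sketch. The patterns are taken from the example messages in this article, and the category names and function names are hypothetical. Adjust the patterns to match the messages that your application emits.

```python
import re
from collections import Counter

# Patterns based on the example error messages in this article.
PATTERNS = {
    "cpu": re.compile(r"thread starvation|clock leap|execution timeout", re.IGNORECASE),
    "memory": re.compile(
        r"OutOfMemoryError|GC overhead limit exceeded|memory limit exceeded", re.IGNORECASE
    ),
    "network": re.compile(
        r"SocketTimeoutException|Could not resolve host|Network is unreachable"
        r"|Connection refused|handshake failure|connection timeout",
        re.IGNORECASE,
    ),
    "database": re.compile(
        r"Too many connections|statement timeout|Deadlock found"
        r"|Cannot connect to MySQL server|query exceeded time limit",
        re.IGNORECASE,
    ),
}


def classify_line(line):
    """Return the first matching category for a log line, or None."""
    for category, pattern in PATTERNS.items():
        if pattern.search(line):
            return category
    return None


def summarize(lines):
    """Count the log lines in each category and ignore unmatched lines."""
    counts = Counter()
    for line in lines:
        category = classify_line(line)
        if category is not None:
            counts[category] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "java.lang.OutOfMemoryError: Java heap space",
        "java.net.ConnectException: Connection refused (Connection refused)",
        "ERROR: canceling statement due to statement timeout",
    ]
    print(summarize(sample))
```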

Optimize performance

Use the following methods to improve the performance of your Amazon ECS hosted applications.

Cache your data

Use a caching system to serve frequently requested data quickly and to reduce the load on your backend. Examples of caching systems that you can use include Redis, Memcached, or Amazon CloudFront.
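The following sketch illustrates the cache-aside pattern with an expiry time. It uses an in-process dictionary as a stand-in for an external cache such as Redis or Memcached, so it is an illustration of the pattern and not a client for those systems. The class name and method name are hypothetical.

```python
import time


class TTLCache:
    """In-process cache-aside helper whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expiry_time, value)

    def get_or_load(self, key, loader):
        """Return the cached value, or call loader(key) and cache the result."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit that has not expired
        value = loader(key)  # cache miss or expired entry: go to the slow source
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    def slow_lookup(key):
        time.sleep(0.1)  # stands in for a database query or remote call
        return key.upper()

    cache = TTLCache(ttl_seconds=30)
    print(cache.get_or_load("user-1", slow_lookup))  # loads from the slow source
    print(cache.get_or_load("user-1", slow_lookup))  # served from the cache
```

Choose an expiry time that matches how stale the data can be for your application.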

If you use the EC2 launch type, then set the ECS_IMAGE_PULL_BEHAVIOR parameter of the Amazon ECS container agent to prefer-cached so that the agent uses a cached image when one is available. For more information, see Optimize Amazon ECS task launch time.

Use Application Auto Scaling

Use Application Auto Scaling to modify task counts based on demand. Then, configure scaling policies.

For more information, see What is Amazon EC2 Auto Scaling?
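The following sketch shows one way to register an Amazon ECS service as a scalable target and attach a target tracking policy on average CPU utilization. It assumes that boto3 is installed and that credentials are configured. The function names, the policy name, the cooldown values, and the cluster and service names are hypothetical.

```python
def build_scaling_requests(cluster, service, min_tasks, max_tasks, target_cpu_percent):
    """Build the request parameters for a scalable target and a scaling policy."""
    common = {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
    }
    target = dict(common, MinCapacity=min_tasks, MaxCapacity=max_tasks)
    policy = dict(
        common,
        PolicyName=f"{service}-cpu-target-tracking",  # hypothetical policy name
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": float(target_cpu_percent),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
            "ScaleOutCooldown": 60,   # seconds; illustrative value
            "ScaleInCooldown": 120,   # seconds; illustrative value
        },
    )
    return target, policy


def apply_scaling(cluster, service, min_tasks, max_tasks, target_cpu_percent):
    """Send both requests to Application Auto Scaling."""
    import boto3  # assumption: boto3 is installed and AWS credentials are configured

    client = boto3.client("application-autoscaling")
    target, policy = build_scaling_requests(
        cluster, service, min_tasks, max_tasks, target_cpu_percent
    )
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)


if __name__ == "__main__":
    # Hypothetical cluster and service names.
    apply_scaling("my-cluster", "my-service", min_tasks=2, max_tasks=10, target_cpu_percent=60)
```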

Optimize container images

Use multi-stage builds to reduce image size, and cache your image layers. For more information, see Linux containers on Fargate container image pull behavior for Amazon ECS.

Implement efficient logs and monitors

Use the LogConfiguration API to set log levels, and then use AWS X-Ray to implement distributed tracing.
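As a sketch of the logging side, the following example builds a logConfiguration entry for the awslogs log driver and sets the application log level from an environment variable. The LOG_LEVEL variable name, the function names, and the log group, Region, and prefix values are hypothetical. The example doesn't cover AWS X-Ray.

```python
import logging
import os
import sys


def build_log_configuration(log_group, region, stream_prefix):
    """Return a logConfiguration entry for a container definition (awslogs driver)."""
    return {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": log_group,
            "awslogs-region": region,
            "awslogs-stream-prefix": stream_prefix,
        },
    }


def configure_logging(default_level="INFO"):
    """Set the root log level from the hypothetical LOG_LEVEL environment variable."""
    level_name = os.environ.get("LOG_LEVEL", default_level).upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        level = logging.INFO  # fall back when the name is not a known level
    # The awslogs driver captures stdout and stderr, so log to stdout.
    logging.basicConfig(
        stream=sys.stdout,
        level=level,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
        force=True,
    )
    return level


if __name__ == "__main__":
    print(build_log_configuration("/ecs/my-service", "us-east-1", "web"))
    configure_logging()
    logging.getLogger("app").info("logging configured")
```

You can pass the level to the container through the environment section of the container definition, so that you can lower log verbosity without rebuilding the image.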

Related information

Amazon ECS service event messages

CloudWatch metrics for your Application Load Balancer

Centralized container logs with Amazon ECS and Amazon CloudWatch Logs

How can I configure Amazon ECS service auto scaling on Fargate?

AWS OFFICIAL
Updated a month ago