ECS Docker Web Application on ec2 instances are interrupted

0

Dear AWS Experts, I'm faced to the following problem and documentation couldn't help me so far. Basically the setup works with great response but I have many interruptions as you can see in the first screenshot. The Health Checks fail very often which means I have response time outs of more than 5 seconds. All my investigations regarding CPU and memory issues (instance screenshot, system logs and ssh) on the instances result in no findings. The problem occurs on both instances and when there is no traffic (during night). The second instance (t2.medium) was added in order to exclude some strange behavour of first instance (t2.small).

The setup is based on ECS and the docker images run on ec2 instances which are feeded by an application load balancer as second screenshot displays.

failed Health Check setup

Thanks for help and kind regards, Philipp

asked a month ago173 views
2 Answers
1

Hello, I need a bit more information:

  1. Are there scheduled maintenance tasks, automated backups, or scaling operations configured to run during low-traffic hours? Also, could you share details of your Auto Scaling Group settings, particularly the grace period length and any specific scaling policies planned for those times?

  2. Could you provide the system and application logs from your EC2 instances that correspond with the timestamps of the health check failures?

profile picture
EXPERT
answered a month ago
1

This is not a nice spot to be. Just because it's intermitting... Anyway you could try to verify these steps:

  • Verify the health check port and path configured for the target group matches what the application in the ECS container is expecting.
  • Monitor CPU and memory metrics for the ECS service - high utilization can cause response timeouts.
  • Consider increasing the health check grace period to allow more time for tasks to initialize before checks start.
  • Check application logs for any errors that may be causing failures.
  • Confirm the application returns a 200 response code to health checks as expected. For ALB, you can customize the expected response code if needed.
  • Review security groups and ACLs to ensure health check source IPs from Route 53 are allowed to the endpoints.

The failures may possibly be due to temporary issues with the EC2 instances themselves. Review system logs and metrics for any clues.

profile picture
EXPERT
answered a month ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions