2 Answers
- Newest
- Most votes
- Most comments
1
Hello, I need a bit more information:
-
Are there scheduled maintenance tasks, automated backups, or scaling operations configured to run during low-traffic hours? Also, could you share details of your Auto Scaling Group settings, particularly the grace period length and any specific scaling policies planned for those times?
-
Could you provide the system and application logs from your EC2 instances that correspond with the timestamps of the health check failures?
1
This is not a nice spot to be. Just because it's intermitting... Anyway you could try to verify these steps:
- Verify the health check port and path configured for the target group matches what the application in the ECS container is expecting.
- Monitor CPU and memory metrics for the ECS service - high utilization can cause response timeouts.
- Consider increasing the health check grace period to allow more time for tasks to initialize before checks start.
- Check application logs for any errors that may be causing failures.
- Confirm the application returns a 200 response code to health checks as expected. For ALB, you can customize the expected response code if needed.
- Review security groups and ACLs to ensure health check source IPs from Route 53 are allowed to the endpoints.
The failures may possibly be due to temporary issues with the EC2 instances themselves. Review system logs and metrics for any clues.
Relevant content
- asked 6 months ago
- AWS OFFICIALUpdated 10 months ago
- AWS OFFICIALUpdated 2 years ago
- AWS OFFICIALUpdated 2 years ago
- AWS OFFICIALUpdated 2 years ago