Health check failing in target group for AWS Fargate task with Spring Boot API in container

Hi all, I am getting a health check failure in the ALB target group when starting an AWS Fargate task in ECS. The task runs a container image with a Spring Boot hello API on port 9099.

Below are the steps I followed, the issue I ran into, and how I have tried to debug it.

1.) Built the Docker image locally with a sample hello API and deployed it on the Docker runtime on my laptop. Tested with GET localhost:9099/hello in the browser and Postman, and got a 200 OK response with a body. Pushed the image to AWS ECR.

2.) Created an ECS task definition for the image with:

  • 4 vCPU, 10 GB memory.
  • Linux/X86_64 OS, awsvpc network mode, container port 9099.

3.) Used the above task definition to create a service with a new ALB, a new listener on port 80, and a new target group that connects to the ECS task on port 9099. Health check grace period: 60 seconds. Fargate platform version: tried 1.3.0, 1.4.0 and LATEST, but the issue remains the same.

The ECS task's security group allows All TCP (ports 0-65535) from the ALB security group. I also added Custom TCP 9099 from 0.0.0.0/0 and ::/0 so that I can test whether the container responds on both its public and private IP.

The ALB, target group and listener were created along with the task.

4.) Updated the desired task count to 1 so that the service spins up one task for the container. The task gets activated and reaches the RUNNING state. I can access the Spring Boot API from the browser and Postman using the public IP address. I also tested with the task's private IP from a new EC2 instance and got a successful response. The container logs show the request entries. CPU and memory usage are good, and there are no errors in the container.
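The manual test from the EC2 instance in step 4 can be scripted so that it looks more like a load balancer health check. This is only a rough sketch: `probe` and `matches` are hypothetical helper names, the 200-499 range is taken from the success codes configured later in this post, and unlike the ALB, `urllib` follows redirects.

```python
import urllib.error
import urllib.request
from typing import Optional


def probe(url: str, timeout_s: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for a GET on url, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # A 4xx/5xx reply is still a reply; report its status code.
        return err.code
    except OSError:
        # Covers URLError, refused connections, and timeouts.
        return None


def matches(status: Optional[int], low: int = 200, high: int = 499) -> bool:
    """True if a status was received and falls inside the success-code range."""
    return status is not None and low <= status <= high
```

For example, `matches(probe("http://<task-private-ip>:9099/hello"))` run from the EC2 instance should be true if the task answers on its private IP; the address is a placeholder to fill in.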

5.) The issue starts here. When I checked the health status in the ALB target group, it showed "Request timed out" and the target was unhealthy. I realised I had not changed anything in the target group health check, so it was running with default settings.

6.) Updated the health check with the settings below:

  • Health check protocol - HTTP
  • Health check path - /hello (this URL responds with 200 OK on the private IP of the task that gets registered in the target group)
  • Port - traffic port (so 9099 should be used for the health check as well)
  • Healthy threshold - 2
  • Unhealthy threshold - 3
  • Timeout - 120 sec (so the health check has enough time and the container will be up)
  • Interval - 150 sec
  • Success codes - 200-499
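The settings in step 6 can be written down as data and sanity-checked. This is a minimal sketch, not an AWS API: `HealthCheck` and `problems` are hypothetical names, and the numeric limits (interval 5-300 s, timeout 2-120 s, thresholds 2-10, timeout shorter than interval) are the ALB ranges as recalled from the Elastic Load Balancing documentation, so they should be verified against the current docs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HealthCheck:
    protocol: str
    path: str
    port: str
    healthy_threshold: int
    unhealthy_threshold: int
    timeout_s: int
    interval_s: int
    matcher: str


def problems(hc: HealthCheck) -> List[str]:
    """Return human-readable issues; an empty list means nothing was flagged."""
    issues = []
    if not 5 <= hc.interval_s <= 300:  # assumed ALB range
        issues.append("interval outside 5-300 s")
    if not 2 <= hc.timeout_s <= 120:  # assumed ALB range
        issues.append("timeout outside 2-120 s")
    if hc.timeout_s >= hc.interval_s:
        issues.append("timeout must be shorter than interval")
    for name, value in (("healthy", hc.healthy_threshold),
                        ("unhealthy", hc.unhealthy_threshold)):
        if not 2 <= value <= 10:  # assumed ALB range
            issues.append(f"{name} threshold outside 2-10")
    if not hc.path.startswith("/"):
        issues.append("path must start with '/'")
    return issues


# The values from step 6 of this post.
POSTED = HealthCheck("HTTP", "/hello", "traffic-port", 2, 3, 120, 150, "200-499")
```

Under those assumed limits, `problems(POSTED)` returns an empty list, which suggests the values themselves are accepted and the failure lies elsewhere.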

7.) When I set the ECS desired task count to 0 and then back to 1, the target group health status showed unhealthy with the description "Health check failed". That means the health check was triggered but failed. I checked the container logs and there is no hit for /hello on 9099 (as I understand it, the request should appear in the container logs). Meanwhile, the target group keeps registering and deregistering the ECS task a couple of times and finally marks it unhealthy.

8.) When I hit the ALB's public DNS name with 9099/hello in the browser, it returns a 504 error. I checked the monitoring tab of the target group: the request reaches the target group, but because the target is unhealthy it is not forwarded to ECS. So I believe the ALB security group is not the problem.

9.) CloudWatch logs for ECS only show that the Spring Boot API is running fine. No errors are reported.

10.) Enabled ALB access logs to see whether they capture any details about the health check and its response code, but there are no log entries related to health checks. Where can I find detailed health check logs?

11.) The ALB and the ECS task are in a single region, us-east-1. I added all 6 AZs to the ALB and all 6 subnets (AZs) of the VPC to the ECS task as well.

12.) Checked multiple Stack Overflow posts and AWS documentation dating back to 2020 and 2021, as well as other websites, about this issue. Nothing has helped so far.

13.) My thinking is this: if the health check timeout is 120 seconds, the target group will wait up to 120 seconds for a response from ECS. The ECS health check grace period is 60 seconds, which is long enough for the container to be up and running, and health check results are ignored during those first 60 seconds. After that the health check still has 60 more seconds, and after 150 seconds there will be a second health check call (unhealthy threshold is 3), so the health check should not fail.
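The timing reasoning in step 13 can be worked through with a little arithmetic. This is a simplified model and an assumption, not documented ALB behaviour: the first check is taken to start at registration (t = 0), later checks start once per interval, and a failing check only concludes when the timeout expires. Both function names are hypothetical.

```python
def seconds_until_healthy(healthy_threshold: int, interval_s: float,
                          response_s: float = 0.0) -> float:
    """Earliest time at which the Nth consecutive successful check completes."""
    return (healthy_threshold - 1) * interval_s + response_s


def seconds_until_unhealthy(unhealthy_threshold: int, interval_s: float,
                            timeout_s: float) -> float:
    """Time at which the Nth consecutive timed-out check completes."""
    return (unhealthy_threshold - 1) * interval_s + timeout_s


GRACE_PERIOD_S = 60  # ECS health check grace period from step 3

healthy_at = seconds_until_healthy(2, 150)           # (2 - 1) * 150 = 150
unhealthy_at = seconds_until_unhealthy(3, 150, 120)  # 2 * 150 + 120 = 420
print(f"healthy no earlier than {healthy_at} s, "
      f"unhealthy after about {unhealthy_at} s, "
      f"grace period ends at {GRACE_PERIOD_S} s")
```

Under this model the target cannot be reported healthy before roughly 150 seconds, which is later than the end of the 60 second grace period. Whether that gap matters here is not something the sketch can show, but it is one relationship worth checking.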

14.) One more point: I also tried allowing HTTP port 80 in the ECS task's security group for the health check and changed "Port" under the target group health check to "Override" with port 80. There was still no entry in the ECS task logs. It means something is stopping the request from reaching ECS from the ALB target group.
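To separate a network-level block from an application-level problem, a plain TCP connection test can help. A minimal sketch, assuming it is run from a host that sits in the same subnets and security group as the ALB; `can_connect` is a hypothetical helper name.

```python
import socket


def can_connect(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Refused, unreachable, or timed out (for example, silently dropped packets).
        return False
```

If `can_connect("<task-private-ip>", 9099)` is false from such a host but true from elsewhere, the path between the ALB's network position and the task is the likely place to look; the address is a placeholder to fill in.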

Please let me know if I am missing anything in these steps. I appreciate any pointers. Thanks.

I did the same with the EC2 launch type 2-3 weeks ago and it worked without any change to the health check parameters. Details: https://medium.com/@rajesh.ranjan66/deploying-container-on-amazon-ecs-using-ec2-from-aws-console-e352563a2bf9?source=friends_link&sk=4345c10def0651dc9ccb9998ca939d93
