ECS EC2 containerInstance always unhealthy

I changed my deployment from a working ECS Fargate setup to an ECS EC2 task/service.

I created an Auto Scaling group. It is up and running, and the container (app) is deployed there. The logs look good. But whatever I do, the target stays in an unhealthy state. What am I doing wrong?

My infrastructure looks like this:

    // Inside the stack constructor:
    const cluster = new Cluster(this, 'B2BCluster', { vpc: this.mainVpc });
    const securityGroup = new SecurityGroup(this, `${this.instanceName}-MyMainApplicationSG`, {
        vpc: this.mainVpc,
    });
    securityGroup.connections.allowInternally(Port.allTraffic(), 'Allow internal traffic');

    const ec2AutoScalingGroup = new AutoScalingGroup(this, 'todo-app-asg', {
        vpc: this.mainVpc,
        instanceType: new InstanceType('m6a.2xlarge'),
        machineImage: EcsOptimizedImage.amazonLinux2(AmiHardwareType.STANDARD),
        desiredCapacity: 1,
        minCapacity: 1,
        maxCapacity: 2,
    });
    ec2AutoScalingGroup.protectNewInstancesFromScaleIn();
    ec2AutoScalingGroup.connections.addSecurityGroup(securityGroup);

    const capacityProvider = new AsgCapacityProvider(this, 'todo-app-capacity-provider', {
        autoScalingGroup: ec2AutoScalingGroup,
        enableManagedTerminationProtection: true,
        enableManagedScaling: true,
    });

    cluster.addAsgCapacityProvider(capacityProvider);

    this.createIngressRule({
        ingressName: 'MyAppRules',
        priority: 1,
        pathPattern: '/',
        exposeContainer: MyContainers.MyApp.getMyAppContainer(),
        HealthCheckPath: '/healthz',
    });

    // Helper method on the same stack class:
    private createIngressRule(ingress: IngressRule): void {
        const targetGroup = new ApplicationTargetGroup(this, `${this.instanceName}-${ingress.ingressName}-TargetGroup`, {
            vpc: this.mainVpc,
            port: ingress.exposeContainer.getTaskDefinition().defaultContainer!.containerPort,
            protocol: ApplicationProtocol.HTTP,
            targets: [
                ingress.exposeContainer.getService().loadBalancerTarget({
                    containerName: ingress.exposeContainer.getTaskDefinition().defaultContainer!.containerName,
                    containerPort: ingress.exposeContainer.getTaskDefinition().defaultContainer!.containerPort,
                }),
            ],
            healthCheck: {
                enabled: true,
                path: ingress.HealthCheckPath,
                port: ingress.exposeContainer.getTaskDefinition().defaultContainer!.containerPort.toString(),
                protocol: Protocol.HTTP,
                interval: Duration.seconds(15),
                timeout: Duration.seconds(10),
                unhealthyThresholdCount: 10,
                healthyThresholdCount: 10,
            },
        });
    }
asked 11 days ago, 13 views
1 Answer
Based on the information provided, there are a few potential reasons why your ECS EC2 container instances are consistently showing as unhealthy:

  1. Health Check Configuration: Your target group sets both unhealthyThresholdCount and healthyThresholdCount to 10. With a 15-second interval, a target needs 10 consecutive successful checks, roughly 150 seconds, before it is marked healthy. High thresholds slow state changes rather than cause failures, but lowering them to more typical values (e.g., 2-3) lets you see the effect of each fix much sooner.

  2. Security Group Configuration: Ensure that the security group attached to your instances allows inbound traffic from the load balancer on the port your application is reachable on. The allowInternally rule only permits traffic between members of the same security group, so on its own it may not let the load balancer reach your application for health checks.

  3. Container Health Check: Make sure your application actually serves the health check endpoint ('/healthz' in your case) and returns a success status code. An Application Load Balancer expects HTTP 200 unless the target group defines a different matcher.

  4. ECS Agent: Verify that the ECS agent is running correctly on your EC2 instances. The ECS agent is responsible for communicating the container instance's status to ECS.

  5. IAM Roles: Ensure that your EC2 instances have the correct IAM roles attached to communicate with ECS and other necessary AWS services.

  6. Capacity Provider Strategy: You've created a capacity provider, but it's not clear from the provided code if you've associated it with your service. Make sure you're using the capacity provider in your service definition.

  7. Container Resource Allocation: Ensure that your task definition allocates appropriate CPU and memory resources for your containers, and that these resources are available on your EC2 instances.
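Points 1, 2 and 6 can be sketched together in CDK. This is a minimal sketch under stated assumptions, not a drop-in fix: the helper `wireEc2Service`, the `Ec2ServiceWiringProps` interface and the construct IDs are hypothetical, the task is assumed to use bridge networking with dynamic host ports (hence the 32768-65535 range), and the lowered threshold and timeout values are only illustrative. If your task uses awsvpc or fixed host ports, the port range and target type will differ.

```typescript
import { Duration } from 'aws-cdk-lib';
import { AutoScalingGroup } from 'aws-cdk-lib/aws-autoscaling';
import { IVpc, Port } from 'aws-cdk-lib/aws-ec2';
import { AsgCapacityProvider, Cluster, Ec2Service, Ec2TaskDefinition } from 'aws-cdk-lib/aws-ecs';
import {
    ApplicationLoadBalancer,
    ApplicationProtocol,
    ApplicationTargetGroup,
} from 'aws-cdk-lib/aws-elasticloadbalancingv2';
import { Construct } from 'constructs';

// Hypothetical bundle of resources that already exist in the stack.
interface Ec2ServiceWiringProps {
    vpc: IVpc;
    cluster: Cluster;
    autoScalingGroup: AutoScalingGroup;
    capacityProvider: AsgCapacityProvider;
    taskDefinition: Ec2TaskDefinition;
    loadBalancer: ApplicationLoadBalancer;
}

// Hypothetical helper: creates the EC2-backed service and its target group.
function wireEc2Service(scope: Construct, props: Ec2ServiceWiringProps): ApplicationTargetGroup {
    // Point 2: let the load balancer reach the instances. Assumes bridge
    // networking with dynamic host ports, which land in the ephemeral range.
    props.autoScalingGroup.connections.allowFrom(
        props.loadBalancer,
        Port.tcpRange(32768, 65535),
        'ALB to ECS dynamic host ports',
    );

    // Point 6: place the service on the capacity provider explicitly.
    const service = new Ec2Service(scope, 'TodoAppService', {
        cluster: props.cluster,
        taskDefinition: props.taskDefinition,
        capacityProviderStrategies: [
            { capacityProvider: props.capacityProvider.capacityProviderName, weight: 1 },
        ],
    });

    const container = props.taskDefinition.defaultContainer!;

    // Point 1: lower thresholds. healthCheck.port is left unset, so the check
    // uses the default 'traffic-port', the port each target registered with.
    return new ApplicationTargetGroup(scope, 'TodoAppTargetGroup', {
        vpc: props.vpc,
        protocol: ApplicationProtocol.HTTP,
        targets: [
            service.loadBalancerTarget({
                containerName: container.containerName,
                containerPort: container.containerPort,
            }),
        ],
        healthCheck: {
            path: '/healthz',
            interval: Duration.seconds(15),
            timeout: Duration.seconds(5),
            healthyThresholdCount: 3,
            unhealthyThresholdCount: 3,
        },
    });
}
```

The returned target group still has to be attached to a listener on the load balancer. Note that pinning the health check port to the container port, as in the question, would only work if the host port equals the container port. With dynamic host ports the check would probe a port nothing listens on, which is one possible explanation for a permanently unhealthy target.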

To troubleshoot:

  • Check the ECS console for specific error messages related to your tasks or services.
  • Review the EC2 instance logs for any issues with the ECS agent or container runtime.
  • Verify that your application is actually running and listening on the expected port inside the container.
  • Test the health check endpoint directly on the EC2 instance to ensure it's responding as expected.
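The last check can be scripted. The following is a minimal Node.js sketch, assuming Node is available wherever you run it; `matchesHttpCodes` and `probe` are hypothetical helper names, and the host and port in the usage comment are placeholders for the host port your task is actually mapped to. The matcher format ("200", "200,202", "200-299") mirrors the success-code syntax of Application Load Balancer target groups.

```typescript
import * as http from 'node:http';

// Returns true when `status` satisfies an ALB-style success-code matcher
// such as "200", "200,202" or "200-299".
function matchesHttpCodes(status: number, matcher: string = '200'): boolean {
    return matcher.split(',').some((part) => {
        const [lo, hi] = part.trim().split('-').map(Number);
        return hi === undefined ? status === lo : status >= lo && status <= hi;
    });
}

// Sends one GET request and resolves with the HTTP status code.
// Rejects on connection errors and when no response arrives in time.
function probe(host: string, port: number, path: string, timeoutMs: number = 10000): Promise<number> {
    return new Promise((resolve, reject) => {
        const req = http.get({ host, port, path, timeout: timeoutMs }, (res) => {
            res.resume(); // discard the body so the socket is released
            resolve(res.statusCode ?? 0);
        });
        req.on('timeout', () => req.destroy(new Error(`no response within ${timeoutMs} ms`)));
        req.on('error', reject);
    });
}

// Usage, run on the container instance (8080 is a placeholder host port):
// probe('127.0.0.1', 8080, '/healthz')
//     .then((status) => console.log(status, matchesHttpCodes(status)))
//     .catch((err) => console.error('probe failed:', err.message));
```

A connection error here points at port mapping or the application not listening, while a response with a non-matching status code points at the endpoint itself.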

If none of these solve the issue, you may need to dive deeper into your application logs and ECS task logs to identify any application-specific problems that could be causing the health checks to fail.
Sources
Auto Scaling groups - Amazon EC2 Auto Scaling
A deep dive into Amazon ECS task health and task replacement | Containers

answered 11 days ago