ECS Tasks fail ELB Health Check

Hi,

I used the AWS CDK for Node.js to build an ECS service that runs a Dockerized Node.js Express app. When I test the Docker container and code locally, the health check endpoint responds just fine.

After deploying the infrastructure code, all the pieces seem to be there. However, my tasks keep deregistering because they fail the health check. The error that they show is:

Task failed ELB health checks in (target-group arn:aws:elasticloadbalancing:us-east-1:916847193903:targetgroup/AskGen-AskGe-JT3TRNKU8ROF/d048425de709efce)

I can see in the task logs that the requests are being received and are processed correctly, but the tasks continue to fail the health checks.

Any help/insights would be greatly appreciated.

Here is the cdk code:

  const vpc = new aws_ec2.Vpc(construct, `APIVpc-${env}`, {
    maxAzs: 1 // Default is all AZs in region
  });

  const cluster = new ecs.Cluster(construct, `APIFargateCluster-${env}`, {
    clusterName: `APIFargateCluster-${env}`,
    containerInsights: true,
    vpc
  });

  // Create ECR - This will hold all the docker images
  const repository = new ecr.Repository(construct, `ECRRepo-${env}`, {
    repositoryName: `_ecr_repo_${env}`,
    removalPolicy: RemovalPolicy.DESTROY
  });

  const hostedZone = aws_route53.HostedZone.fromHostedZoneAttributes(
    construct,
    'APIHostedZone',
    {
      hostedZoneId: '....', 
      zoneName: '.....'
    }
  );

  const certificate = new aws_certificatemanager.Certificate(construct, `Cert-${env}`, {
    domainName: 'api.com',
    subjectAlternativeNames: ['*.api.com'],
    validation: aws_certificatemanager.CertificateValidation.fromDns(hostedZone) // Records must be added manually,
  });

  const ecrPolicy = new aws_iam.Policy(construct, `ECRPolicy-${env}`, {
    policyName: 'ECRPolicyName',
    statements: [
      new aws_iam.PolicyStatement({
        actions: [
          'ecr:GetAuthorizationToken',
          'ecr:BatchCheckLayerAvailability',
          'ecr:GetDownloadUrlForLayer',
          'ecr:GetRepositoryPolicy',
          'ecr:ListImages',
          'ecr:DescribeRepositories',
          'ecr:DescribeImages',
          'ecr:BatchGetImage',
          'logs:*',
          'secretsmanager:*',
          'sqs:*'
        ],
        resources: ['*'] // You can restrict resources if needed
      })
    ]
  });

  const ecsTaskRole = new aws_iam.Role(construct, `ECSTaskRole-${env}`, {
    roleName: `TaskDefinitionRole-${env}`,
    assumedBy: new aws_iam.ServicePrincipal('ecs-tasks.amazonaws.com')
  });

  ecsTaskRole.attachInlinePolicy(ecrPolicy);

  // Create a load-balanced Fargate service and make it public
  const service = new ecsPatterns.ApplicationLoadBalancedFargateService(
    construct,
    `APIFargateService-${env}`,
    {
      serviceName: `APIService-${env}`,
      cluster: cluster, // Required
      //redirectHTTP: true,
      certificate: certificate,
      cpu: 256, // Default is 256
      desiredCount: 1, // Default is 1
      circuitBreaker: {
        rollback: true
      },
      loadBalancerName: `APILoanBalancer-${env}`,
      domainName:  '....',
      domainZone: hostedZone,
      taskImageOptions: {
        containerName: `ApiContainer-${env}`,
        image: ecs.ContainerImage.fromRegistry(repository.repositoryUri),
        enableLogging: true,
        environment: {
          ...envVariables
        },
        taskRole: ecsTaskRole,
        executionRole: ecsTaskRole
      },
      memoryLimitMiB: 512, // Default is 512
      publicLoadBalancer: true // Default is true,
    }
  );

  service.targetGroup.configureHealthCheck({
    path: '/health-check'
  });

  return service;

1 Answer

Accepted Answer

Hi Brennan,

This issue usually happens when the ELB health check fails before the container is up and running. How long ECS ignores failing ELB health checks after a task starts is controlled by the service's "Health Check Grace Period" setting. [1]

I would suggest increasing the grace period from its default value. In CDK, the service exposes this as the "healthCheckGracePeriod" property; container-level health checks have a related parameter called "startPeriod". [2]
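
As a rough sketch of what raising the grace period could look like in CDK v2: this assumes the `healthCheckGracePeriod` property on `ApplicationLoadBalancedFargateService` and the target group's `configureHealthCheck` options. The `buildService` helper, the construct ID, and every numeric value are hypothetical placeholders, not values taken from the question; size them to how long the container really takes to start.

```typescript
import { Duration } from 'aws-cdk-lib';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ecsPatterns from 'aws-cdk-lib/aws-ecs-patterns';
import { Construct } from 'constructs';

// Hypothetical helper: `cluster` and `image` come from the surrounding stack.
export function buildService(
  scope: Construct,
  cluster: ecs.ICluster,
  image: ecs.ContainerImage
): ecsPatterns.ApplicationLoadBalancedFargateService {
  const service = new ecsPatterns.ApplicationLoadBalancedFargateService(
    scope,
    'APIFargateService',
    {
      cluster,
      taskImageOptions: { image },
      // ECS ignores failing ELB health checks for this long after a task starts.
      // 120 seconds is an assumed value, not a recommendation.
      healthCheckGracePeriod: Duration.seconds(120)
    }
  );

  // Target group health check settings; all values here are assumptions.
  service.targetGroup.configureHealthCheck({
    path: '/health-check',
    interval: Duration.seconds(30),
    timeout: Duration.seconds(5),
    healthyThresholdCount: 2,
    unhealthyThresholdCount: 3
  });

  return service;
}
```

If the tasks still fail after the grace period is raised, it may be worth checking that the container port and health check path match what the app actually serves.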

References:

[1] https://aws.amazon.com/about-aws/whats-new/2017/12/amazon-ecs-adds-elb-health-check-grace-period/

[2] https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ecs.HealthCheck.html

Please let me know if this resolves the issue.

Thanks,

Atul

Answered 7 months ago
Reviewed 7 months ago by an AWS Expert
  • This was the main issue! Thank you very much. Was able to solve it. I also had to update my Dockerfile so that the startup did not take so long.
