Load Balancer unhealthy and CDK deploy stuck


I have a load balancer and a Fargate service, which I deploy one after the other with CDK.

The first problem is that the CDK deployment is not going through (see images below).

The second problem, which could be the cause of the first, is that the target health check is failing, although the container health check works. When I made the load balancer accessible from the internet, I could perform my health check and got "OK" with status code 200 back.

[Image: target health check fails]

[Image: container is healthy]

My steps:

1: Load balancer

    const relayerLoadBalancer = new elbv2.ApplicationLoadBalancer(
      this,
      "RelayerLoadBalancer",
      {
        vpc: vpc,
        internetFacing: false,
      }
    );

    this.relayerLoadBalancer = relayerLoadBalancer;

2: Task definition and container from ECR (see gist)

3: Relayer service with Fargate (also in gist):

    const listener = props.relayerLoadBalancer.addListener("RelayerListener", {
      port: 80,
      protocol: elbv2.ApplicationProtocol.HTTP,
      open: true,
    });

    listener.connections.allowFromAnyIpv4(ec2.Port.allTraffic());
  
    const sg_service = new ec2.SecurityGroup(this, "RelayerSG", {
      vpc: vpc,
      allowAllOutbound: true,
      description: "Security group for Relayer tasks",
    });
    sg_service.addIngressRule(
      ec2.Peer.ipv4("0.0.0.0/0"),
      ec2.Port.allTraffic()
    );

    // Create a Fargate service
    const relayerService = new ecs.FargateService(this, "RelayerServiceM", {
      cluster: cluster,
      taskDefinition: props.taskDefinition,
      enableExecuteCommand: true,
      securityGroups: [sg_service],
      assignPublicIp: false,
      healthCheckGracePeriod: Duration.seconds(3600),
    });

    listener.addTargets("RelayerTarget", {
      port: 80,
      targets: [relayerService],
      healthCheck: {
        path: "/health",
        healthyHttpCodes: "200-499",
      },
    });

I've tried to split the load balancer and the Fargate service, but the CDK deployment always gets stuck (for example at step 4/6) at the step where the service is created:

[Image: stuck CDK deployment]

Accordingly, the deployment in the console is also shown as in progress:

[Image: AWS console]

I think the CDK deploy may be failing because the health check fails, but the health check may also be failing because the deployment doesn't go through. It sounds confusing, and I definitely am confused.

I've set the healthCheckGracePeriod to 5h, which should be more than enough time for the health checks to come through...

I've read that the health check must not be performed on port 80, but since I've opened up all traffic this shouldn't be a problem, right?

3 Answers

Accepted Answer

The amazing people from AWS Support helped me, and after 6h it was solved with a single line in my .env file. My .env defines HOST and PORT, which were localhost and 80 respectively. The Dockerfile exposed the PORT, and the application's health check worked from inside the container but not from outside, because the app was only listening on the loopback interface. This meant the ALB couldn't reach the target.

Therefore the solution was to set HOST to 0.0.0.0 in the .env! This makes the app listen on all network interfaces, so it is reachable from outside the container.
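To illustrate the difference, here is a minimal sketch, assuming a Node.js service (the thread does not say what the relayer is written in). The helper names `resolveBind`, `isLoopbackHost`, `createHealthServer` and `start` are hypothetical; only the HOST and PORT variables and the /health path come from the posts above.

```typescript
import * as http from "node:http";

// Hypothetical helper: read HOST and PORT the way the .env above defines them.
// Falls back to 0.0.0.0 so the server listens on all interfaces by default.
function resolveBind(env: Record<string, string | undefined>): { host: string; port: number } {
  const host = env.HOST ?? "0.0.0.0";
  const port = Number(env.PORT ?? "80");
  return { host, port };
}

// True when the address is only reachable from inside the container itself.
function isLoopbackHost(host: string): boolean {
  return host === "localhost" || host === "::1" || host.startsWith("127.");
}

// Tiny server that answers GET /health with 200 "OK" and everything else with 404.
function createHealthServer(): http.Server {
  return http.createServer((req, res) => {
    if (req.url === "/health") {
      res.writeHead(200, { "Content-Type": "text/plain" });
      res.end("OK");
      return;
    }
    res.writeHead(404);
    res.end();
  });
}

// Start listening on the resolved address and warn about loopback-only binds.
function start(env: Record<string, string | undefined> = process.env): http.Server {
  const { host, port } = resolveBind(env);
  if (isLoopbackHost(host)) {
    console.warn(`HOST=${host} is a loopback address: the ALB will not be able to reach this task`);
  }
  return createHealthServer().listen(port, host, () => {
    console.log(`listening on ${host}:${port}`);
  });
}
```

Calling `start()` with HOST=localhost produces a server that passes a health check run inside the container but is unreachable from the load balancer; with HOST=0.0.0.0 the same /health endpoint is reachable on the task's network interface.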

Marvin
answered 5 months ago
Expert reviewed a month ago

Hello,

As Hernan suggested, check the CloudFormation stack in the AWS Console to see if it shows any error.

Health checks on port 80 are not a problem for the load balancer.

I'm pretty sure the CDK (CloudFormation) deployment fails because of the failing health check.

Check the target group in the AWS EC2 console to see whether the ECS tasks are registered correctly. But since you wrote that you get a 200 when connecting from the internet through the ALB to the service, the reason could also be that the default values of the target group's health check settings are too low. See the following document:

https://docs.aws.amazon.com/elasticloadbalancing/latest/application/target-group-health-checks.html
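If the target group's health check settings are the suspect, they can be set explicitly in CDK instead of relying on the defaults. The following is a minimal sketch, assuming aws-cdk-lib v2. The stack name, the construct IDs, the sample image and every numeric value are illustrative placeholders, not values from the question and not recommended settings.

```typescript
import { App, Duration, Stack } from "aws-cdk-lib";
import * as ec2 from "aws-cdk-lib/aws-ec2";
import * as ecs from "aws-cdk-lib/aws-ecs";
import * as elbv2 from "aws-cdk-lib/aws-elasticloadbalancingv2";

const app = new App();
const stack = new Stack(app, "RelayerSketchStack"); // placeholder stack name

const vpc = new ec2.Vpc(stack, "Vpc", { maxAzs: 2 });
const cluster = new ecs.Cluster(stack, "Cluster", { vpc });

// Placeholder task definition; the real one lives in the asker's gist.
const taskDefinition = new ecs.FargateTaskDefinition(stack, "TaskDef");
taskDefinition.addContainer("relayer", {
  image: ecs.ContainerImage.fromRegistry("amazon/amazon-ecs-sample"), // sample image, an assumption
  portMappings: [{ containerPort: 80 }],
});

const loadBalancer = new elbv2.ApplicationLoadBalancer(stack, "Alb", {
  vpc,
  internetFacing: false,
});

const listener = loadBalancer.addListener("Listener", {
  port: 80,
  protocol: elbv2.ApplicationProtocol.HTTP,
});

const service = new ecs.FargateService(stack, "Service", {
  cluster,
  taskDefinition,
  assignPublicIp: false,
});

// Explicit health check settings on the target group (illustrative values).
listener.addTargets("Target", {
  port: 80,
  targets: [service],
  healthCheck: {
    path: "/health",
    healthyHttpCodes: "200",
    interval: Duration.seconds(30), // time between two checks
    timeout: Duration.seconds(10), // must be shorter than the interval
    healthyThresholdCount: 2, // consecutive successes before "healthy"
    unhealthyThresholdCount: 5, // consecutive failures before "unhealthy"
  },
});

app.synth();
```

Registering the service through `listener.addTargets` also lets CDK wire the security group rule between the load balancer and the service, so an allow-all ingress rule on the task security group should not be needed for the health check to get through.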

Sincerely, Heiko

HeikoMR
answered 5 months ago
  • Hi Heiko, the CloudFormation output does not show anything informative. I have now exposed port 8080 from my Docker container and mapped it to port 80 on my host, but it doesn't work. The container is healthy. I also have a public load balancer and listener, but now I get a 502 Bad Gateway when I open the DNS endpoint.

  • Hey @Marvin, the 502 could be caused by a bad port mapping, or the instance was replaced or down during testing.

    What I would check during deployment via CDK: Look at the ECS service/tasks and the ALB target group. Check whether the IPs are registered correctly. Check the health check settings there; as I said, maybe the default health check values of the target group are too low for your use case. Check the security group of the ECS service/task. Deploy an EC2 instance in the same network for testing purposes, connect to it and try to reach the ECS task on the health path. If that doesn't work, either your port mapping is wrong or your container has a problem with the health path. If it works, your target group health check settings probably have to be updated.

  • Thanks again for the help, Heiko. The idea to deploy an EC2 instance for testing was very valuable for narrowing it down to ECS, because we could rule out the ALB as the cause of the problem.
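The EC2 test suggested in the comments above can be scripted. A minimal sketch, assuming Node.js 18 or newer on the test instance (for the global `fetch` and `AbortSignal.timeout`). The function names `matchesHttpCodes` and `probeHealth` are hypothetical, and the matcher syntax follows the `healthyHttpCodes` strings used for target groups, such as "200", "200,202" or "200-499".

```typescript
// Check a status code against a target-group style matcher:
// a single code ("200"), a list ("200,202") or a range ("200-499").
function matchesHttpCodes(status: number, matcher: string): boolean {
  return matcher.split(",").some((part) => {
    const [lo, hi] = part.trim().split("-").map(Number);
    return hi === undefined ? status === lo : status >= lo && status <= hi;
  });
}

// Request the health path directly on the task and report whether the
// response would count as healthy for the given matcher.
async function probeHealth(url: string, matcher = "200", timeoutMs = 5000): Promise<boolean> {
  try {
    const res = await fetch(url, {
      redirect: "manual", // do not follow redirects; judge the task's own response
      signal: AbortSignal.timeout(timeoutMs),
    });
    console.log(`${url} -> ${res.status}`);
    return matchesHttpCodes(res.status, matcher);
  } catch (err) {
    // Connection refused or timeout: the task is not reachable on this address and port.
    console.error(`probe failed: ${String(err)}`);
    return false;
  }
}
```

For example, `await probeHealth("http://10.0.1.23:80/health", "200-499")`, where 10.0.1.23 stands in for the task's private IP, returns false when the connection is refused or times out. That is the symptom to expect when the application only listens on localhost inside the container.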


In the AWS console, go to the CloudFormation service and check the events of the stack that you are deploying. There you may find the error.

answered 5 months ago
  • Unfortunately not. Since it times out after 3h, I only get "The following resource(s) failed to create: [RelayerServiceMService30B8E9A6]. Rollback requested by user."
