Load Balancer unhealthy and CDK deploy stuck

I have a load balancer and a Fargate service, which I deploy one after the other with CDK.

The first problem is that the CDK deployment does not go through (see images below).

The second problem, which could be the cause of the first, is that the target health check is failing, although the container health check works. When I made the load balancer accessible from the internet, I could call my health check endpoint and got "OK" with status code 200 back.

(image: target health check fails)

(image: container is healthy)

My steps:

1: Load balancer

    const relayerLoadBalancer = new elbv2.ApplicationLoadBalancer(
      this,
      "RelayerLoadBalancer",
      {
        vpc: vpc,
        internetFacing: false,
      }
    );

    this.relayerLoadBalancer = relayerLoadBalancer;

2: Task definition and container from ECR (see gist)

3: Relayer service with Fargate (also in gist):

    const listener = props.relayerLoadBalancer.addListener("RelayerListener", {
      port: 80,
      protocol: elbv2.ApplicationProtocol.HTTP,
      open: true,
    });

    listener.connections.allowFromAnyIpv4(ec2.Port.allTraffic());

    const sg_service = new ec2.SecurityGroup(this, "RelayerSG", {
      vpc: vpc,
      allowAllOutbound: true,
      description: "Security group for Relayer tasks",
    });
    sg_service.addIngressRule(
      ec2.Peer.ipv4("0.0.0.0/0"),
      ec2.Port.allTraffic()
    );

    // Create a Fargate service
    const relayerService = new ecs.FargateService(this, "RelayerServiceM", {
      cluster: cluster,
      taskDefinition: props.taskDefinition,
      enableExecuteCommand: true,
      securityGroups: [sg_service],
      assignPublicIp: false,
      healthCheckGracePeriod: Duration.seconds(3600),
    });

    listener.addTargets("RelayerTarget", {
      port: 80,
      targets: [relayerService],
      healthCheck: {
        path: "/health",
        healthyHttpCodes: "200-499",
      },
    });

I've tried to split the load balancer and the Fargate service, but the CDK deployment always gets stuck (for example at step 4/6) at the step where the service is created:

(image: stuck CDK deployment)

Because of this, the deployment in the console is also still in progress:

(image: AWS console)

I think the CDK deploy may fail because the health check fails, but the health check may also fail because the deployment doesn't go through. That sounds confusing, and I definitely am confused.

I've set the healthCheckGracePeriod to 5h, which should be more than enough time for the health checks to come through...

I've read that the health check must not be performed on port 80, but since I've opened up all traffic this shouldn't be a problem, right?

3 Answers
Accepted Answer

The amazing people from AWS Support helped me, and after 6h it was solved with a single line in my .env file. My .env defines HOST and PORT, which were localhost and 80 respectively. The Dockerfile exposed the PORT, and the application's health check worked from inside the container but not from outside it. This meant the ALB couldn't reach the target.

Therefore the solution was to set the HOST to 0.0.0.0 in the .env! This exposed it to the outside.
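
To illustrate why the bind address matters, here is a minimal sketch that assumes a Node.js application (the thread does not say which runtime the relayer uses). `resolveBindAddress`, `isLoopback` and `createHealthServer` are hypothetical names made up for this example; only `HOST`, `PORT` and the `/health` path come from the thread.

```typescript
import * as http from "node:http";

// Hypothetical helper: read HOST and PORT the way a .env-driven app might.
// Falling back to 0.0.0.0 and 80 is an assumption made for this sketch.
function resolveBindAddress(
  env: Record<string, string | undefined>
): { host: string; port: number } {
  return {
    host: env.HOST ?? "0.0.0.0",
    port: Number(env.PORT ?? "80"),
  };
}

// A server bound to a loopback address only accepts connections that
// originate inside the same container, so an in-container health check
// passes while the load balancer cannot connect.
function isLoopback(host: string): boolean {
  return host === "localhost" || host === "::1" || host.startsWith("127.");
}

// Minimal server answering the health path used in the question.
function createHealthServer(): http.Server {
  return http.createServer((req, res) => {
    if (req.url === "/health") {
      res.writeHead(200, { "Content-Type": "text/plain" });
      res.end("OK");
      return;
    }
    res.writeHead(404);
    res.end();
  });
}
```

A possible way to use it: call `resolveBindAddress(process.env)`, log a warning if `isLoopback(host)` is true, and then start the server with `createHealthServer().listen(port, host)`. With `HOST=0.0.0.0` the server listens on all interfaces of the task, which is what lets the ALB reach it.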

Marvin
answered 4 months ago
EXPERT
reviewed a month ago

Hello,

as Hernan suggested, check the CloudFormation stack in the AWS Console to see whether it shows any error.

Health checks on port 80 are not a problem for the load balancer.

I'm pretty sure the CDK (CloudFormation) deployment fails because of the failing health check.

Check the target group in the AWS EC2 console to see whether the ECS tasks are registered correctly. But since you wrote that you get a 200 when connecting from the internet through the ALB to the service, the reason could also be that the default values of the target group's health check settings are too low; see the following document:

https://docs.aws.amazon.com/elasticloadbalancing/latest/application/target-group-health-checks.html
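
To make those settings concrete, here is a rough model of them in plain TypeScript. It is not AWS or CDK code: `HealthCheckSettings`, `statusMatches` and `roughSecondsUntilHealthy` are hypothetical names. The interval, timeout and threshold values are assumed Application Load Balancer defaults and should be verified against the linked document; the matcher `"200-499"` is the `healthyHttpCodes` value from the question.

```typescript
// Hypothetical model of a target group health check, for illustration only.
interface HealthCheckSettings {
  intervalSeconds: number;
  timeoutSeconds: number;
  healthyThresholdCount: number;
  unhealthyThresholdCount: number;
  matcher: string; // e.g. "200", "200,202" or "200-499"
}

// Assumed defaults (verify in the linked documentation) plus the
// matcher taken from the question.
const assumedSettings: HealthCheckSettings = {
  intervalSeconds: 30,
  timeoutSeconds: 5,
  healthyThresholdCount: 5,
  unhealthyThresholdCount: 2,
  matcher: "200-499",
};

// Returns true if the status code falls inside one of the comma-separated
// values or ranges of the matcher string.
function statusMatches(status: number, matcher: string): boolean {
  return matcher.split(",").some((part) => {
    const [lo, hi] = part.trim().split("-").map(Number);
    return hi === undefined ? status === lo : status >= lo && status <= hi;
  });
}

// Rough estimate only: consecutive successful checks, one per interval.
function roughSecondsUntilHealthy(s: HealthCheckSettings): number {
  return s.intervalSeconds * s.healthyThresholdCount;
}
```

Under these assumptions a responding target would be marked healthy within a few minutes, and a matcher of `"200-499"` accepts even a 404. A check that still fails with such a matcher therefore suggests that the target does not answer at all (connection refused or timeout) rather than that it answers with the wrong status code.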

Sincerely, Heiko

HeikoMR
answered 4 months ago
  • Hi Heiko, the CloudFormation output does not show anything informative. I have now exposed port 8080 from my Docker container and mapped it to port 80 on my host, but it doesn't work. The container is healthy. I also have a public load balancer and listener, but now I get a 502 Bad Gateway when I open the DNS endpoint.

  • Hey @Marvin, the 502 could be caused by a bad port mapping, or the instance was replaced or down during testing.

    What I would check during deployment via CDK: Check the ECS service/tasks and the ALB target group. Check whether the IPs are registered correctly. Check the health check settings there; as I said, the default health check values for the target group may be too low for your use case. Check the security group of the ECS service/task. Deploy an EC2 instance for testing purposes in the same network, connect to it, and try to reach the ECS task on the health path.

    If that doesn't work, either your port mapping is wrong or your container has a problem with the health path. If it works, your target group health check settings probably need to be updated.

  • Thanks again for the help, Heiko. The idea to deploy an EC2 instance for testing was very valuable for narrowing the problem down to ECS, because we could rule out the ALB as the cause.
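
The EC2 test described in the comments above can be scripted. The following is a minimal sketch that assumes Node.js is available on the test instance; `buildHealthUrl` and `probe` are hypothetical helper names, and the task IP has to be taken from the ECS console.

```typescript
import * as http from "node:http";

// Hypothetical helper: build the URL of the task's health endpoint.
function buildHealthUrl(host: string, port: number, path = "/health"): string {
  const normalized = path.startsWith("/") ? path : "/" + path;
  return `http://${host}:${port}${normalized}`;
}

// Sends one GET request and resolves with the HTTP status code.
// Rejects on connection errors or when the socket stays idle too long.
function probe(url: string, timeoutMs = 5000): Promise<number> {
  return new Promise((resolve, reject) => {
    const req = http.get(url, { timeout: timeoutMs }, (res) => {
      res.resume(); // discard the response body
      resolve(res.statusCode ?? 0);
    });
    // The "timeout" event does not abort the request by itself.
    req.on("timeout", () => {
      req.destroy(new Error(`timed out after ${timeoutMs} ms`));
    });
    req.on("error", reject);
  });
}
```

Example call, with a made-up private IP: `probe(buildHealthUrl("10.0.1.23", 80)).then(console.log, console.error)`. As a rough guide, a refused connection typically means nothing is listening on that interface and port (for example an application bound to localhost), while a timeout more often points to security groups or routing.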


In the AWS console, go to the CloudFormation service and check the events of the stack that you are deploying. You may find the error there.

answered 4 months ago
  • Unfortunately not. Since it times out after 3h, I only get "The following resource(s) failed to create: [RelayerServiceMService30B8E9A6]. Rollback requested by user."
