Problem with ApplicationLoadBalancedEc2Service not registering with the cluster


I have a CDK script that creates an ECS cluster, a task definition, and an ApplicationLoadBalancedEc2Service. When I deploy it, the EC2 instance is created, but it is not registered with the load balancer target group or with the cluster, so my task never starts. The deployment also hangs at 65 out of 67 services started and never finishes. I think that's because the EC2 instance never registers with the cluster, so nothing starts up and the CDK deployment keeps waiting for the service to become healthy. I also see that my task doesn't have a CPU, OS, or memory assigned to it, even though my ECS JSON configuration file does.

What I don't really understand is which part creates the EC2 instance. Is it the addCapacity call that declares the specs of the EC2 machine? If so, why isn't the instance automatically registered with the cluster? Or is it the ApplicationLoadBalancedEc2Service construct? In that case, why isn't the instance registered with the cluster through the .cluster() method, and how would the construct know which EC2 specs to launch?

Here is my general script:

Vpc vpc = Vpc.Builder.create(this, "MyAppVpc")
        .maxAzs(2)  // Default is all AZs in region
        .natGateways(1)
        .enableDnsHostnames(true)
        .enableDnsSupport(true)
        .ipAddresses(IpAddresses.cidr("10.0.0.0/16"))
        .subnetConfiguration(Arrays.asList(
                SubnetConfiguration.builder()
                        .name("Web")
                        .subnetType(SubnetType.PUBLIC)
                        .cidrMask(24)
                        .build(),
                SubnetConfiguration.builder()
                        .name("App")
                        .subnetType(SubnetType.PRIVATE_WITH_EGRESS)
                        .cidrMask(24)
                        .build(),
                SubnetConfiguration.builder()
                        .name("Storage")
                        .subnetType(SubnetType.PRIVATE_WITH_EGRESS)
                        .cidrMask(24)
                        .build()
        ))
        .build();

vpc.addGatewayEndpoint("s3Endpoint", GatewayVpcEndpointOptions.builder()
        .service(GatewayVpcEndpointAwsService.S3)
        .build());
vpc.addInterfaceEndpoint("ssmEndpoint", InterfaceVpcEndpointOptions.builder()
        .service(InterfaceVpcEndpointAwsService.SSM)
        .privateDnsEnabled(true)
        .build());
vpc.addInterfaceEndpoint("cloudWatchEndpoint", InterfaceVpcEndpointOptions.builder()
        .service(InterfaceVpcEndpointAwsService.CLOUDWATCH)
        .privateDnsEnabled(true)
        .build());
vpc.addInterfaceEndpoint("ec2MessagesEndpoint", InterfaceVpcEndpointOptions.builder()
        .service(InterfaceVpcEndpointAwsService.EC2_MESSAGES)
        .privateDnsEnabled(true)
        .build());

ILogGroup appLogGroup = LogGroup.fromLogGroupName(this, "my." + settings.getEnv() + ".archiver", "my." + settings.getEnv() + ".archiver");
if( appLogGroup == null ) {  // never true: fromLogGroupName() returns a reference, not null
    appLogGroup = new LogGroup(this, "my." + settings.getEnv() + ".archiver", LogGroupProps.builder()
            .logGroupName("my." + settings.getEnv() + ".archiver")
            .retention(RetentionDays.ONE_YEAR)
            .build());
}

IHostedZone domainZone = HostedZone.fromLookup(this, getStackId(), HostedZoneProviderProps
        .builder()
        .domainName(settings.get("hostname"))
        .build());

Cluster cluster = Cluster.Builder.create(this, "MyAppCluster")
        .containerInsights(true)
        .capacity(AddCapacityOptions.builder()
                .vpcSubnets(SubnetSelection.builder().subnets(vpc.getPrivateSubnets()).build())
                .desiredCapacity(1)
                .minCapacity(1)
                .maxCapacity(1)
                .instanceType(InstanceType.of(InstanceClass.T3A, InstanceSize.LARGE))
                .machineImage(MachineImage.latestAmazonLinux())
                .machineImageType(MachineImageType.AMAZON_LINUX_2)
                .keyName(settings.get("ssh.key.name"))
                .build())
        .vpc(vpc)
        .build();

ICertificate myAppCert = Certificate.fromCertificateArn(this, settings.get("ssl.key.id"), settings.get("ssl.key.arn") );

Ec2TaskDefinition appTask = Ec2TaskDefinition.Builder.create(this, "MyAppTask")
        .family("MyApp")
        .networkMode(NetworkMode.AWS_VPC)
        .executionRole(Role.fromRoleArn(this,
                settings.get("execution.role.id"),
                settings.get("execution.role.arn")))
        .build();

ContainerDefinition container = appTask.addContainer("MyApp-" + settings.getEnv(), ContainerDefinitionOptions.builder()
        .image(ContainerImage.fromRegistry("ecr.repo.url/myApp:1.8.0"))
        .essential(true)
        .cpu(2048)
        .memoryLimitMiB(4096)
        .memoryReservationMiB(1024)
        .portMappings(Arrays.asList(
                portMapping(8443,8443),
                portMapping(8080,8080),
                portMapping(9013,9013),
                portMapping(9012,9012)
        ))
        .logging(LogDriver.awsLogs(AwsLogDriverProps.builder()
                .logGroup(appLogGroup)
                .streamPrefix("MyApp-" + settings.getEnv() + "-")
                .build()))
        .build());

ApplicationLoadBalancedEc2Service ec2 = ApplicationLoadBalancedEc2Service.Builder
        .create(this, "MyAppService")
        .cluster(cluster)   // I thought the point of this was to register the EC2 instance with the cluster?!
        .taskDefinition(appTask)
        .cpu(2048)
        .memoryReservationMiB(1024)
        .memoryLimitMiB(16000 )
        .desiredCount(1)
        .domainName(settings.get("hostname"))
        .domainZone(domainZone)
        .certificate( myAppCert )
        .redirectHttp(true)
        .protocol(ApplicationProtocol.HTTPS)
        .targetProtocol(ApplicationProtocol.HTTPS)
        .publicLoadBalancer(true)
        .build();

ec2.getTargetGroup().configureHealthCheck(HealthCheck.builder()
        .path("/health-check.txt")
        .interval(Duration.minutes(2))
        .enabled(true)
        .unhealthyThresholdCount(5)
        .port("8443")
        .protocol(Protocol.HTTPS)
        .build());
2 Answers
Accepted Answer

As you can see from the code, I was using the default Amazon Linux image. It is not a custom image or a special AMI.

.machineImage(MachineImage.latestAmazonLinux())

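But that image does not appear to meet the ECS cluster requirements: MachineImage.latestAmazonLinux() returns a generic Amazon Linux AMI without the ECS container agent, and machineImageType(MachineImageType.AMAZON_LINUX_2) only controls the user data that CDK generates, not which AMI is launched. The fix was simple: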

.machineImage(EcsOptimizedImage.amazonLinux2())

By switching to the EcsOptimizedImage class, I can launch an ECS-optimized image that meets the requirements. Once I deployed that, I saw the instance register with the cluster and my container started up. There are plenty of other issues I still have to fix, and it's not fully working yet, but at least the instance is registering with the cluster now.
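A minimal sketch of the cluster definition from the question with only the machine image changed. The helper class and method names (EcsClusterFactory, create) are hypothetical; vpc and the key pair name are assumed to come from the surrounding stack, as in the original script.

```java
import software.amazon.awscdk.services.ec2.IVpc;
import software.amazon.awscdk.services.ec2.InstanceClass;
import software.amazon.awscdk.services.ec2.InstanceSize;
import software.amazon.awscdk.services.ec2.InstanceType;
import software.amazon.awscdk.services.ec2.SubnetSelection;
import software.amazon.awscdk.services.ecs.AddCapacityOptions;
import software.amazon.awscdk.services.ecs.Cluster;
import software.amazon.awscdk.services.ecs.EcsOptimizedImage;
import software.amazon.awscdk.services.ecs.MachineImageType;
import software.constructs.Construct;

// Hypothetical helper: same cluster as in the question, but backed by an ECS-optimized AMI.
public final class EcsClusterFactory {
    private EcsClusterFactory() {}

    public static Cluster create(Construct scope, IVpc vpc, String keyName) {
        return Cluster.Builder.create(scope, "MyAppCluster")
                .vpc(vpc)
                .containerInsights(true)
                // The capacity option adds an Auto Scaling group; that group launches the EC2 instances.
                .capacity(AddCapacityOptions.builder()
                        .vpcSubnets(SubnetSelection.builder().subnets(vpc.getPrivateSubnets()).build())
                        .desiredCapacity(1)
                        .minCapacity(1)
                        .maxCapacity(1)
                        .instanceType(InstanceType.of(InstanceClass.T3A, InstanceSize.LARGE))
                        // ECS-optimized Amazon Linux 2 ships with Docker and the ECS container agent.
                        .machineImage(EcsOptimizedImage.amazonLinux2())
                        // Selects the Linux user data that points the agent at this cluster.
                        .machineImageType(MachineImageType.AMAZON_LINUX_2)
                        .keyName(keyName)
                        .build())
                .build();
    }
}
```

The service construct itself never launches instances; it only places tasks on instances that have already joined the cluster passed to .cluster().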

answered a year ago
EXPERT
reviewed a month ago

Hello,

Warm greetings from AWS Support.

I am Tejas from the AWS Support Team.

I have read your query, and I understand that you have a CDK script that creates an ECS cluster, a task, and an Application Load Balancer.

You mentioned that the instance is not registered with the load balancer target group and that the task never starts.

You also suspect that the EC2 instance does not get registered with the cluster, so the CDK deployment keeps waiting for it to become healthy.

You also asked which part creates the EC2 instance, whether the addCapacity call declares the instance specification, and some follow-up questions about how the instance is launched.

I have checked your CDK script, and I would ask you to check the following:

=============================================

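To start with your first question: the instance is created by the cluster's capacity configuration. I noticed that the code sets the machine image type to "AMAZON_LINUX_2", while the AMI itself comes from MachineImage.latestAmazonLinux().

To answer your next question: yes, the addCapacity call (here, the capacity option on your Cluster) declares the specification of the instance. It creates an Auto Scaling group that launches the EC2 instances. The ApplicationLoadBalancedEc2Service does not launch instances; it only places tasks on instances that have registered with the cluster you pass to .cluster().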

Please check the AMI of your instance.

If the AMI that you use for the EC2 instance is a copied or custom AMI, then confirm that the instance has the following components:

  • A modern Linux distribution running at least version 3.10 of the Linux kernel.

  • Latest version of the Amazon ECS Linux container agent.

  • A Docker daemon running at least version 1.9.0, and any Docker runtime dependencies (see the Docker website). To view the current Docker version, run the command sudo docker version. For information about installing the latest Docker version on your Linux distribution, see "Install Docker Engine" in the Docker documentation.
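To check the two minimum versions above on an instance, a small stdlib-only sketch can compare dotted version strings numerically. The class name is hypothetical, and the parser is an assumption: it keeps only the leading numeric part of each string, so a kernel string with a distribution suffix compares by its x.y.z prefix.

```java
import java.util.Arrays;

// Hypothetical helper: compares dotted version strings numerically, component by component.
public final class AmiPrereqCheck {
    private AmiPrereqCheck() {}

    /** Returns true if {@code actual} is the same as or newer than {@code minimum}. */
    public static boolean atLeast(String actual, String minimum) {
        int[] a = parse(actual);
        int[] m = parse(minimum);
        for (int i = 0; i < Math.max(a.length, m.length); i++) {
            int x = i < a.length ? a[i] : 0; // missing components count as 0
            int y = i < m.length ? m[i] : 0;
            if (x != y) {
                return x > y;
            }
        }
        return true; // all components equal
    }

    // Keeps only the leading numeric part, so "4.14.355-275.amzn2" parses as [4, 14, 355].
    private static int[] parse(String version) {
        String numeric = version.split("[^0-9.]", 2)[0];
        return Arrays.stream(numeric.split("\\."))
                .filter(s -> !s.isEmpty())
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: AmiPrereqCheck <kernel-version> <docker-version>");
            System.exit(2);
        }
        System.out.println("kernel >= 3.10:  " + atLeast(args[0], "3.10"));
        System.out.println("docker >= 1.9.0: " + atLeast(args[1], "1.9.0"));
    }
}
```

On the instance, the inputs could come from uname -r and sudo docker version --format '{{.Server.Version}}'. Numeric comparison matters here because a plain string comparison would rank "3.2" above "3.10".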

[NOTE] The Amazon ECS-optimized AMIs [+] https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html are preconfigured with these requirements. It's a best practice to use the Amazon ECS-optimized AMIs unless your application requires a version that's not yet available in them.

===============================================

I have read your script and compared it with a sample script on GitHub. Here is the link to the GitHub script for your reference:

[+]https://github.com/aws-samples/aws-cdk-examples/blob/master/typescript/ecs/ecs-service-with-advanced-alb-config/index.ts

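[Note] On comparing the scripts, I found that your script does not add a load balancer listener explicitly. (ApplicationLoadBalancedEc2Service creates its own listener, which you can access with getListener(), so an explicit listener is only needed if you create the load balancer yourself.)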

I then also checked the Python documentation [3]: https://docs.aws.amazon.com/cdk/api/v2/python/aws_cdk.aws_ecs/README.html#clusters

Please check the section "Include an application/network load balancer"; the listener below does not appear in your code:

listener = lb.add_listener("Listener", port=80)

Please refer to documentation [3] for more details.
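For reference, a hedged Java translation of the Python listener snippet, relevant only when you build the load balancer yourself instead of using ApplicationLoadBalancedEc2Service (which already creates the listener and target group). The class and method names are hypothetical, and the existing Ec2Service and target port 8443 are assumptions based on the question's task definition.

```java
import java.util.List;

import software.amazon.awscdk.services.ec2.IVpc;
import software.amazon.awscdk.services.ecs.Ec2Service;
import software.amazon.awscdk.services.elasticloadbalancingv2.AddApplicationTargetsProps;
import software.amazon.awscdk.services.elasticloadbalancingv2.ApplicationListener;
import software.amazon.awscdk.services.elasticloadbalancingv2.ApplicationLoadBalancer;
import software.amazon.awscdk.services.elasticloadbalancingv2.ApplicationProtocol;
import software.amazon.awscdk.services.elasticloadbalancingv2.BaseApplicationListenerProps;
import software.constructs.Construct;

// Hypothetical helper: wires a hand-built ALB to an existing ECS service.
public final class ManualAlbSketch {
    private ManualAlbSketch() {}

    public static ApplicationListener wire(Construct scope, IVpc vpc, Ec2Service service) {
        ApplicationLoadBalancer lb = ApplicationLoadBalancer.Builder.create(scope, "LB")
                .vpc(vpc)
                .internetFacing(true)
                .build();

        // Java equivalent of: listener = lb.add_listener("Listener", port=80)
        ApplicationListener listener = lb.addListener("Listener",
                BaseApplicationListenerProps.builder().port(80).build());

        // Registers the service's default container (first port mapping) behind an HTTPS target group.
        listener.addTargets("ECS", AddApplicationTargetsProps.builder()
                .port(8443)
                .protocol(ApplicationProtocol.HTTPS)
                .targets(List.of(service))
                .build());
        return listener;
    }
}
```

With the pattern used in the question, the equivalent object is already available as ec2.getListener(), so adding a second listener on the same port would conflict rather than help.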

Your Amazon EC2 instance might be unable to register with or join an Amazon ECS cluster for one or more of the following reasons:

  • The ECS endpoint can't access the Domain Name System (DNS) hostname of the instance publicly.

  • Your public subnet configurations are incorrect.

  • Your private subnet configurations are incorrect.

  • Your VPC endpoints are incorrectly configured.

  • Your security groups don't allow network traffic.

  • The EC2 instance doesn't have the required AWS Identity and Access Management (IAM) permissions, or the ecs:RegisterContainerInstance API call is denied.

  • The instance user data for your ECS container instance is incorrectly configured.

  • The ECS agent is stopped or not running on the instance.

  • The launch configuration of the Auto Scaling group isn't correct (if your instance is part of an Auto Scaling group).

  • The Amazon Machine Image (AMI) that you use for your instance doesn't meet the prerequisites.

The following article covers each of the points above in detail:

[+] https://repost.aws/knowledge-center/ecs-instance-unable-join-cluster
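For the user data item, the ECS agent reads the target cluster from the ECS_CLUSTER setting in /etc/ecs/ecs.config; if it is not set, the agent joins the cluster named "default". A minimal stdlib sketch that renders that user data, assuming a hand-built Auto Scaling group or launch template (with the CDK capacity option and a Linux machineImageType, CDK writes an equivalent line itself). The class name and cluster name are hypothetical.

```java
// Hypothetical helper: renders minimal user data that points the ECS agent at a cluster.
public final class EcsUserData {
    private EcsUserData() {}

    public static String forCluster(String clusterName) {
        if (clusterName == null || clusterName.isBlank()) {
            // Without ECS_CLUSTER the agent would fall back to the cluster named "default".
            throw new IllegalArgumentException("clusterName is required");
        }
        return "#!/bin/bash\n"
                + "echo ECS_CLUSTER=" + clusterName + " >> /etc/ecs/ecs.config\n";
    }

    public static void main(String[] args) {
        System.out.print(forCluster(args.length > 0 ? args[0] : "MyAppCluster"));
    }
}
```

The value must be the cluster's actual name; CDK-generated cluster names usually differ from the construct ID, so compare it with the name shown in the ECS console.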

You can also debug further by checking the relevant API calls (for example, RegisterContainerInstance) in AWS CloudTrail, in the account and Region where the resources are deployed.
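A hedged sketch of that CloudTrail check using the AWS SDK for Java 2.x (the cloudtrail module must be on the classpath). The class name is hypothetical, and it assumes the default credential chain and a region set to the stack's region. CloudTrail event history only covers recent management events, so an empty result can also mean the call happened outside that window.

```java
import software.amazon.awssdk.services.cloudtrail.CloudTrailClient;
import software.amazon.awssdk.services.cloudtrail.model.Event;
import software.amazon.awssdk.services.cloudtrail.model.LookupAttribute;
import software.amazon.awssdk.services.cloudtrail.model.LookupAttributeKey;
import software.amazon.awssdk.services.cloudtrail.model.LookupEventsRequest;
import software.amazon.awssdk.services.cloudtrail.model.LookupEventsResponse;

// Hypothetical diagnostic: lists recent RegisterContainerInstance calls recorded by CloudTrail.
public final class RegisterInstanceTrail {
    public static void main(String[] args) {
        // Uses the default credential provider chain and region (for example, AWS_REGION).
        try (CloudTrailClient cloudTrail = CloudTrailClient.create()) {
            LookupEventsRequest request = LookupEventsRequest.builder()
                    .lookupAttributes(LookupAttribute.builder()
                            .attributeKey(LookupAttributeKey.EVENT_NAME)
                            .attributeValue("RegisterContainerInstance")
                            .build())
                    .maxResults(50)
                    .build();
            LookupEventsResponse response = cloudTrail.lookupEvents(request);
            if (response.events().isEmpty()) {
                // No calls at all can indicate the agent never ran or never reached the ECS endpoint.
                System.out.println("No RegisterContainerInstance events found.");
            }
            for (Event event : response.events()) {
                System.out.println(event.eventTime() + " " + event.username());
                // Raw JSON record; an errorCode field (for example, AccessDenied) marks a failed call.
                System.out.println(event.cloudTrailEvent());
            }
        }
    }
}
```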

I hope the above information helps. If you feel I have missed anything about your concern, or if we can provide any additional assistance with this matter, please do not hesitate to let me know. I'll be more than happy to assist you.

answered a year ago
