Hi,
We are facing a rather odd problem when trying to enable ELB health checks for an auto-scaling group (ASG) that serves as a capacity provider for our ECS cluster. Our setup is as follows:
- An auto-scaling group consisting of two EC2 instances
- An ECS cluster with four task definitions, each configured to use bridge networking and dynamic port mapping
- The ECS cluster has a capacity provider configured for the auto-scaling group. I have double-checked that all the container instances registered in the cluster come from exactly this capacity provider
- Two ALBs, both having targets pointing to the instances from the ASG
NOTE: As long as the ASG is configured to perform only EC2 health checks, the setup described above works flawlessly, including dynamic port mapping and health checks at the ALB level. This, I believe, rules out typical problems with health checks caused by misconfigured security groups.
However, changing the ASG health check type to ELB and adding the target groups from the ALBs wreaks havoc (we tried both Terraform and the AWS web console, with no difference). As soon as the switch happens, AWS creates new targets in the referenced target groups with an invalid port number that does not respect the dynamic port mapping. In other words, if port 8080 on an ECS task container is exposed as port 32770 on the EC2 instance thanks to dynamic port mapping, AWS creates a new target with port number 8080 (on which apparently nothing is listening on the EC2 instance). Consequently, all those newly created targets fail the health checks, causing the ASG to constantly re-create instances and bringing the whole ECS cluster to a halt.
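For anyone trying to reproduce this, the mismatch can be spotted with a small script. Below is a minimal Python sketch, not a definitive diagnostic. It assumes the input has the shape of the `TargetHealthDescriptions` list returned by the ELBv2 `describe_target_health` call, and that the unexpected targets sit on the target group's own port (8080 in the example above). The function name, instance ID, and sample data are made up for illustration.

```python
from typing import Dict, List, Tuple

# (instance id, port, health state)
TargetSummary = Tuple[str, int, str]


def split_targets_by_port(
    descriptions: List[Dict], default_port: int
) -> Tuple[List[TargetSummary], List[TargetSummary]]:
    """Separate targets registered on the target group's default port
    from targets registered on any other (e.g. dynamically mapped) port."""
    on_default: List[TargetSummary] = []
    on_other: List[TargetSummary] = []
    for desc in descriptions:
        target = desc["Target"]
        summary = (target["Id"], target["Port"], desc["TargetHealth"]["State"])
        if target["Port"] == default_port:
            on_default.append(summary)
        else:
            on_other.append(summary)
    return on_default, on_other


# Hypothetical sample mirroring the situation described above: the same
# instance appears twice, once on the dynamic host port and once on the
# container port, where nothing is listening.
SAMPLE_DESCRIPTIONS = [
    {
        "Target": {"Id": "i-0123456789abcdef0", "Port": 32770},
        "TargetHealth": {"State": "healthy"},
    },
    {
        "Target": {"Id": "i-0123456789abcdef0", "Port": 8080},
        "TargetHealth": {"State": "unhealthy"},
    },
]

if __name__ == "__main__":
    suspicious, dynamic = split_targets_by_port(SAMPLE_DESCRIPTIONS, 8080)
    print("Targets on the default port:", suspicious)
    print("Targets on other ports:", dynamic)
```

With real data, the list would come from something like `boto3.client("elbv2").describe_target_health(TargetGroupArn=...)["TargetHealthDescriptions"]`, with the ARN of the affected target group filled in.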
Since the AWS docs seem to be silent on this, I would like to confirm whether dynamic port mapping is a supported scenario at all for ELB health checks at the ASG level and, if so, whether there are any additional required configuration steps we might have missed.
Thanks, Dmytro.
Hi Osvaldo!
Thanks for the detailed follow-up! I am pretty sure we have items 1 through 5 from your list in check (otherwise we would be observing failed health checks at the ELB level, but in fact the health check rules we deploy through Terraform work as they should).
We also tried a suggestion I found elsewhere on Stack Overflow to explicitly specify "traffic-port" as the port value for the health check rules, but that didn't help either.
Still, I would appreciate it if you could elaborate on properly configuring the ASG to work with dynamic port mapping, because that is the most likely place where we might have misconfigured something. If you have any recommendations in mind, I would love to hear them.
As for contacting AWS support, I have a feeling we might end up opening a support case, since it currently looks like we've accidentally discovered an unsupported combination of AWS settings (unless, of course, we messed up the ASG configuration).