Switching autoscaling group health check to ELB when using ECS dynamic port mapping creates unwanted registered targets that fail the health check

Hi,

We are facing quite a weird problem when trying to enable ELB health checks for an autoscaling group that serves as a capacity provider to our ECS cluster. Our setup is as follows:

  • An auto-scaling group consisting of two EC2 instances
  • An ECS cluster with four task definitions, each configured to use bridge networking and dynamic port mapping
  • The ECS cluster has a capacity provider configured for the auto-scaling group. I have double-checked that all the container instances registered in the cluster come exactly from this capacity provider
  • Two ALBs, both having targets pointing to the instances from the ASG

NOTE: As long as the ASG is configured to perform only EC2 health checks, the setup described above works flawlessly, including dynamic port mapping and health checks at the ALB level. This, I believe, rules out typical problems with health checks caused by misconfigured security groups.

However, changing the ASG health check type to ELB and attaching the ALB target groups wreaks havoc (we tried both Terraform and the AWS web console, with no difference). As soon as the switch happens, AWS creates new targets in the referenced target groups with a port number that does not respect the dynamic port mapping. In other words, if port 8080 on an ECS task container is exposed as port 32770 on the EC2 instance thanks to dynamic port mapping, AWS creates a new target with port number 8080 (on which apparently nothing is listening on the EC2 instance). Consequently, all of the newly created targets fail the health checks, causing the ASG to constantly re-create instances and bringing the whole ECS cluster to a halt.
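To make the mismatch concrete, here is a minimal Python sketch of what we observe. It is only a toy model of the symptom, not a description of how AWS works internally: the port numbers come from the example above, and the helper names (`listening_ports`, `health_check_passes`) are made up for illustration.

```python
# Toy model of the observed behaviour; helper names are hypothetical.
CONTAINER_PORT = 8080   # port the task container listens on (from the example)
HOST_PORT = 32770       # host port assigned by dynamic port mapping (from the example)


def listening_ports(port_mappings):
    """Return the ports actually open on the EC2 instance under bridge networking."""
    return set(port_mappings.values())


def health_check_passes(target_port, open_ports):
    """A health check can only succeed if something listens on the target port."""
    return target_port in open_ports


# container port -> host port, as set up by dynamic port mapping
port_mappings = {CONTAINER_PORT: HOST_PORT}
open_ports = listening_ports(port_mappings)

results = {
    # target registered for the ECS service: instance + dynamic host port
    "ecs_registered_target": health_check_passes(HOST_PORT, open_ports),
    # extra target that appears after the target group is attached to the ASG
    "asg_registered_target": health_check_passes(CONTAINER_PORT, open_ports),
}
print(results)
```

In this model the target on the dynamic host port is healthy, while the extra target on port 8080 can never pass, which matches what we see in the target groups.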

Since the AWS docs seem to be silent on this, I would like to confirm whether dynamic port mapping is a supported scenario at all for ELB health checks at the ASG level and, if so, whether there are any additional required configuration steps we might have missed.

Thanks, Dmytro.

1 Answer

Hello,

The issue you're facing is likely due to ELB health checks not correctly recognizing dynamically mapped ports in your ECS setup. To address this:

  1. Check the health check configurations in your Target Groups.
  2. Verify ECS service registration with the target groups.
  3. Ensure security groups allow health check traffic.
  4. Review ALB listener rules for correct traffic routing.
  5. Confirm ECS agent configuration supports dynamic port mapping.
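As a rough aid for steps 1 and 2, the sketch below flags targets in a target group whose instance and port pair was not assigned by ECS. It assumes the input has the shape of the `TargetHealthDescriptions` list returned by the Elastic Load Balancing `DescribeTargetHealth` API; the function name, the instance ID, and the sample data are hypothetical, and fetching the real data (for example with the AWS CLI or an SDK) is left out.

```python
def find_unexpected_targets(target_health_descriptions, ecs_host_ports):
    """Return targets whose (instance id, port) pair was not assigned by ECS.

    target_health_descriptions: list shaped like the TargetHealthDescriptions
        field of a DescribeTargetHealth response (assumed shape).
    ecs_host_ports: set of (instance_id, host_port) pairs taken from the
        running tasks' network bindings.
    """
    unexpected = []
    for description in target_health_descriptions:
        target = description["Target"]
        if (target["Id"], target["Port"]) not in ecs_host_ports:
            unexpected.append({
                "id": target["Id"],
                "port": target["Port"],
                "state": description["TargetHealth"]["State"],
            })
    return unexpected


# Hypothetical sample data mirroring the ports mentioned in the question.
descriptions = [
    {"Target": {"Id": "i-0aaa", "Port": 32770}, "TargetHealth": {"State": "healthy"}},
    {"Target": {"Id": "i-0aaa", "Port": 8080}, "TargetHealth": {"State": "unhealthy"}},
]
ecs_host_ports = {("i-0aaa", 32770)}

unexpected = find_unexpected_targets(descriptions, ecs_host_ports)
print(unexpected)
```

Any entry this reports is a target that something other than the ECS service registered, which is a good starting point for narrowing down where the extra registration comes from.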

The goal is to ensure all components (ASG, ELB, ECS, EC2) are properly configured to work with dynamic port mapping.

Also, you can follow this guideline to set up dynamic port mapping in ECS: Dynamic Port Mapping in ECS

answered 2 months ago
  • Hi Osvaldo!

    Thanks for a detailed follow-up! I am pretty sure we have items 1 through 5 from your list in check (or else we would be observing failed health checks at the ELB level, but in fact the health check rules we deploy through Terraform work as they should).

    We also tried a suggestion I found elsewhere on Stack Overflow to explicitly specify "traffic-port" as the port value for the health check rules, but that didn't help either.

    Still, I would appreciate it if you could elaborate on properly configuring the ASG to work with dynamic port mapping, because I have a feeling that is the most likely place where we might have misconfigured something. If you have any recommendations in mind, I would love to hear them.

    As for contacting AWS support, I have a feeling we might end up opening a support case, since it currently looks like we've accidentally discovered an unsupported combination of AWS settings (unless, of course, we messed up the ASG configuration).
