I have an ECS service running on Fargate behind an NLB, with a TCP listener on port 443 and a TCP test listener on port 9443. We use TCP listeners on the NLB so that TLS termination happens on the hosts (containers).
I also have a second target group for blue/green deployments. All target types are set to IPv4, and the service works as expected outside of deployments.
I've run the following experiments:
- When I run my integration tests outside of a deployment (both listeners pointing to the same target group), all tests pass against both listeners (443 and 9443).
- When I run them in the context of a deployment, in the AfterAllowTraffic hook (both listeners pointing to the replacement target group), all tests pass against both listeners (443 and 9443).
- When I run the tests in the context of a deployment, in the AfterAllowTestTraffic hook, after verifying that listener 443 points to the blue target group and listener 9443 points to the green target group with a healthy container, the tests fail against both listeners: they cannot establish a connection. However, if I run the tests directly against the containers by targeting their IPs, all tests pass.
- If I manually replicate the blue/green deployment setup by pointing the test listener to the other target group, so that listener 443 keeps pointing at one target group and listener 9443 points to the other, then both listeners STOP WORKING!
- If, in the setup from the previous experiment, I delete the listener on 9443 so that the only listener left is the one on 443 targeting the blue target group, it starts working again.
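
For anyone who wants to reproduce the "fail to establish a connection" symptom without the full integration suite, here is a minimal Python sketch of a connectivity probe against both listeners. It is not the actual test code: the function names `can_connect` and `probe_listeners` are made up for illustration, and it only checks that a TCP handshake completes, not that TLS or the application works.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers refused connections, timeouts, and DNS failures.
        return False


def probe_listeners(host: str, ports=(443, 9443), timeout: float = 3.0) -> dict:
    """Map each listener port to whether a TCP connection could be opened."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

Calling `probe_listeners("my-nlb.example.com")` (a placeholder for the NLB DNS name) returns a dict keyed by port. Under the failing setups described above, I would expect both ports to map to `False`, while probing a container IP directly should give `True`.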
Is this a misconfiguration on my side, or is it more likely a problem on the AWS NLB side?