NLB target groups not acting consistently?


Hi, I'm using Terraform to create infrastructure for two environments: develop and production. Both environments consist of a self-hosted Kubernetes cluster on EC2 instances and a self-managed database on an EC2 instance.

The develop env has all of these in private subnets behind a NAT gateway and a network load balancer. There are three target groups: one for HTTP traffic and one for HTTPS traffic, both pointing to the cluster, and one for the database's protocol. A few Route53 alias records point to the network load balancer, and the target groups are associated with the right auto scaling groups. The cluster and the database are reachable from the public internet (this is intentional for the time being). This setup works very well.
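For context, here's roughly what the relevant part of the config looks like. This is a simplified sketch, not the exact code: all resource names, variables, and the database port are hypothetical.

```hcl
# Rough sketch of the develop setup; names, variables, and ports are hypothetical.
resource "aws_lb" "develop" {
  name               = "develop-nlb"
  load_balancer_type = "network"
  subnets            = var.nlb_subnet_ids
}

# One target group per protocol; only the database one is shown here.
resource "aws_lb_target_group" "db" {
  name     = "develop-db"
  port     = 5432 # assuming a PostgreSQL-style port
  protocol = "TCP"
  vpc_id   = var.vpc_id
}

resource "aws_lb_listener" "db" {
  load_balancer_arn = aws_lb.develop.arn
  port              = 5432
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.db.arn
  }
}

# Tie the target group to the database auto scaling group.
resource "aws_autoscaling_attachment" "db" {
  autoscaling_group_name = aws_autoscaling_group.db.name
  lb_target_group_arn    = aws_lb_target_group.db.arn
}

# Route53 alias record pointing at the NLB.
resource "aws_route53_record" "db" {
  zone_id = var.zone_id
  name    = "db.develop.example.com"
  type    = "A"

  alias {
    name                   = aws_lb.develop.dns_name
    zone_id                = aws_lb.develop.zone_id
    evaluate_target_health = true
  }
}
```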

The problem is that when I tried to reproduce the same setup for the production environment, the database was unreachable more often than not, and when it was reachable, the connection just hung. The only differences are the names (environment name, etc.); the configuration is otherwise pretty much the same. I can't figure out why it works in one case and not in the other.

I've disabled cross-zone load balancing on both load balancers, so when I run the dig command on the develop database record, I only get one IP address, as I'd expect with the setting disabled. That isn't the case with the production NLB: I get three, as many as the number of associated subnets. It's as though the cross-zone load balancing setting is on even though it says it isn't.
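In Terraform the setting lives on the load balancer resource itself, and I declare it the same way in both environments (names hypothetical):

```hcl
# Hypothetical names; the develop NLB is declared the same way.
resource "aws_lb" "production" {
  name                             = "production-nlb"
  load_balancer_type               = "network"
  subnets                          = var.nlb_subnet_ids
  enable_cross_zone_load_balancing = false
}
```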

Has anyone experienced inconsistent behavior like this?

In the end, I had to disassociate the production database from the production NLB target group, put it in a public subnet, and create an A record just for it (sketched below).
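The workaround amounts to a plain A record pointing straight at the instance (again, names are hypothetical):

```hcl
# Workaround sketch: an A record pointing directly at the database
# instance's public IP, bypassing the NLB entirely.
resource "aws_route53_record" "production_db" {
  zone_id = var.zone_id
  name    = "db.example.com"
  type    = "A"
  ttl     = 300
  records = [aws_instance.production_db.public_ip]
}
```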

Rocky
asked 2 years ago · 294 views
1 Answer

One reason your traffic was not working might be that you had cross-zone load balancing disabled and the database was not in the same AZ as the NLB node that received the traffic. With the cross-zone option disabled, an NLB node forwards traffic only to targets within its own AZ. So it can look as though a connection is established, because the client reaches the NLB, while the NLB still can't reach the backend resource.

The number of IP addresses you get back from the dig command depends on the number of subnets the NLB is deployed into. The cross-zone load balancing option doesn't affect this.
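To illustrate in Terraform terms (hypothetical names): an NLB deployed into three subnets gets one node, and therefore one DNS A record, per subnet, whatever the cross-zone setting says.

```hcl
# Hypothetical sketch: one NLB node, and thus one A record in the NLB's
# DNS answer, per subnet. The cross-zone setting controls where each node
# may forward traffic, not how many addresses the DNS name resolves to.
resource "aws_lb" "example" {
  name                             = "example-nlb"
  load_balancer_type               = "network"
  enable_cross_zone_load_balancing = false

  subnet_mapping {
    subnet_id = var.subnet_a_id # one A record for this AZ
  }
  subnet_mapping {
    subnet_id = var.subnet_b_id # one for this AZ
  }
  subnet_mapping {
    subnet_id = var.subnet_c_id # and one for this AZ
  }
}
```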

https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#cross-zone-load-balancing

jose (AWS)
answered 10 months ago
  • Hi, thanks for your comment. I realize how cross-zone load balancing works and I agree with what you said; however, I didn't have it turned off right away. As I said in my question, the exact same setup was used in both environments with cross-zone load balancing turned on, and in spite of that I observed different behavior that I can't explain.
