AWS NLB sends unknown 120-byte TCP keep-alive packets to clients

We have an IoT device that connects to our MQTT broker behind an NLB. We keep the connection between the IoT device and the broker open by using the MQTT Keep Alive time and the broker's heartbeat interval.

Our IoT device sleeps most of the time. It wakes up in the following situations:

  • Whenever it needs to send a PINGREQ to the broker (every 340s, the MQTT Keep Alive time).
  • Whenever other microservices publish data and the broker forwards that data to the IoT device.
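
For reference, the PINGREQ mentioned above is a two-byte MQTT control packet, and in MQTT 3.1.1 the broker is expected to drop a client that stays silent for one and a half times the Keep Alive interval. A minimal Python sketch of both facts, with `KEEP_ALIVE_S` taken from the question and the helper name `broker_timeout_s` being illustrative only:

```python
# MQTT 3.1.1 fixed headers: packet type in the high nibble, remaining length 0.
PINGREQ = bytes([0xC0, 0x00])   # control packet type 12, sent by the client
PINGRESP = bytes([0xD0, 0x00])  # control packet type 13, the broker's reply

KEEP_ALIVE_S = 340  # Keep Alive interval from the question


def broker_timeout_s(keep_alive_s: int) -> float:
    """Silence after which an MQTT 3.1.1 broker must close the connection."""
    return 1.5 * keep_alive_s


print(PINGREQ.hex(), PINGRESP.hex(), broker_timeout_s(KEEP_ALIVE_S))
```

So at the MQTT level the device only needs to transmit two bytes every 340s to hold the session open.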

Our objective is to let the IoT device sleep as much as possible while maintaining the connection, in order to save battery.

***Problem:*** Once the broker sends downstream data to the IoT device, the device starts waking up every 20s instead of going back to sleep until the next PINGREQ or the next downstream message.

Based on our vendor's packet analysis, we found that the NLB sends 120-byte TCP keep-alive packets to the IoT device every 20s, starting right after the broker publishes downstream data. These packets are generated entirely by the NLB, not by the broker.
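
To put the battery impact in numbers, here is a rough Python sketch comparing wake-ups per hour. It assumes each NLB keep-alive packet wakes the device exactly once and ignores wake-ups caused by real downstream data; both are simplifications, and the function name is illustrative:

```python
SECONDS_PER_HOUR = 3600


def wakeups_per_hour(interval_s: int) -> float:
    """Number of wake-ups in one hour for a given wake-up interval."""
    return SECONDS_PER_HOUR / interval_s


expected = wakeups_per_hour(340)  # MQTT Keep Alive only
observed = wakeups_per_hour(20)   # one wake-up per NLB keep-alive packet

print(f"{expected:.1f} vs {observed:.0f} wake-ups/hour, "
      f"{observed / expected:.0f}x more")
```

Under these assumptions the device wakes roughly 17 times more often than the MQTT Keep Alive alone would require.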

***Only happens with TLS:*** We found that this happens when we use a TLS listener (8883) on the NLB and terminate TLS at the NLB. If we remove TLS, add a listener on the non-secure port (1883), and forward the traffic to the target's non-secure port, everything works as expected: there are no 20s wake-ups and no keep-alive packets from the NLB.

We also tested the same setup with a CLB on an SSL port. It works without any problem and does not send keep-alives to the client (IoT device).

We have removed the TLS and opened the non-secure port as a temporary workaround.

Why does the NLB send keep-alive packets every 20s when we use TLS? Is this intended behaviour of the NLB? Any idea how we could resolve it?

***Overview of the cloud setup:***

  • MQTT broker runs in ECS Fargate
  • Multi-AZ
  • Broker is in a private subnet
  • NLB sits between the Client (IoT device) and the Target (MQTT broker)
  • NLB idle timer keeps being reset by two things:
      • Keep Alive sent by the Client (IoT device) every 340s
      • Heartbeat published by the Target (MQTT broker) every 340s
  • Connection remains open
  • NLB offloads TLS on port 8883 and forwards the traffic to target port 1883
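
The 340s intervals in this setup only keep the connection open if they stay below the NLB idle timeout. A small Python sketch of that check, assuming the 350-second idle timeout that the AWS documentation states for TLS listeners (verify this value against the current documentation; the function name is illustrative):

```python
# Assumption: 350 s is the documented NLB idle timeout for TLS listeners.
NLB_IDLE_TIMEOUT_S = 350
HEARTBEAT_S = 340  # Keep Alive / heartbeat interval from the setup above


def keeps_flow_alive(heartbeat_s: int,
                     idle_timeout_s: int = NLB_IDLE_TIMEOUT_S) -> bool:
    """True if traffic arrives before the load balancer's idle timer expires."""
    return heartbeat_s < idle_timeout_s


margin = NLB_IDLE_TIMEOUT_S - HEARTBEAT_S
print(keeps_flow_alive(HEARTBEAT_S), f"margin: {margin}s")
```

The margin is small, so network delay or a late wake-up on the device could in principle let the idle timer expire.
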
2 Answers

I can't find any documentation on NLB keep-alive behaviour, except for a mention here: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout

The difference between a TCP and a TLS listener is that with TLS the traffic is encrypted between your client and the NLB, where TLS is terminated, and the NLB then opens a separate connection to the server (encrypted or not, depending on the target group protocol). With a TCP listener, TLS is not terminated at the load balancer: the traffic is passed to the server as received, the server decrypts it, and the TLS handshake happens directly between the client and the server.

I think that is the reason for the different behaviour: with TLS, a handshake is needed between the client and the NLB, and the keep-alive from the NLB is sent for that reason.

Miki
answered 2 years ago
  • Thanks, Miki, for the answer. Yes, it is not documented. Besides, as you already noted, this happens only with TLS. However, my question is why a TLS connection on the NLB should send keep-alives to the Client when the Client already sends its own Keep Alive to hold the connection open. I am wondering whether I am experiencing this behaviour due to a bug in the NLB.

Please check our updated documentation at https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout

When a TLS listener receives a TCP keepalive packet from either a client or a target, the load balancer generates TCP keepalive packets and sends them to both the front-end and back-end connections every 20 seconds. You can't modify this behavior.

This answers your question and explains the difference in behaviour between TCP and TLS listeners that you see with your IoT device.
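
If the quoted behaviour is what triggers the 20-second packets, one thing that may be worth checking is whether the broker or the device stack enables TCP keepalive on its sockets, since the application-level MQTT PINGREQ does not depend on it. A minimal Python sketch of turning `SO_KEEPALIVE` off on a socket; whether your broker or device firmware exposes this option is an assumption, and the helper name `disable_tcp_keepalive` is illustrative:

```python
import socket


def disable_tcp_keepalive(sock: socket.socket) -> bool:
    """Turn off SO_KEEPALIVE on a socket and report whether it is now off."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 0)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) == 0


# Demonstration on an unconnected socket; no network traffic is generated.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # Simulate a stack or library that switched keepalive on.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    was_enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
    now_disabled = disable_tcp_keepalive(sock)
finally:
    sock.close()

print(was_enabled, now_disabled)
```

This does not change the load balancer itself, which cannot be configured here; it only removes one possible source of the TCP keepalive that the documentation describes as the trigger.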

Mark_W (AWS)
answered a year ago
