Use the following NLB performance and observability metrics to confirm that the NLB was healthy during the reported timeframe of the incident. The Availability Zone (AZ) dimension can be used to isolate the issue to a specific NLB AZ.
ActiveFlowCount - The total number of concurrent flows (or connections) from clients to targets. This metric includes connections in the SYN_SENT and ESTABLISHED states. TCP connections are not terminated at the load balancer, so a client opening a TCP connection to a target counts as a single flow. A zero or near-zero value suggests that a firewall or security group is restricting traffic, whereas a count in the millions may indicate a distributed denial-of-service (DDoS) attack. This metric also helps establish the application's typical workload, which makes an anomalous traffic pattern quick to spot.
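The baseline comparison described for ActiveFlowCount can be sketched as follows. This is a minimal illustration rather than an AWS-provided check: the function name `classify_active_flows` and the thresholds `near_zero_ratio` and `spike_ratio` are assumptions chosen for the example and should be tuned against the application's own typical values.

```python
from statistics import median


def classify_active_flows(baseline_samples, current,
                          near_zero_ratio=0.01, spike_ratio=10.0):
    """Compare the current ActiveFlowCount with a baseline of earlier samples.

    The ratio thresholds are illustrative assumptions, not AWS guidance.
    """
    if not baseline_samples:
        raise ValueError("baseline_samples must not be empty")
    baseline = median(baseline_samples)
    if baseline <= 0:
        # No usable baseline to compare against.
        return "unknown"
    ratio = current / baseline
    if ratio <= near_zero_ratio:
        return "near-zero"  # possible firewall or security group restriction
    if ratio >= spike_ratio:
        return "spike"      # possible DDoS or other abnormal traffic
    return "normal"


# Hypothetical samples taken from a normal period before the incident.
baseline = [1200, 1350, 1280, 1310, 1190]
print(classify_active_flows(baseline, 3))          # near-zero
print(classify_active_flows(baseline, 2_500_000))  # spike
print(classify_active_flows(baseline, 1400))       # normal
```

The median is used for the baseline so that a single outlier in the earlier samples does not shift the comparison.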
PortAllocationErrorCount - The total number of ephemeral port allocation errors during a client IP translation operation. A non-zero value indicates dropped client connections. Note: Network Load Balancers support 55,000 simultaneous connections or about 55,000 connections per minute to each unique target (IP address and port) when
performing client address translation. To fix port allocation errors, add more targets to the target group.
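As a rough sizing aid for the fix above, the number of targets can be estimated from the 55,000 per-target figure. This is a sketch under stated assumptions: it assumes connections are spread evenly across targets, and the function name `min_targets_for` and the `headroom` parameter are hypothetical, not part of any AWS tooling.

```python
import math

# Figure quoted in the note above: connections per unique target (IP address and port).
CONNECTIONS_PER_TARGET = 55_000


def min_targets_for(peak_connections, headroom=0.2):
    """Estimate how many targets keep each one under the per-target figure.

    `headroom` is the fraction of capacity held in reserve (an illustrative
    assumption). Assumes connections are distributed evenly across targets.
    """
    if peak_connections < 0:
        raise ValueError("peak_connections must be non-negative")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in the range [0, 1)")
    usable = CONNECTIONS_PER_TARGET * (1 - headroom)
    return max(1, math.ceil(peak_connections / usable))


print(min_targets_for(200_000))              # 5
print(min_targets_for(200_000, headroom=0))  # 4
```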
UnHealthyHostCount - The number of targets that are considered unhealthy. This metric does not include any Application Load Balancers registered as targets. It gives the aggregate number of failed targets behind the load balancer.
TCP_Client_Reset_Count - The total number of reset (RST) packets sent from a client to a target. These resets are generated by the client and forwarded by the load balancer.
TCP_ELB_Reset_Count - The total number of reset (RST) packets generated by the load balancer. If a client or a target sends data after the idle timeout period elapses, it receives a TCP RST packet to indicate that the connection is no longer valid. Additionally, if a target becomes unhealthy, the load balancer sends a TCP RST for packets received on the client connections associated with the target, unless the unhealthy target triggers the load balancer to fail open.
TCP_Target_Reset_Count - The total number of reset (RST) packets sent from a target to a client. These resets are generated by the target and forwarded by the load balancer.
Note: TCP_Target_Reset_Count is an ELB metric published in CloudWatch. It counts the total number of reset (RST) packets sent from a target (Amazon EC2 host) to a client. A reset packet is one with no payload and with the RST bit set in the TCP header flags. These resets are generated by the target and forwarded by the load balancer. Sum is the most useful statistic for this metric. Similarly, the NLB also emits metrics corresponding to resets generated by the load balancer itself (TCP_ELB_Reset_Count) and resets generated by the client (TCP_Client_Reset_Count). For more details please look here
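To compare the three reset metrics over the incident window, one option is to request each of them with the Sum statistic, scoped to a single Availability Zone. The sketch below only builds the request parameters and makes no AWS calls. The helper name `reset_metric_requests`, the load balancer value, the AZ name, and the time window are placeholders assumed for illustration.

```python
from datetime import datetime, timedelta, timezone

RESET_METRICS = (
    "TCP_Client_Reset_Count",
    "TCP_ELB_Reset_Count",
    "TCP_Target_Reset_Count",
)


def reset_metric_requests(load_balancer, availability_zone, start, end, period=60):
    """Build one CloudWatch request (as a dict) per reset metric for one AZ."""
    return [
        {
            "Namespace": "AWS/NetworkELB",
            "MetricName": name,
            "Dimensions": [
                {"Name": "LoadBalancer", "Value": load_balancer},
                {"Name": "AvailabilityZone", "Value": availability_zone},
            ],
            "StartTime": start,
            "EndTime": end,
            "Period": period,        # seconds per datapoint
            "Statistics": ["Sum"],   # Sum is the most useful statistic here
        }
        for name in RESET_METRICS
    ]


# Placeholder incident window and identifiers (assumptions for the example).
end = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
start = end - timedelta(hours=1)
reqs = reset_metric_requests("net/my-nlb/0123456789abcdef", "us-east-1a", start, end)
for req in reqs:
    print(req["MetricName"], req["Statistics"])
```

Each dictionary is shaped so that it can be passed as keyword arguments to the boto3 CloudWatch client, for example `cloudwatch.get_metric_statistics(**req)`. Repeating the query per AZ shows whether resets are concentrated in one zone.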