Short description
The bw_in_allowance_exceeded, bw_out_allowance_exceeded, or pps_allowance_exceeded Elastic Network Adapter (ENA) network performance metrics might increase even when your average utilization is low. The most common cause of this issue is short spikes in demand for network resources, called microbursts. Microbursts typically last only seconds, milliseconds, or even microseconds, and Amazon CloudWatch metrics aren't granular enough to reflect them. For example, you can use the NetworkIn and NetworkOut instance metrics in CloudWatch to calculate the average throughput per second. However, because of microbursts, the calculated rates might be lower than the available bandwidth for the instance type.
An increase in the bw_in_allowance_exceeded and bw_out_allowance_exceeded metrics also occurs on smaller instances that have an "up to" bandwidth, such as "up to 10 gigabits per second (Gbps)." These instances use network I/O credits to burst beyond their baseline bandwidth for a limited time. When the credits are depleted, the traffic returns to the baseline bandwidth and the metrics increase. Because instance bursting occurs on a best-effort basis, the metrics might increase even when your instance has available I/O credits.
An increase in the pps_allowance_exceeded metric also occurs when non-optimal traffic patterns cause packet drops at lower PPS rates. Asymmetric routing, outdated drivers, small packets, fragments, and connection tracking affect the PPS performance for a workload.
Resolution
Average calculation
CloudWatch samples Amazon EC2 metrics every 60 seconds to capture the total bytes or packets that are transferred in 1 minute. Amazon EC2 aggregates the samples and publishes them to CloudWatch in 5-minute periods. Each statistic in the period shows a different value.
When you use detailed monitoring, CloudWatch publishes the NetworkIn and NetworkOut metrics without aggregation in 1-minute periods. The values for Sum, Minimum, Average, and Maximum are the same, and the value for SampleCount is 1. CloudWatch always aggregates and publishes the NetworkPacketsIn and NetworkPacketsOut metrics in 5-minute periods.
Use the following methods to calculate the average throughput in bytes per second (Bps) or PPS in a period:
- For a simple average in your specified time period, divide Sum by Period or by the timestamp difference between values (DIFF_TIME).
- For an average in the minute with the highest activity, divide Maximum by 60 seconds.
To convert Bps into Gbps, divide the result by 1,000,000,000, and then multiply by 8 (bits per byte).
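The preceding conversion can be scripted. The following is a minimal sketch that uses awk and hypothetical example values: a Sum(NetworkOut) of 90 GB over a 300-second period.

```shell
# Hypothetical example: Sum(NetworkOut) = 90 GB over a 300-second period.
SUM_BYTES=90000000000
PERIOD_SECONDS=300

# Average Bps = Sum / Period; convert to Gbps by dividing by 1e9 and multiplying by 8.
AVERAGE_GBPS=$(awk -v s="$SUM_BYTES" -v p="$PERIOD_SECONDS" \
    'BEGIN { printf "%.2f", s / p / 1000000000 * 8 }')
echo "Average throughput: $AVERAGE_GBPS Gbps"
```

Substitute the Sum and Period values that you retrieve from CloudWatch for your own metrics.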
Microbursts in CloudWatch metrics
The following example shows how a microburst appears in CloudWatch. The instance has a network bandwidth allowance of 10 Gbps and uses basic monitoring.
In a sample of 60 seconds, an outbound data transfer of approximately 24 GB uses all available bandwidth. The data transfer increases the bw_out_allowance_exceeded value and completes in approximately 20 seconds at an average speed of 9.6 Gbps. Amazon EC2 doesn't send any other data, and the instance remains idle for the remaining 4 samples (240 seconds).
The average throughput in Gbps in a 5-minute period is much lower than the throughput during the microburst:
Formula: AVERAGE_Gbps = SUM(NetworkOut) / PERIOD(NetworkOut) / 1,000,000,000 bytes * 8 bits
SUM(NetworkOut) = (~24 GB * 1 sample) + (~0 GB * 4 samples) = ~24 GB
PERIOD(NetworkOut) = 300 seconds (5 minutes)
AVERAGE_Gbps = ~24 / 300 / 1,000,000,000 * 8 = ~0.64 Gbps
Even when you calculate the average throughput based on the highest sample, the amount still doesn't reflect the throughput during the microburst:
Formula: AVERAGE_Gbps = MAXIMUM(NetworkOut) / 60 seconds / 1,000,000,000 bytes * 8 bits
MAXIMUM(NetworkOut) = ~24 GB
AVERAGE_Gbps = ~24 GB / 60 / 1,000,000,000 * 8 = ~3.2 Gbps
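The two calculations above can be reproduced with a short script. The following is a minimal sketch that uses the example's approximate values (~24 GB for both the 5-minute Sum and the highest 60-second sample):

```shell
# Approximate values from the example above, in bytes.
SUM_BYTES=24000000000   # Sum(NetworkOut) for the 5-minute period
MAX_BYTES=24000000000   # Maximum(NetworkOut): the highest 60-second sample

# Simple average over the 300-second period.
AVG_5MIN=$(awk -v b="$SUM_BYTES" 'BEGIN { printf "%.2f", b / 300 / 1000000000 * 8 }')
# Average over the busiest minute.
AVG_PEAK=$(awk -v b="$MAX_BYTES" 'BEGIN { printf "%.1f", b / 60 / 1000000000 * 8 }')

echo "5-minute average: $AVG_5MIN Gbps"
echo "Peak-minute average: $AVG_PEAK Gbps"
```

Neither average approaches the 9.6 Gbps rate that the instance reached during the 20-second microburst.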
When high-resolution data is available, you can get more accurate averages. When you collect operating system (OS) network usage metrics at 1-second intervals, the average throughput briefly reaches approximately 9.6 Gbps.
Monitor microbursts
You can use the CloudWatch agent on Linux and Windows to publish OS-level network metrics to CloudWatch at up to 1-second intervals. The agent can also publish ENA network performance metrics.
Note: High-resolution metrics have higher pricing.
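The following is a minimal sketch of the relevant sections of a CloudWatch agent configuration file. It collects OS-level network counters at 1-second intervals and publishes the ENA network performance metrics through the agent's ethtool plugin. The interface name eth0 is an assumption; adjust it for your instance.

```json
{
  "metrics": {
    "metrics_collected": {
      "net": {
        "measurement": ["bytes_sent", "bytes_recv", "packets_sent", "packets_recv"],
        "metrics_collection_interval": 1
      },
      "ethtool": {
        "interface_include": ["eth0"],
        "metrics_include": [
          "bw_in_allowance_exceeded",
          "bw_out_allowance_exceeded",
          "pps_allowance_exceeded"
        ]
      }
    }
  }
}
```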
You can also use OS tools to monitor network statistics at up to 1-second intervals. For Windows instances, use Performance Monitor. For Linux, use sar, nload, iftop, iptraf-ng, or netqtop.
To clearly identify microbursts, perform a packet capture of the OS, and then use Wireshark to plot an I/O graph at 1-millisecond intervals. For more information, see Download Wireshark and 8.8. The "I/O Graphs" window on the Wireshark website.
This method has the following limitations:
- Network allowances are approximately proportionate at a microsecond level. For example, an instance type with a 10 Gbps bandwidth performance can send and receive about 10 megabits (Mb) in 1 millisecond.
- Packet captures cause additional system load and might reduce the overall throughput and PPS rates.
- Packet captures might not include all packets, because a full capture buffer causes packet drops.
- Timestamps might not accurately reflect when the network sent packets or when the ENA received them.
- The I/O graphs might show lower activity for inbound traffic because Amazon EC2 shapes traffic that exceeds its quota before it reaches the instance.
Packet queues and drops
When the network queues a packet, the resulting latency is measured in milliseconds. TCP connections can scale their throughput to exceed the quotas of an EC2 instance type. As a result, some packet queueing is expected, even with Bottleneck Bandwidth and Round-trip propagation time (BBR) or other congestion control algorithms that use latency as a signal. When the network drops a packet, TCP automatically retransmits the lost segments. Both packet queues and drops can result in higher latency and lower throughput, but you can't directly view these recovery actions. Typically, errors surface only when your application uses low timeouts, or when the network drops enough packets that the connection is forcibly closed.
The ENA network performance metrics don't differentiate between queued packets or dropped packets. To measure connection-level TCP latency on Linux, use the ss or tcprtt commands. To measure TCP retransmissions, use the ss or tcpretrans commands for connection-level statistics, and nstat for systemwide statistics. To download the tcprtt and tcpretrans tools that are part of the BPF Compiler Collection (BCC), see bcc on the GitHub website. You can also use packet captures to measure latency and retransmissions.
Note: Packets that the network dropped because of exceeded instance quotas don't appear in the drop counters for ip or ifconfig.
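The Linux commands mentioned above can be combined into a quick check. The following is a minimal sketch: the first two commands are shown as comments because they require a live instance, and a hypothetical line of nstat output is parsed instead to show how to extract the counter.

```shell
# On a live instance, run these directly:
#   ss -ti                     # per-connection RTT and retransmission statistics
#   nstat -az TcpRetransSegs   # systemwide retransmission counter
# Below, a hypothetical sample of nstat output is parsed to extract the counter value.
SAMPLE='TcpRetransSegs                  128                0.0'
RETRANS=$(echo "$SAMPLE" | awk '{ print $2 }')
echo "TCP segments retransmitted: $RETRANS"
```

A counter that increases over time while the ENA allowance-exceeded metrics also increase suggests that instance-level shaping, rather than the remote peer, is dropping packets.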
Prevent microbursts
First, check ENA network performance metrics against your application's key performance indicators (KPIs) to determine the effect of packet queues or drops.
If the KPIs are below a required threshold, or you receive application log errors, then take the following actions to reduce packet queues and drops:
- Scale up: Increase the instance size to an instance that has a higher network allowance. Instance types with an "n" in the name, such as C7gn, have higher network allowances.
- Scale out: Spread traffic across multiple instances to reduce traffic and contention at individual instances.
For Linux-based operating systems, you can also implement the following strategies to reduce microbursts. It's a best practice to first use a test environment to verify that the strategies reduce traffic shaping without negatively affecting the workload.
Note: The following strategies are only for outbound traffic.
SO_MAX_PACING_RATE
Use the SO_MAX_PACING_RATE socket option to specify a maximum pacing rate in Bps for a connection. The Linux kernel then introduces delays between packets from the socket so that the throughput doesn't exceed the quota that you specify.
To use this method, you must implement the following changes:
- Application code changes.
- Support from the kernel. For more information, see net: introduce SO_MAX_PACING_RATE on the GitHub website.
- Fair queue (FQ) queuing discipline or the kernel's support for pacing at the TCP layer (for TCP only).
For more information, see getsockopt(2) - Linux manual page and tc-fq(8) - Linux manual page on the man7 website. Also, see tcp: internal implementation for pacing on the GitHub website.
qdiscs
By default, Linux uses a pfifo_fast queuing discipline (qdisc) for each ENA queue to schedule packets. Use the fq qdisc to reduce traffic bursts from individual flows and regulate their throughput. Or, use fq_codel or cake to provide active queue management (AQM) capabilities that reduce network congestion and improve latency. For more information, see tc(8) - Linux manual page on the man7 website.
For TCP, activate Explicit Congestion Notification (ECN) on clients and servers. Then, combine ECN with a qdisc that can perform ECN Congestion Experienced (CE) marking. CE marks cause the OS to lower the throughput for a connection, which reduces the latency and packet losses caused by exceeded instance quotas. To use this solution, you must configure the qdisc with a low CE threshold based on the average round-trip time (RTT) of your connections. It's a best practice to use this solution only when the average RTT doesn't vary much across connections, for example, when your instance manages traffic in only one Availability Zone.
Because of performance issues, it's not a best practice to set up aggregated bandwidth shaping at the instance level.
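The qdisc replacement described above can be sketched as follows. The interface name eth0 is an assumption, and the commands require root privileges; run them only after you validate the change in a test environment.

```shell
# Replace the root qdisc on the primary interface with fq (hypothetical interface name).
sudo tc qdisc replace dev eth0 root fq

# Verify the new qdisc and watch its per-flow statistics.
tc -s qdisc show dev eth0
```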
Shallow Transmission (Tx) queues
Use shallow Tx queues to reduce PPS shaping. Byte queue limits (BQL) dynamically limit the number of in-flight bytes on Tx queues. To activate BQL, add ena.enable_bql=1 to your kernel command line in GRUB.
Note: You must use ENA driver version 2.6.1g or later for this solution. BQL is already activated on ENA drivers that are included with the Linux kernel (driver versions that end with K).
For more information, see bql: Byte Queue Limits on the LWN.net website.
When you use ENA Express, you must deactivate BQL to maximize the bandwidth.
You can also use ethtool to reduce the Tx queue length from its default of 1,024 packets. For more information, see ethtool(8) - Linux manual page on the man7 website.
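The two changes described in this section can be sketched as follows. The interface name eth0, the ring size of 512, and the GRUB configuration path are assumptions; they vary by distribution and workload, and the commands require root privileges.

```shell
# 1) Activate BQL (ENA driver 2.6.1g or later): append ena.enable_bql=1 to
#    GRUB_CMDLINE_LINUX in /etc/default/grub, rebuild the config, and reboot.
sudo grub2-mkconfig -o /boot/grub2/grub.cfg   # path varies by distribution

# 2) Reduce the Tx ring size from the default of 1,024 packets (example value).
sudo ethtool -G eth0 tx 512
ethtool -g eth0   # verify the current ring settings
```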
Related information
Amazon EC2 instance network bandwidth