Based on the symptoms you describe (ENA link flaps, DHCP failures, loss of connectivity, high load averages with D-state tasks, and kworker/ENA activity), this looks like an ENA driver or kernel compatibility issue rather than a typical configuration problem.
The ENA device uses a keep-alive mechanism to monitor device health. When keep-alive messages aren't received, the driver initiates a reset procedure to recover from failures. During this reset, there can be brief traffic loss, though TCP connections should typically recover. The reset process involves the driver logging statistics, discarding incomplete packets, and reinitializing the device. However, if the underlying issue isn't resolved, these resets may not successfully restore connectivity.
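To see how often the driver is actually resetting, it can help to count the relevant kernel log lines. Here is a minimal Python sketch, assuming you have already captured kernel log text (for example from `dmesg`). The keyword list and the sample lines are illustrative assumptions, not an authoritative list of ENA messages; the exact wording depends on the driver version.

```python
import re
from collections import Counter

# Assumed keywords for illustration; real ENA messages differ between driver versions.
RESET_HINTS = ("keep alive", "keep-alive", "reset", "link is down", "link is up")


def summarize_ena_events(log_lines):
    """Count kernel log lines that mention the ena driver and one of the hint keywords."""
    counts = Counter()
    for line in log_lines:
        lowered = line.lower()
        # Skip lines that do not mention the ena driver as a whole word.
        if not re.search(r"\bena\b", lowered):
            continue
        for hint in RESET_HINTS:
            if hint in lowered:
                counts[hint] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, shaped like typical kernel log output.
    sample = [
        "ena 0000:00:05.0 eth0: Keep alive watchdog timeout.",
        "ena 0000:00:05.0 eth0: Trigger reset is on",
        "systemd[1]: Started Daily apt download activities.",
    ]
    for hint, count in summarize_ena_events(sample).items():
        print(f"{hint}: {count}")
```

A steadily growing count of keep-alive or reset lines around the time connectivity drops would support the driver-reset explanation. A count of zero would point elsewhere.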
For instances built on the AWS Nitro System (which includes m7g instances), specific ENA driver versions are recommended. The ENA Linux kernel driver version 2.2.9g or later is recommended for Nitro v4 instance types and required for Nitro v5 or later. Some Linux distributions may require minimum kernel versions to prevent suboptimal performance of ENA driver features on Nitro v4 and newer instance types. While Amazon Linux 2023 and Bottlerocket support ENA features for Nitro v4 and newer by default, other distributions' timing for incorporating ENA driver features can vary.
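To compare the driver version you see on the instance against the 2.2.9g recommendation, something like the following Python sketch could be used. It assumes you pass in the version string yourself (for instance, the value reported by `ethtool -i` for the interface), and it ignores the trailing letter suffix, which is a simplification of my own. Note that some in-kernel builds of the driver may report the kernel version instead of an ENA driver version, in which case this comparison is not meaningful.

```python
import re

# Minimum numeric version taken from the 2.2.9g recommendation above;
# the letter suffix is deliberately ignored (an assumption of this sketch).
MIN_RECOMMENDED = (2, 2, 9)


def parse_ena_version(text):
    """Return the first major.minor.patch triple found in text, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def meets_minimum(text, minimum=MIN_RECOMMENDED):
    """True if the parsed version is at least the minimum numeric version."""
    version = parse_ena_version(text)
    return version is not None and version >= minimum


if __name__ == "__main__":
    # Hypothetical version strings for illustration.
    for candidate in ("2.2.9g", "2.10.0g", "2.2.3K", "unknown"):
        print(candidate, meets_minimum(candidate))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating 2.10.0 as older than 2.2.9.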
The fact that downgrading to kernel 5.15.0 resolved your issue strongly suggests a regression in the 6.8 kernel's ENA implementation or its interaction with the Graviton-based Nitro hardware. This could be related to how the 6.8 kernel version handles ENA driver features, device resets, or keep-alive mechanisms.
Because this looks like a kernel or driver problem rather than a configuration problem, and may be specific to certain instance types, kernel versions, or Availability Zones, escalating to AWS Support is appropriate. Even without a paid support plan, you can report potential service issues through the AWS forums, or open a service limit increase case (available without a paid plan) and describe the technical issue there. It is also worth checking the Ubuntu kernel bug tracker and AWS-specific kernel repositories for reported issues with the 6.8.0-1044-aws kernel and ENA on Graviton instances.
Sources
Troubleshoot the ENA kernel driver on Linux - Amazon Elastic Compute Cloud
Instances built on the AWS Nitro System - Amazon EC2
