
EC2 Docker Rust application disconnects after ~8300 concurrent TCP connections


Hello,

We are running a Rust-based TCP application inside a Docker container on an AWS EC2 instance (c7a.2xlarge). The application listens on port 9001, which is exposed from the container.

Problem description

When a developer runs a load test (bots connecting via TCP) from an external machine to the EC2 public IP and port 9001, connections start failing after approximately 8300 concurrent users. New connections are rejected and existing clients experience disconnects (timeouts).

However, when the same Rust application is started locally on the developer’s machine, the system can handle 30,000 concurrent connections without issues.

What we have already checked and configured

File descriptor limits (host and container):

ulimit -n
1048576

Inside the container:

docker exec -ti server sh
/app/target/release # ulimit -n
1048576

Kernel parameters on the EC2 host (/etc/sysctl.conf):

fs.file-max = 1048576

net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 262144
net.netfilter.nf_conntrack_max = 262144
net.core.netdev_max_backlog = 50000

net.ipv4.ip_local_port_range = 12000 65535

All changes were applied and verified at runtime.
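Since net.netfilter.nf_conntrack_max is set, it is also worth confirming during the load test that the kernel's own connection-tracking table is not the bottleneck. A minimal sketch (the /proc paths are the standard Linux ones and only exist when the nf_conntrack module is loaded; the sample values below are taken from this question, not measured):

```shell
#!/bin/sh
# Sketch: report kernel conntrack table usage as count/max (percent).
# In practice, feed it the live values:
#   check_conntrack "$(cat /proc/sys/net/netfilter/nf_conntrack_count)" \
#                   "$(cat /proc/sys/net/netfilter/nf_conntrack_max)"
check_conntrack() {
  count=$1
  max=$2
  pct=$(( count * 100 / max ))
  echo "conntrack usage: ${count}/${max} (${pct}%)"
}

# Example with the numbers from this question: 8300 connections against
# the configured nf_conntrack_max of 262144.
check_conntrack 8300 262144
```

If usage is nowhere near the maximum at the moment connections start failing (as in the example above), the kernel conntrack table can be ruled out.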

Additional observations

The EC2 instance has sufficient CPU and memory headroom during the test.

The application process is not hitting the open file descriptor limit.

The issue reproduces consistently around ~8300 concurrent connections.

The problem does not reproduce outside AWS.

Questions

Are there any AWS EC2, ENI, or VPC-level limits that could cap the number of concurrent inbound TCP connections to a single instance?

Could this be related to connection tracking, ENI limits, or networking behavior specific to Nitro-based instances?

Are there additional kernel, network, or AWS-specific settings we should verify to support 10k+ concurrent TCP connections?

Any guidance on where to look next or how to validate AWS-side limits would be greatly appreciated.

Thank you.

asked 6 days ago · 37 views
1 Answer

Based on your symptoms, this is very likely related to EC2 security group connection tracking limits rather than application or kernel configuration issues.

The Core Issue: Connection Tracking Allowance

EC2 instances have a maximum connection tracking allowance that varies by instance type. When your instance exceeds this allowance, you'll experience exactly what you're describing: packet drops and inability to establish new connections around a specific threshold. The ~8,300 connection limit you're hitting strongly suggests you've reached your instance's connection tracking capacity.

Security groups are stateful and automatically track connections to manage traffic flow. Each tracked connection consumes memory and compute resources on the instance. For c7a.2xlarge instances, you're likely hitting the connection tracking limit for that instance size.

How to Diagnose This

Monitor these Amazon Elastic Network Adapter (ENA) metrics:

  • conntrack_allowance_available: Shows how many tracked connections you can establish before exceeding your allowance
  • conntrack_allowance_exceeded: Indicates if your instance is exceeding its connection tracking capacity

If you see conntrack_allowance_exceeded incrementing or conntrack_allowance_available approaching zero around 8,300 connections, this confirms the issue.
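The raw `ethtool -S <interface>` output is long, so a small filter helps when polling during a load test. A sketch (the interface name varies by instance — eth0, ens5, enp39s0, etc.):

```shell
#!/bin/sh
# Sketch: extract only the ENA conntrack allowance counters from
# `ethtool -S` style output read on stdin.
get_conntrack_metrics() {
  awk '/conntrack_allowance/ { gsub(":", "", $1); print $1 "=" $2 }'
}

# In practice, poll it in a loop while the load test runs, e.g.:
#   watch -n 1 'ethtool -S enp39s0 | awk "/conntrack_allowance/"'
# or:
#   ethtool -S enp39s0 | get_conntrack_metrics
```

If conntrack_allowance_available drops toward zero, or conntrack_allowance_exceeded increments, as the connection count approaches the failure threshold, the instance's tracking allowance is the cause.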

Solutions to Consider

  1. Scale to a larger instance type: Larger instances have higher connection tracking allowances and can support more concurrent connections.

  2. Configure security group rules to avoid tracking: You can create security group rules that allow all traffic (0.0.0.0/0) in both directions for specific ports. These become "untracked connections" and don't consume connection tracking resources. However, this reduces security granularity.

  3. Adjust TCP established timeout values: If you have stale connections, reducing the TCP established idle timeout on your Elastic Network Interface can help free up tracking resources faster. You can configure this when creating/modifying network interfaces or in launch templates.

  4. Use Network ACLs instead of security groups where possible, as they don't perform connection tracking.
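For option 3, newer AWS CLI versions expose the ENI conntrack timeouts via a connection-tracking specification. A hedged sketch — the ENI ID is a placeholder, and you should verify the option name and supported values against your CLI version and the EC2 API reference before relying on it:

```shell
# Sketch (unverified against your environment): lower the TCP established
# idle timeout on an ENI so stale tracked connections are reclaimed sooner.
# eni-0123456789abcdef0 is a placeholder; the timeout is in seconds.
aws ec2 modify-network-interface-attribute \
  --network-interface-id eni-0123456789abcdef0 \
  --connection-tracking-specification TcpEstablishedTimeout=3600
```

The same specification can be set in launch templates so new instances pick it up automatically.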

The fact that this works fine on your local machine but fails consistently at the same threshold on EC2 strongly points to an AWS infrastructure limit rather than an application issue. Your kernel tuning and file descriptor limits appear properly configured, but those don't override the EC2 connection tracking limits imposed at the hypervisor level on Nitro instances.
Sources
Amazon EC2 security group connection tracking - Amazon Elastic Compute Cloud
Troubleshoot security group connection tracking | AWS re:Post

answered 6 days ago by AWS (EXPERT) · reviewed 5 days ago
  • I created a wide security group rule (all ports, inbound and outbound, from 0.0.0.0/0) and checked the ENI metrics (see below), but the problem is not solved.

    ethtool -S enp39s0
    NIC statistics:
         tx_timeout: 0
         suspend: 0
         resume: 0
         wd_expired: 0
         interface_up: 2
         interface_down: 1
         admin_q_pause: 0
         reset_fail: 0
         bw_in_allowance_exceeded: 0
         bw_out_allowance_exceeded: 0
         pps_allowance_exceeded: 0
         conntrack_allowance_exceeded: 0
         linklocal_allowance_exceeded: 0
         conntrack_allowance_available: 547252
    

