
Troubleshooting EC2 Connectivity Issues Caused by Docker Network CIDR Overlap with Transit Gateway

5 minute read
Content level: Advanced

EC2 instances that run Docker in environments connected through AWS Transit Gateway (TGW) can experience unexpected connectivity failures when Docker's default network CIDR ranges overlap with VPC CIDRs or Transit Gateway routes. This article provides a step-by-step troubleshooting guide to help you determine whether a connectivity issue originates from AWS networking components or from OS-level routing conflicts introduced by Docker.

Common Symptoms

You may encounter one or more of the following symptoms:

  • An EC2 instance is unreachable while another instance in the same subnet is reachable
  • Security groups, route tables, and network ACLs appear identical
  • SSH, ICMP (ping), or application traffic fails intermittently or consistently
  • traceroute does not progress beyond the first hop
  • Connectivity does not recover after stopping and starting the instance
  • Transit Gateway routes appear correct, but traffic still fails

Preliminary AWS-Level Checks (Rule These Out First)

Before investigating Docker or operating system–level networking, validate the following AWS infrastructure components.

EC2 Instance Health

  • Confirm the instance is in a running state
  • Verify 2/2 EC2 status checks are passing

Subnet and Route Table Validation

  • Ensure both the working and non-working EC2 instances are associated with the same subnet
  • Confirm the subnet route table includes:
    • The local VPC CIDR (local)
    • Transit Gateway routes for all required destination CIDRs
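
The route table check can also be done from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured; the function name subnet_routes and the subnet ID in the example are hypothetical.

```shell
# Print destination, target, and state for each route in the route table
# that is explicitly associated with the given subnet.
subnet_routes() {
  local subnet_id="$1"
  aws ec2 describe-route-tables \
    --filters "Name=association.subnet-id,Values=${subnet_id}" \
    --query 'RouteTables[].Routes[].[DestinationCidrBlock,GatewayId,TransitGatewayId,State]' \
    --output text
}

# Example (hypothetical subnet ID):
# subnet_routes subnet-0123456789abcdef0
```

An empty result usually means the subnet has no explicit association and uses the VPC's main route table, which then needs to be checked instead.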

Security Group Rules

  • Validate inbound rules allow required traffic (for example, TCP port 22 for SSH)
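  • Confirm outbound rules allow any traffic the instance initiates (security groups are stateful, so return traffic for allowed inbound connections is permitted automatically)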
  • Compare effective security groups between instances

Network ACLs (NACLs)

  • Verify NACLs allow inbound and outbound traffic for the required ports and CIDRs, including the ephemeral port range for return traffic (NACLs are stateless)
  • Ensure there are no explicit deny rules affecting traffic

Transit Gateway Configuration

Confirm the following:

  • The VPC is attached to the Transit Gateway
  • The Transit Gateway route table contains routes for the destination CIDRs
  • Route propagation and associations are configured correctly
  • Other EC2 instances can successfully communicate through the Transit Gateway

If all AWS-side checks are correct and the issue persists, proceed to OS-level validation.


EC2 OS-Level Troubleshooting

Verify SSH Service on the Affected Instance

sudo systemctl status sshd    # the unit is named "ssh" on Debian and Ubuntu
sudo ss -tlnp | grep :22

Confirm that:

  • The SSH daemon is running
  • SSH is listening on [IP_ADDRESS] or the expected interface

Test Network Connectivity

From another EC2 instance in the same VPC or a Transit Gateway–connected network:

telnet <target-private-ip> 22
ping <target-private-ip>
traceroute <target-private-ip>
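
If telnet is not installed on the source instance, a TCP port test can be sketched with bash's built-in /dev/tcp redirection. This is an illustrative helper only; the function name check_port and the default 5-second timeout are assumptions, and it requires bash and the coreutils timeout command.

```shell
# Report whether a TCP port accepts a connection within a timeout (seconds).
check_port() {
  local host="$1" port="$2" secs="${3:-5}"
  if timeout "$secs" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "${host}:${port} reachable"
  else
    echo "${host}:${port} unreachable"
    return 1
  fi
}

# Example:
# check_port <target-private-ip> 22
```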

If connectivity fails despite correct AWS networking configuration, inspect the EC2 routing table.


Inspect the Linux Routing Table

ip route show

Look for:

  • Unexpected routes to private CIDR ranges
  • Routes pointing to Docker bridge interfaces instead of the primary network interface (eth0 / ens5)
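
To see which route the kernel actually selects for a specific destination, ip route get is often quicker than reading the full table. The sketch below pulls the outgoing interface out of that output; the helper name route_dev and the example address are hypothetical.

```shell
# Extract the outgoing interface name from `ip route get` output on stdin.
route_dev() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Example (hypothetical client address that cannot reach the instance):
# ip route get 172.18.5.10 | route_dev
```

If the result is docker0 or a br-<id> interface rather than the primary network interface, replies to that address never leave the instance.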

Root Cause: Docker Network CIDR Overlap

By default, Docker creates bridge networks using CIDR ranges such as:

  • 172.17.0.0/16
  • 172.18.0.0/16
  • 172.20.0.0/16

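If these CIDR ranges overlap with VPC CIDRs or Transit Gateway–routed networks, the instance's routing table contains a directly connected route for the Docker bridge that covers addresses located outside the instance. Linux selects the most specific matching route, and a bridge route such as 172.18.0.0/16 is more specific than the default route through the primary network interface, so traffic to those addresses (including replies to remote clients in that range) is sent to the Docker bridge instead.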

This can cause:

  • Traffic to be routed into Docker bridges
  • Packets never reaching their intended destination
  • Silent failures that appear to be AWS networking issues
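
Whether two ranges overlap can be checked mechanically: two IPv4 CIDR blocks overlap exactly when their network bits agree over the shorter of the two prefix lengths. The following is a minimal sketch of that check; the function names are hypothetical and the CIDRs in the example are illustrative.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Exit 0 when two IPv4 CIDR blocks share at least one address, 1 otherwise.
cidr_overlap() {
  local len1="${1#*/}" len2="${2#*/}" min mask a b
  min=$len1
  if [ "$len2" -lt "$min" ]; then min=$len2; fi
  # Netmask for the shorter prefix; both blocks are compared under it.
  mask=$(( (0xFFFFFFFF << (32 - min)) & 0xFFFFFFFF ))
  a=$(ip_to_int "${1%/*}")
  b=$(ip_to_int "${2%/*}")
  [ $(( a & mask )) -eq $(( b & mask )) ]
}

# Example: a Docker bridge range against a Transit Gateway-routed range.
# cidr_overlap 172.18.0.0/16 172.16.0.0/12 && echo "overlap"
```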

How to Confirm Docker Is the Cause

Check for Docker-created routes (the default bridge is docker0; user-defined bridge networks appear as br-<id>):

ip route show | grep -E 'docker|br-'

List Docker networks:

docker network ls
docker network inspect bridge

If any Docker network CIDR overlaps with a VPC CIDR or a Transit Gateway–routed network that the instance must reach, the overlap is the likely root cause.
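
Inspecting networks one at a time is tedious on hosts with many Compose projects. The following sketch lists every Docker network with its configured subnets in one pass; the function name docker_subnets is hypothetical, and it assumes the current user can run the docker CLI.

```shell
# Print each Docker network name followed by its configured subnets.
docker_subnets() {
  local ids
  ids=$(docker network ls -q) || return 1
  if [ -z "$ids" ]; then return 0; fi
  # $ids is intentionally unquoted so each network ID becomes its own argument.
  docker network inspect \
    --format '{{.Name}}: {{range .IPAM.Config}}{{.Subnet}} {{end}}' $ids
}

# Example:
# docker_subnets
```

Networks without IPAM configuration, such as host and none, print a name with no subnet.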


Resolution (High-Level)

At a high level, resolution involves:

  1. Stopping the Docker service
  2. Removing the existing Docker bridge
  3. Reconfiguring Docker to use a non-overlapping CIDR
  4. Restarting Docker
  5. Verifying EC2 network connectivity
  6. Restarting application containers if applicable
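
The steps above can be sketched as follows. This is an illustration under stated assumptions, not a definitive procedure: the helper name write_daemon_json is hypothetical, and the address ranges are placeholders that must be replaced with ranges unused in your VPCs, Transit Gateway routes, and on-premises networks. The bip and default-address-pools keys are Docker daemon settings for the default bridge address and for the pool used by newly created networks.

```shell
# Write a Docker daemon configuration that pins the default bridge (bip) and
# the address pool for new networks to caller-supplied ranges.
# Note: this overwrites the target file; merge by hand if one already exists.
write_daemon_json() {
  local path="$1" bip="$2" pool_base="$3" pool_size="$4"
  cat > "$path" <<EOF
{
  "bip": "${bip}",
  "default-address-pools": [
    { "base": "${pool_base}", "size": ${pool_size} }
  ]
}
EOF
}

# Sequence on the affected instance (run as root; values are placeholders):
#   systemctl stop docker
#   ip link delete docker0
#   write_daemon_json /etc/docker/daemon.json 10.200.0.1/24 10.201.0.0/16 24
#   systemctl start docker
#   ip route show
```

Existing user-defined networks keep their original subnets; they need to be removed and recreated before they pick up an address from the new pool.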

Note: Docker is third-party software. AWS Support can assist with identifying AWS-side and OS-level interactions, but detailed Docker configuration should follow Docker documentation and best practices [1].


Prevention and Best Practices

To avoid similar issues in the future:

  • Plan CIDR ranges carefully when running Docker on EC2 in Transit Gateway environments
  • Avoid Docker default CIDRs in enterprise or hybrid networks
  • Configure custom Docker bridge CIDRs that do not overlap with:
    • VPC CIDRs
    • Transit Gateway routes
    • On-premises networks
  • Include ip route show as part of standard EC2 connectivity troubleshooting
  • Treat EC2 OS routing as a critical layer in Transit Gateway architectures

Key Takeaway

When using Amazon EC2 with Transit Gateway and Docker, connectivity issues are not always caused by AWS networking misconfigurations. Docker's default networking behavior can unintentionally override EC2 routing, leading to hard-to-diagnose failures.

By following a structured troubleshooting approach, from AWS infrastructure down to the EC2 operating system, you can quickly isolate the root cause and restore connectivity without unnecessary changes.

References