How do I troubleshoot layer 3 issues in Direct Connect?

I want to troubleshoot my AWS Direct Connect connection when it goes down because of layer 3 issues.

Resolution

Note: For layer 1 issues, see How do I troubleshoot layer 1 issues in Direct Connect? For layer 2 issues, see How do I troubleshoot layer 2 issues in Direct Connect?

Check the AWS Health Dashboard for maintenance activity

Direct Connect connections can go DOWN because of ongoing AWS maintenance that can last from a few minutes to a few hours. During maintenance, Border Gateway Protocol (BGP) connections move to an idle state.

Check the Events section of the AWS Health Dashboard for ongoing or recently completed AWS maintenance that might affect your Direct Connect connection.

Set up notifications for critical metrics so that you receive immediate alerts.

Check your BGP configuration

If your virtual interface state is available but you can't establish a BGP connection, then take the following actions:

  • Check for configuration issues with your local BGP Autonomous System Number (ASN) and the AWS ASN.
  • Check for configuration issues with your peer IP addresses on both sides of the BGP peering session.
  • Configure your MD5 authentication key so that it matches the key in the downloaded router configuration file. Make sure that your key contains no extra spaces or characters.
  • Don't advertise more than 100 prefixes for private virtual interfaces or 1,000 prefixes for public virtual interfaces.
    Note: You can't modify or exceed these quotas.
  • Deactivate firewall or network access control list (network ACL) rules that block TCP port 179 or high-numbered ephemeral TCP ports.
    Note: These ports are necessary for BGP to establish a TCP connection between the peers.
  • Check your BGP logs for errors or warning messages.
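As a rough illustration of the prefix quota check in the list above, the following Python sketch counts the distinct prefixes that you plan to advertise and compares the count with the quotas quoted in this article. The function name `check_prefix_quota` and the `MAX_PREFIXES` table are hypothetical helpers, not part of any AWS tool or SDK.

```python
import ipaddress

# Quotas as quoted in this article; confirm them against the current
# Direct Connect quotas before you rely on these numbers.
MAX_PREFIXES = {"private": 100, "public": 1000}


def check_prefix_quota(prefixes, vif_type):
    """Return (count, within_quota) for the distinct prefixes on one virtual interface."""
    limit = MAX_PREFIXES[vif_type]
    # strict=False normalizes entries such as 10.0.1.5/24 to 10.0.1.0/24,
    # so the same network written two ways is counted once.
    distinct = {ipaddress.ip_network(p, strict=False) for p in prefixes}
    return len(distinct), len(distinct) <= limit


advertised = ["10.0.0.0/16", "10.1.0.0/16", "10.0.0.0/16"]
count, ok = check_prefix_quota(advertised, "private")
print(count, ok)  # prints: 2 True
```

The prefix list here is made up. In practice, you would export the advertised routes from your own router and pass them in.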

If the preceding steps don't resolve your BGP issues, then take the following actions:

  • Check that you correctly configured BGP on your gateway.
  • Do a ping test between the BGP peer IP addresses.
  • Collect the packet captures for traffic between the BGP peer IP addresses from your gateway device.

Troubleshoot BGP states

If you correctly configured your BGP and still experience layer 3 issues, then troubleshoot the BGP states.

Note: The Idle state is the first state where BGP waits for a start event. The start event occurs when you configure a new BGP neighbor or reset an established BGP peering session. BGP initializes resources, resets a ConnectRetry timer, and then initiates a TCP connection to the remote BGP neighbor.

Connect state

During the Connect state, BGP waits for the TCP three-way handshake to complete. If the handshake is successful, then the connection moves to the OpenSent state.

Example of a successful connection:

2021-07-04 22:50:20 169.254.60.146 169.254.60.145 TCP 74 34516 → 179 [SYN] Seq=0 Win=2920 Len=0 MSS=1460 SACK_PERM TSval=3030456 TSecr=0 WS=1
2021-07-04 22:50:20.719228 169.254.60.145 169.254.60.146 TCP 74    179 → 34516 [SYN, ACK] Seq=0 Ack=1 Win=26844 Len=0 MSS=1375 TSval=64921081 TSecr=3030456 WS=128
2021-07-04 22:50:20.719453 169.254.60.146 169.254.60.145 TCP 66    34516 → 179 [ACK] Seq=1 Ack=1 Win=2920 Len=0 TSval=3030476 TSecr=64921081

If the connection attempt or the ConnectRetry fails, then BGP falls back to the Active state and doesn't reach the OpenSent state.

To troubleshoot Connect state issues, take the following actions:

  • To confirm there's TCP connectivity between the two BGP neighbors, run a telnet test on TCP port 179. If there's no TCP connectivity, then check the logs for errors or dropped packets during the TCP connection.
  • Verify that the BGP neighbor's IP address is configured correctly on both your gateway and the AWS side of the BGP peering session.
  • Verify that you entered the correct BGP authentication on the routers.
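If telnet isn't available, then a short Python sketch can perform a similar TCP reachability test for the first action above. The helper name `can_reach_bgp_port` and the 3-second timeout are assumptions for illustration. The sketch uses only the standard library `socket` module.

```python
import socket


def can_reach_bgp_port(peer_ip, port=179, timeout=3.0):
    """Return True if a TCP handshake with peer_ip:port completes within the timeout."""
    try:
        # create_connection performs the TCP three-way handshake.
        with socket.create_connection((peer_ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False
```

For example, `can_reach_bgp_port("169.254.60.145")` tests the peer address from the capture above. Run the test from a host that uses your side's BGP peer IP address as its source, because the remote peer might reject connections from any other address. A result of False doesn't distinguish between a blocked port and a peer that refuses the session.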

OpenSent state

After BGP sends an OPEN message to the peer, BGP waits in the OpenSent state for the peer's OPEN reply. If BGP successfully receives the reply, then BGP sends a keepalive message to the peer and moves to the OpenConfirm state. If the connection fails, then BGP returns to the Idle or Active state.

If BGP doesn't establish a connection, then check your logs for the OPEN messages that the BGP neighbors sent and received. Each OPEN message includes the BGP parameters of the sender.

Example OPEN message:

Border Gateway Protocol - OPEN Message
Type: OPEN Message (1)
Version: 4
My AS: 65000
Hold Time: 30
BGP Identifier: 54.241.242.80

Also, check the OpenSent logs to identify the cause of failure.

Note: AWS doesn't accept 0 as a Hold Time value.
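To show where these OPEN parameters sit in a raw message, the following Python sketch decodes the fixed part of a BGP OPEN message with the RFC 4271 layout and flags a Hold Time that the peering session wouldn't accept. The function names `parse_open` and `hold_time_problem` are hypothetical. The sample bytes are built from the example values above and aren't taken from a real capture.

```python
import ipaddress
import struct

BGP_HEADER = struct.Struct("!16sHB")       # marker, length, type
BGP_OPEN_FIXED = struct.Struct("!BHH4sB")  # version, My AS, Hold Time, BGP Identifier, opt. parameter length


def parse_open(message):
    """Decode the fixed fields of a BGP OPEN message (optional parameters are ignored)."""
    marker, _length, msg_type = BGP_HEADER.unpack_from(message, 0)
    if marker != b"\xff" * 16 or msg_type != 1:
        raise ValueError("not a BGP OPEN message")
    version, my_as, hold_time, bgp_id, _opt_len = BGP_OPEN_FIXED.unpack_from(
        message, BGP_HEADER.size
    )
    return {
        "version": version,
        "my_as": my_as,
        "hold_time": hold_time,
        "bgp_identifier": str(ipaddress.IPv4Address(bgp_id)),
    }


def hold_time_problem(hold_time):
    """Return a description of the problem, or None if the Hold Time looks acceptable."""
    if hold_time == 0:
        return "Hold Time 0 isn't accepted by AWS"
    if hold_time < 3:
        return "RFC 4271 requires a Hold Time of 0 or at least 3 seconds"
    return None


# Sample built from the example OPEN message in this article.
sample = (
    b"\xff" * 16
    + struct.pack("!HB", 29, 1)
    + BGP_OPEN_FIXED.pack(4, 65000, 30, ipaddress.IPv4Address("54.241.242.80").packed, 0)
)
fields = parse_open(sample)
print(fields["my_as"], fields["hold_time"], hold_time_problem(fields["hold_time"]))
# prints: 65000 30 None
```

The sketch reads only the 2-byte My AS field. A peer that uses a 4-byte ASN carries the real ASN in an optional capability, which this sketch doesn't decode.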

OpenConfirm state

BGP waits in the OpenConfirm state for a keepalive message from the peer. If BGP successfully receives the message, then the state moves to Established. Otherwise, BGP returns to the Idle or Active state.

In your logs, verify that the peer sent the keepalive message and BGP received it.

Example keepalive messages between BGP peers:

65    2021-07-04 22:50:20.744297    169.254.60.146    169.254.60.145    BGP    85    KEEPALIVE Message
66    2021-07-04 22:50:20.765323    169.254.60.145    169.254.60.146    BGP    85    KEEPALIVE Message

Established state

In the Established state, BGP exchanges information between the peers.

Example BGP update message:

Border Gateway Protocol - UPDATE Message
Path attributes
Path Attribute - AS_PATH: 65000 
Path Attribute - NEXT_HOP: 169.254.60.146 
Network Layer Reachability Information (NLRI)
   192.168.0.0/16 

If BGP doesn't stay in the Established state or doesn't exchange the expected routes, then take the following actions:

  • Check the logs to make sure that the routers correctly exchange updates. Verify that the advertised prefixes match the expected routes.
  • Make sure that BGP filters or prefix lists don't prevent route propagation within the route table.
  • Confirm that the advertised route entries on the peer route tables are correct.
  • Your BGP logs or on-premises devices might show that the BGP peering session changed from the Established to Idle state on your virtual gateway's Direct Connect virtual interface. In this case, make sure that the peer advertises no more than 100 routes over the BGP peering session.
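The state-by-state checks in this section can be condensed into a small triage helper. The following Python sketch takes the sequence of BGP states seen in a router log, finds the furthest state that the session reached, and returns the matching check from this article. It's a simplification and not the full RFC 4271 state machine. The names `HAPPY_PATH`, `NEXT_CHECK`, and `furthest_state` are hypothetical, and the Active state is treated as a retry at the Connect stage.

```python
# Order in which a healthy BGP peering session progresses, per this article.
HAPPY_PATH = ["Idle", "Connect", "OpenSent", "OpenConfirm", "Established"]

# What to check when the session gets no further than the given state.
NEXT_CHECK = {
    "Idle": "Confirm that the BGP neighbor is configured and that a start event occurred.",
    "Connect": "Test TCP port 179 between the peers; verify the neighbor IP address and authentication key.",
    "OpenSent": "Compare the OPEN message parameters in the logs; AWS doesn't accept a Hold Time of 0.",
    "OpenConfirm": "Verify that keepalive messages are sent and received.",
    "Established": "Check route updates, filters and prefix lists, and the number of advertised routes.",
}


def furthest_state(states_seen):
    """Return the furthest happy-path state in states_seen (Idle if the list is empty)."""
    rank = {name: i for i, name in enumerate(HAPPY_PATH)}
    rank["Active"] = rank["Connect"]  # Active is a retry at the TCP connection stage
    best = max((rank[s] for s in states_seen), default=0)
    return HAPPY_PATH[best]


observed = ["Idle", "Connect", "Active", "Connect", "Active"]
stage = furthest_state(observed)
print(stage, "->", NEXT_CHECK[stage])
```

In this made-up log, the session never reaches OpenSent, so the helper points at TCP connectivity and neighbor settings. State names that aren't in the table raise a KeyError, so normalize your router's log format first.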

Troubleshoot BFD issues

AWS automatically turns on asynchronous Bidirectional Forwarding Detection (BFD) for Direct Connect virtual interfaces on the AWS side.

To troubleshoot BFD issues, take the following actions:

  • If you activated BFD on your router, then check that you configured BFD correctly.
  • Make sure the BFD peering session is in the UP status on your router.
  • Review the BFD events or logs on your router.

Note: The default BFD liveness detection minimum interval for AWS is 300 milliseconds (ms). The default BFD liveness detection multiplier is 3.
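To see what these defaults mean for failure detection, the following Python sketch applies the asynchronous-mode detection time rule from RFC 5880. The detection time equals the remote peer's multiplier times the larger of the local required receive interval and the remote desired transmit interval. The function name `bfd_detection_time_ms` and the router-side values are assumptions for illustration.

```python
def bfd_detection_time_ms(remote_multiplier, local_min_rx_ms, remote_min_tx_ms):
    """Detection time in milliseconds for BFD asynchronous mode (RFC 5880, section 6.8.4)."""
    # The slower of the two intervals is the rate at which packets actually arrive.
    return remote_multiplier * max(local_min_rx_ms, remote_min_tx_ms)


# Both sides use the AWS defaults: 300 ms interval, multiplier 3.
print(bfd_detection_time_ms(3, 300, 300))   # prints: 900

# Hypothetical router that accepts packets only every 1,000 ms.
print(bfd_detection_time_ms(3, 1000, 300))  # prints: 3000
```

With matching defaults, a failure is detected after roughly 900 ms. A larger interval on your router lengthens the detection time, even if the AWS side keeps its defaults.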

To avoid failover or connection issues, it's a best practice not to configure graceful restart and BFD at the same time. For fast failover, configure BFD with graceful restart turned off.

Related information

Troubleshoot layer 3/4 (Network/Transport) issues

AWS OFFICIAL - Updated a month ago