How do I troubleshoot BGP connection failure over AWS VPN or Direct Connect?

I want to debug BGP sessions that fail to establish a connection because of a parameter mismatch over AWS Site-to-Site VPN or AWS Direct Connect.

Short description

When configuring dynamic AWS Site-to-Site VPN and AWS Direct Connect connections, you might encounter connectivity issues related to the Border Gateway Protocol (BGP). To address this, you must first analyze the BGP logs and identify the specific point of failure within the BGP negotiation process.

Note: For troubleshooting a BGP session that can't establish a connection or is in an Idle state over a VPN tunnel, see How do I troubleshoot BGP connection issues over VPN?

For troubleshooting a BGP session that can't establish a connection or is in an Idle state over Direct Connect, see How can I troubleshoot BGP connection issues over Direct Connect?

Resolution

Verify the BGP configuration on the customer gateway, and confirm that pings between the BGP peer IP addresses succeed. Then, collect packet captures of the traffic between the BGP peer IP addresses from the customer gateway device. Finally, analyze the data for each BGP state as follows.

BGP states

Idle

This is the first state, where BGP waits for a “start event.” The start event occurs when you configure a new BGP neighbor or when you reset an established BGP peering. Next, BGP initializes some resources, resets the ConnectRetry timer, and initiates a TCP connection to the remote BGP neighbor.

Connect

During this stage, BGP waits for the TCP three-way handshake to complete. If this stage completes successfully, the connection moves to the OpenSent state.

2021-07-04 22:50:20 169.254.60.146 169.254.60.145 TCP 74 34516 → 179 [SYN] Seq=0 Win=2920 Len=0 MSS=1460 SACK_PERM TSval=3030456 TSecr=0 WS=1
2021-07-04 22:50:20.719228 169.254.60.145 169.254.60.146 TCP 74 179 → 34516 [SYN, ACK] Seq=0 Ack=1 Win=26844 Len=0 MSS=1375 TSval=64921081 TSecr=3030456 WS=128
2021-07-04 22:50:20.719453 169.254.60.146 169.254.60.145 TCP 66 34516 → 179 [ACK] Seq=1 Ack=1 Win=2920 Len=0 TSval=3030476 TSecr=64921081

If the TCP connection fails, BGP moves to the Active state and keeps retrying the connection instead of proceeding to the OpenSent state.

Check the Connect logs to identify the cause of failure:

  • For a dynamic VPN, confirm that there is TCP connectivity between the two BGP neighbors by running a "telnet" test on TCP port 179.
  • If there is no TCP connectivity, check the logs for errors or dropped packets during the TCP connection.
  • Check that the neighbor's IP address is correctly configured on both BGP peers: the customer gateway and AWS.
  • If you use a Direct Connect virtual interface, check that you entered the correct BGP authentication key (MD5 password) on the routers.
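
Where telnet isn't available, the TCP port 179 test from the list above can be approximated with a short script. This is a minimal sketch, not an AWS tool: the function name `can_reach_bgp_port` is hypothetical, and it assumes that you run it from the customer gateway (or a host whose traffic is sourced from the local BGP peer IP), because a BGP speaker can reject connections from addresses that aren't configured as neighbors.

```python
import socket


def can_reach_bgp_port(peer_ip: str, port: int = 179, timeout: float = 3.0) -> bool:
    """Return True if a TCP three-way handshake to peer_ip:port completes.

    Hypothetical helper for illustration. It only tests TCP reachability;
    it doesn't send any BGP messages.
    """
    try:
        # create_connection performs the SYN / SYN-ACK / ACK exchange.
        with socket.create_connection((peer_ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


# Example call (169.254.60.145 is the peer address from the capture above;
# replace it with your own peer IP):
# can_reach_bgp_port("169.254.60.145")
```

A result of False indicates only that the handshake didn't complete. Use the packet capture to see whether the SYN was dropped, reset, or never answered.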

OpenSent

After sending an OPEN message to the peer, BGP waits in this state for the peer's OPEN message. If it receives a valid OPEN message, BGP sends a KEEPALIVE to the peer and moves to the OpenConfirm state. A failed connection puts BGP back into the Idle or Active state.

Border Gateway Protocol - OPEN Message
Type: OPEN Message (1)  
Version: 4   
My AS: 65000  
Hold Time: 30  
BGP Identifier: 54.241.242.80

In the OPEN message, the peer sends its BGP parameters, such as the version number, AS number, Hold Time (default value: 30 seconds for BGP over VPN and 90 seconds for BGP over Direct Connect), and BGP Identifier IP address. If the BGP session fails to establish, check the logs to confirm that the OPEN message was sent, and that the neighbor received it with the expected BGP parameters. Also, check the OpenSent logs to identify the cause of failure.

Note: AWS doesn't accept 0 as a Hold Time value.
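
To compare the OPEN parameters with what you expect, you can decode the raw BGP payload from the packet capture. The following is a minimal sketch that assumes you already extracted the bytes of a single OPEN message. The function names `parse_bgp_open` and `find_open_problems` are hypothetical, and the field layout follows the fixed-length OPEN format in RFC 4271 (optional parameters aren't decoded).

```python
import ipaddress
import struct

BGP_HEADER_LEN = 19    # 16-byte marker + 2-byte length + 1-byte type
BGP_OPEN_MIN_LEN = 29  # header + fixed OPEN fields, no optional parameters
BGP_TYPE_OPEN = 1


def parse_bgp_open(data: bytes) -> dict:
    """Decode the fixed fields of one BGP OPEN message (hypothetical helper)."""
    if len(data) < BGP_OPEN_MIN_LEN:
        raise ValueError("too short to be a BGP OPEN message")
    marker, _length, msg_type = struct.unpack("!16sHB", data[:BGP_HEADER_LEN])
    if marker != b"\xff" * 16:
        raise ValueError("invalid BGP marker")
    if msg_type != BGP_TYPE_OPEN:
        raise ValueError(f"not an OPEN message (type {msg_type})")
    version, my_as, hold_time, bgp_id, _opt_len = struct.unpack(
        "!BHH4sB", data[BGP_HEADER_LEN:BGP_OPEN_MIN_LEN]
    )
    return {
        "version": version,
        "my_as": my_as,
        "hold_time": hold_time,
        "bgp_identifier": str(ipaddress.IPv4Address(bgp_id)),
    }


def find_open_problems(fields: dict, expected_peer_as: int) -> list:
    """Return human-readable mismatches found in the decoded OPEN fields."""
    problems = []
    if fields["version"] != 4:
        problems.append(f"Version is {fields['version']}, expected 4")
    # Caveat: a peer that uses a 4-byte ASN sends 23456 (AS_TRANS) in this
    # field and carries the real ASN in a capability, which isn't decoded here.
    if fields["my_as"] != expected_peer_as:
        problems.append(f"My AS is {fields['my_as']}, expected {expected_peer_as}")
    if fields["hold_time"] == 0:
        problems.append("Hold Time is 0, which AWS doesn't accept")
    return problems
```

For the OPEN message shown earlier in this section (My AS 65000, Hold Time 30), `find_open_problems(fields, 65000)` returns an empty list.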

OpenConfirm

In this state, BGP is one step away from the final state (Established). BGP waits for a KEEPALIVE message from the peer. If it receives one, the state moves to Established. Otherwise, the state moves back to Idle or Active, depending on the error.

65	2021-07-04 22:50:20.744297	169.254.60.146	169.254.60.145	BGP	85	KEEPALIVE Message
66	2021-07-04 22:50:20.765323	169.254.60.145	169.254.60.146	BGP	85	KEEPALIVE Message

Check the logs to ensure that the KEEPALIVE message was sent and received correctly.
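
If you export the capture summary as text, a short script can confirm that KEEPALIVE messages appear in both directions. This is a sketch under assumptions: the export format is assumed to be whitespace-separated lines like the two shown above, with the source address before the destination address, and the function names are hypothetical.

```python
import ipaddress


def _is_ip(token: str) -> bool:
    """Return True if the token parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(token)
        return True
    except ValueError:
        return False


def keepalive_directions(lines):
    """Collect (source, destination) pairs for lines that mention KEEPALIVE."""
    directions = set()
    for line in lines:
        if "KEEPALIVE" not in line:
            continue
        # Assumption: the first two IP-looking tokens are source, then destination.
        ips = [token for token in line.split() if _is_ip(token)]
        if len(ips) >= 2:
            directions.add((ips[0], ips[1]))
    return directions


def keepalives_seen_both_ways(lines, local_ip: str, peer_ip: str) -> bool:
    """Return True if each side sent at least one KEEPALIVE to the other."""
    seen = keepalive_directions(lines)
    return (local_ip, peer_ip) in seen and (peer_ip, local_ip) in seen
```

If only one direction appears, the side that never sends a KEEPALIVE is the one to investigate first.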

Established

In this state, BGP exchanges information between the peers. The information consists of UPDATE (route advertisement), KEEPALIVE, and NOTIFICATION messages.

Border Gateway Protocol - UPDATE Message
Path attributes
Path Attribute - AS_PATH: 65000 
Path Attribute - NEXT_HOP: 169.254.60.146 
Network Layer Reachability Information (NLRI)
   192.168.0.0/16 

If a connection isn't established, do the following:

  • Check the logs to ensure that the routers are exchanging updates correctly. Verify that the advertised prefixes match the expected routes.
  • Check if any BGP filters or prefix lists are preventing the propagation of routes within the route table.
  • Check the advertised route entries on peer route tables.
  • You might see the BGP session go from the Established state to the Idle state for a VPN connection or a Direct Connect virtual interface on a virtual private gateway. Verify that the peer advertises no more than 100 routes over the BGP session.
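
The route-count check in the last item can be scripted against the list of prefixes that the customer gateway advertises. This is a minimal sketch with hypothetical function names. It treats 100 as the maximum number of routes per session, so adjust `limit` if the quota for your connection type differs.

```python
import ipaddress

MAX_ROUTES_PER_SESSION = 100  # assumed limit, taken from the guidance above


def count_unique_prefixes(prefixes) -> int:
    """Count distinct networks after normalizing each prefix string."""
    # strict=False accepts entries with host bits set, such as 192.168.1.5/16.
    return len({ipaddress.ip_network(p, strict=False) for p in prefixes})


def exceeds_route_limit(prefixes, limit: int = MAX_ROUTES_PER_SESSION) -> bool:
    """Return True if the number of distinct prefixes is greater than limit."""
    return count_unique_prefixes(prefixes) > limit
```

If the check returns True, summarize the prefixes or advertise a default route so that the number of advertised routes stays within the limit.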

AWS OFFICIAL, updated a year ago