Site-to-Site VPN Convergence

Hello,

We currently use a redundant site-to-site VPN with BGP between AWS and our on-prem site. Each site-to-site VPN on the AWS side connects to a single WAN endpoint on our side. We have configured BGP with local preference (LP) and MED so that traffic flows symmetrically, with the preference shown in the image below. BGP is configured with a keepalive of 3 seconds and a hold timer of 9 seconds, while DPD is configured with a 10-second interval and a retry count of 3.
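For reference, the detection windows implied by these timers can be worked out with a short sketch. The class and method names below are hypothetical, and the DPD model is a simplification (a peer is declared dead after `retries` unanswered probes sent `interval` seconds apart). Real implementations differ by vendor, and the AWS endpoint applies its own DPD settings, which this sketch does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverTimers:
    """Hypothetical container for the timers described in the question."""
    bgp_keepalive_s: float
    bgp_hold_s: float
    dpd_interval_s: float
    dpd_retries: int

    def bgp_worst_case_s(self) -> float:
        # A BGP session is torn down when no KEEPALIVE or UPDATE
        # arrives within the hold time.
        return self.bgp_hold_s

    def dpd_worst_case_s(self) -> float:
        # Simplified assumption: the peer is declared dead after
        # `dpd_retries` unanswered probes, one per interval.
        return self.dpd_interval_s * self.dpd_retries


# Values taken from the question: keepalive 3 s, hold 9 s, DPD 10 s x 3.
timers = FailoverTimers(bgp_keepalive_s=3, bgp_hold_s=9,
                        dpd_interval_s=10, dpd_retries=3)
print("BGP hold-timer expiry:", timers.bgp_worst_case_s(), "s")
print("DPD dead-peer window:", timers.dpd_worst_case_s(), "s")
```

Under this simplified model, the BGP hold timer expires after 9 seconds while the DPD window is 30 seconds, so the two mechanisms detect the same failure on quite different time scales.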

While doing failover testing, I observed that whenever WAN1 goes down, it takes approximately 20-35 seconds for the AWS side to converge. On our side, I can see the firewall using the new tunnel and BGP route within 10 seconds. Further decreasing the BGP timers and the IPsec DPD timers doesn't change this time much. I have a trace/mtr running from inside the VPC, and I can see that the AWS side keeps using the primary tunnel even when the tunnel is down on the AWS side. The TGW route table does update promptly with the new tunnel attachment, but the traceroute keeps using the old, down tunnel. I have tried MTR with different options (TCP/ICMP/UDP) to make sure it's not some weird ICMP hashing issue.
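To put a number on the convergence time during a failover test, timestamped probe results can be reduced to a single outage duration. This is only a sketch: `longest_outage` is a hypothetical helper, and it assumes each probe is recorded as a `(timestamp_seconds, succeeded)` pair collected by whatever tool runs inside the VPC.

```python
from typing import Iterable, Optional, Tuple


def longest_outage(probes: Iterable[Tuple[float, bool]]) -> float:
    """Return the longest outage in seconds.

    An outage is measured from the last successful probe before a run
    of failures to the first successful probe after it. Failures that
    never recover, or that precede the first success, are not counted.
    """
    longest = 0.0
    last_ok: Optional[float] = None
    in_outage = False
    for ts, ok in sorted(probes):
        if ok:
            if in_outage and last_ok is not None:
                longest = max(longest, ts - last_ok)
            last_ok = ts
            in_outage = False
        else:
            in_outage = True
    return longest


# Made-up sample: probes succeed, fail for a while, then recover.
sample = [(0.0, True), (1.0, True), (2.0, False), (3.0, False), (4.0, True)]
print(longest_outage(sample))
```

Running the same measurement from both sides (inside the VPC and on-prem) makes it easier to see which direction is slow to converge.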

Enter image description here

4 Answers

Check if you have configured Bidirectional Forwarding Detection (BFD). BFD is a simple hello mechanism that detects failures in a network.

Sachin
answered a year ago

BFD is not configured between the peers, and I am not sure it's even supported on the AWS side when running over IPsec.

Regardless, the AWS side doesn't converge within 10 seconds even with the current timers, so I doubt that configuring BFD, if it is possible at all, would make any difference.

answered a year ago

BFD needs to be configured on Direct Connect. You need to check with your vendor.

See the link below for an example.

https://aws.amazon.com/premiumsupport/knowledge-center/enable-bfd-direct-connect/

Sachin
answered a year ago

As I mentioned in my question, we are using IPsec tunnels and not Direct Connect, so I am not sure how I can use BFD in my scenario. I know BFD is supported on Direct Connect, but the problem is related to IPsec tunnel convergence.

answered a year ago
