Site-to-Site VPN connection between AWS and Azure is unstable


Hi Everyone,

We currently have multiple Site-to-Site VPN connections between Azure and AWS (across multiple accounts).
One tunnel on one of these VPN connections constantly "flaps" because AWS fails to respond to DPD (Dead Peer Detection) messages in time, according to Azure support.

The only difference between this VPN connection and the others is that its 'Local IPv4 Network Cidr' and 'Remote IPv4 Network Cidr' are set to 0.0.0.0/0; on all our other VPN connections this setting is blank.
I am unable to remove this setting: when I try to clear the 0.0.0.0/0 local and remote network CIDRs, the connection stays in the modifying state and eventually goes back to the available state without the change being applied.

I am not sure whether this could actually cause an issue, but I thought it was worth mentioning.
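A quick way to confirm that the CIDR setting really is the only difference between the connections is to list the options of every VPN connection side by side. The sketch below is a hypothetical helper (`summarize_vpn_options` is not an AWS function). It assumes the response shape returned by boto3's `EC2.Client.describe_vpn_connections()` and reads `Options.LocalIpv4NetworkCidr`, `Options.RemoteIpv4NetworkCidr` and each tunnel's `DpdTimeoutSeconds`. Whether AWS includes or omits these fields for "blank" settings is not confirmed here, so missing fields are simply reported as `None`. The sample response and IDs are made up.

```python
from typing import Any


def summarize_vpn_options(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a describe_vpn_connections()-style response into one row per connection.

    Missing fields are reported as None so that blank settings stand out
    next to explicit values such as 0.0.0.0/0.
    """
    rows = []
    for conn in response.get("VpnConnections", []):
        options = conn.get("Options", {})
        rows.append({
            "id": conn.get("VpnConnectionId"),
            "local_cidr": options.get("LocalIpv4NetworkCidr"),
            "remote_cidr": options.get("RemoteIpv4NetworkCidr"),
            # One entry per tunnel; None if the tunnel has no DpdTimeoutSeconds field.
            "dpd_timeouts": [t.get("DpdTimeoutSeconds") for t in options.get("TunnelOptions", [])],
        })
    return rows


if __name__ == "__main__":
    # Hypothetical sample shaped like describe_vpn_connections() output.
    # With boto3 installed and credentials configured you could instead use:
    #   import boto3
    #   response = boto3.client("ec2").describe_vpn_connections()
    sample = {
        "VpnConnections": [
            {"VpnConnectionId": "vpn-EXAMPLE1",
             "Options": {"TunnelOptions": [{"DpdTimeoutSeconds": 30}, {"DpdTimeoutSeconds": 30}]}},
            {"VpnConnectionId": "vpn-EXAMPLE2",
             "Options": {"LocalIpv4NetworkCidr": "0.0.0.0/0",
                         "RemoteIpv4NetworkCidr": "0.0.0.0/0",
                         "TunnelOptions": [{"DpdTimeoutSeconds": 30}, {"DpdTimeoutSeconds": 30}]}},
        ]
    }
    for row in summarize_vpn_options(sample):
        print(row)
```

Running it against all accounts and diffing the rows would show whether anything besides the CIDRs (for example the per-tunnel DPD timeout) differs on the flapping connection.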

I also don't see why increasing the DPD timeout would solve my issue when one IPsec tunnel on the same VPN connection is stable and the other is not.

Does anyone have any ideas?

Asked 3 years ago · 1126 views
2 Answers

By default, AWS uses a DPD timeout of 30 seconds, whereas Azure uses 45 seconds. Increasing both to 120 seconds eventually produced a stable tunnel: it has now been stable for 18+ hours, compared with roughly 2 hours before.
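For reference, the AWS half of that change can also be scripted. The sketch below is a minimal example, assuming boto3 is installed and credentials are configured; it uses boto3's `EC2.Client.modify_vpn_tunnel_options()`, which takes the connection ID, the tunnel's outside IP address and a `TunnelOptions` dict containing `DPDTimeoutSeconds`. The helper names and placeholder IDs are hypothetical, the 30-second lower bound reflects AWS's documented minimum, and modifying tunnel options may briefly interrupt that tunnel. The Azure side (the connection's DPD timeout) has to be changed separately and is not shown.

```python
from typing import Any

AWS_MIN_DPD_TIMEOUT_S = 30  # AWS only accepts DPD timeouts of 30 seconds or more


def dpd_tunnel_options(timeout_s: int) -> dict[str, Any]:
    """Build the TunnelOptions argument for ec2.modify_vpn_tunnel_options()."""
    if timeout_s < AWS_MIN_DPD_TIMEOUT_S:
        raise ValueError(f"DPD timeout must be >= {AWS_MIN_DPD_TIMEOUT_S} s, got {timeout_s}")
    # Only the timeout is set; other tunnel options (e.g. DPD timeout action) are left unchanged.
    return {"DPDTimeoutSeconds": timeout_s}


def apply_dpd_timeout(vpn_connection_id: str, tunnel_outside_ip: str, timeout_s: int = 120) -> dict[str, Any]:
    """Raise the DPD timeout on one tunnel (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so dpd_tunnel_options() works without boto3

    ec2 = boto3.client("ec2")
    return ec2.modify_vpn_tunnel_options(
        VpnConnectionId=vpn_connection_id,
        VpnTunnelOutsideIpAddress=tunnel_outside_ip,
        TunnelOptions=dpd_tunnel_options(timeout_s),
    )


if __name__ == "__main__":
    # Placeholder identifiers; replace with your own and uncomment to call AWS.
    # apply_dpd_timeout("vpn-EXAMPLE", "192.0.2.10", timeout_s=120)
    print(dpd_tunnel_options(120))
```

Each tunnel is modified individually, so the call would be repeated for the second tunnel's outside IP if both should use the same timeout.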

It would be interesting to know why the initial configuration works on 3 of our other tunnels, while this tunnel was the only one that failed every 2 hours because AWS did not respond to DPD (according to Azure support).

Answered 3 years ago

Hello Tim,

A DPD timeout is generally a symptom of a problem rather than its cause. The lack of a DPD response, combined with the fact that it only happens on certain tunnels, suggests a potential underlying network connectivity problem. Since raising the timeout to 120 seconds seems to have fixed it, the blip most likely lasts between 30 and 120 seconds. It's worth noting that network blips may not affect applications with built-in resiliency mechanisms that can re-establish connectivity and continue exchanging packets seamlessly, which may very well be the case here.
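The 30 to 120 second estimate can be checked with a simplified model of DPD probing. This sketch takes the answer's own description of the AWS side as its assumptions: "R_U_THERE" probes every 10 seconds, and a timeout after timeout ÷ 10 consecutive unanswered probes. It also assumes the outage starts exactly when a probe is sent and that replies are instantaneous; real IKE implementations may time things differently. The function names are hypothetical.

```python
import math


def unanswered_probes(outage_s: float, interval_s: float = 10.0) -> int:
    """Count probes lost to an outage that starts exactly when a probe is sent.

    Probes go out at t = 0, interval_s, 2 * interval_s, ...; any probe sent
    before the outage ends (t < outage_s) gets no reply.
    """
    if outage_s <= 0:
        return 0
    return math.ceil(outage_s / interval_s)


def tunnel_survives(outage_s: float, timeout_s: float = 120.0, interval_s: float = 10.0) -> bool:
    """True if the outage ends before timeout_s / interval_s probes in a row go unanswered."""
    max_missed = int(timeout_s // interval_s)  # e.g. 120 // 10 = 12 probes
    return unanswered_probes(outage_s, interval_s) < max_missed


if __name__ == "__main__":
    for outage in (15, 25, 60, 110, 120):
        print(f"{outage:>4}s outage: 30s timeout survives={tunnel_survives(outage, timeout_s=30)}, "
              f"120s timeout survives={tunnel_survives(outage)}")
```

Under these assumptions a 110-second outage loses 11 probes and the 12th is answered, so a 120-second timeout keeps the tunnel up, while a 30-second timeout already drops it once an outage exceeds about 20 seconds. That is roughly in line with the 30 to 120 second window estimated above.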
Further, if the DPD timeout is set to 120 seconds on the AWS end, DPD "R_U_THERE" messages are sent every 10 seconds and the tunnel times out only if 12 consecutive messages go unanswered. This means that if you had an underlying network problem lasting 110 seconds, the tunnel would still remain up, because the 12th DPD message was answered and the timer was reset. This could be problematic for network-sensitive applications, but may not be a problem if the application can recover or re-establish connectivity as explained earlier. My recommendation:
if an application using this path is seeing problems, please contact AWS Support through the Support portal from the account that owns the VPN and include:
a) The corresponding VPN ID(s) and region
b) Timestamps (with time zone) of the last few occurrences of the problem, and
c) Excerpts of the Azure logs that can be compared with our own logs
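For item b), the tunnel status AWS reports can be formatted with explicit UTC timestamps. This is a hypothetical helper that assumes the `VgwTelemetry` entries (`OutsideIpAddress`, `Status`, `LastStatusChange`) returned by boto3's `describe_vpn_connections()`, where `LastStatusChange` is a timezone-aware `datetime`. Telemetry only holds the most recent status change, so a history of flaps is better taken from the CloudWatch `TunnelState` metric. The sample values are made up.

```python
from datetime import datetime, timezone
from typing import Any


def tunnel_status_report(response: dict[str, Any]) -> list[str]:
    """Return one line per tunnel with its last status change in ISO 8601 UTC."""
    lines = []
    for conn in response.get("VpnConnections", []):
        for tunnel in conn.get("VgwTelemetry", []):
            changed = tunnel.get("LastStatusChange")
            # astimezone() converts aware datetimes to UTC (naive ones would be treated as local time).
            when = changed.astimezone(timezone.utc).isoformat() if changed else "unknown"
            lines.append(f"{conn.get('VpnConnectionId')} {tunnel.get('OutsideIpAddress')} "
                         f"{tunnel.get('Status')} since {when}")
    return lines


if __name__ == "__main__":
    # Hypothetical sample; real data would come from
    #   boto3.client("ec2").describe_vpn_connections()
    sample = {"VpnConnections": [{
        "VpnConnectionId": "vpn-EXAMPLE",
        "VgwTelemetry": [{"OutsideIpAddress": "192.0.2.10", "Status": "DOWN",
                          "LastStatusChange": datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc)}],
    }]}
    print("\n".join(tunnel_status_report(sample)))
```

Keep the output out of public posts, since it contains resource IDs and public IPs.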

I'm confident we should be able to get to the bottom of this once we look at our logs.

NOTE: Please refrain from sharing any identifying information about your AWS resources, including resource IDs, public IPs and security group rules, since all posts are publicly available indefinitely. If you need specific guidance, please reach out to us at AWS Support via the Support console.

Answered 3 years ago
