Hi mdanieli20,
Please try the following solution; it should help resolve the issue.
To address your Network Load Balancer (NLB) not switching between targets during AWS VPN tunnel maintenance, you can customize the NLB's health check mechanism. Start by creating a custom health check endpoint on your remote service that accurately reflects the status of the VPN tunnel. This endpoint should report health based on the VPN's operational state, so the NLB can detect when the tunnel is under maintenance. Then configure the NLB health checks to use this endpoint, making sure the health check settings (protocol, path, port, etc.) match the endpoint's configuration. Additionally, set up Amazon CloudWatch alarms to monitor the VPN tunnel state and trigger an AWS Lambda function that deregisters the affected targets from the NLB during maintenance periods and registers them again afterwards. Together, these steps improve the NLB's ability to detect and respond to VPN tunnel issues, giving your microservices better connectivity and reliability.
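A minimal sketch of such an endpoint, assuming you can run a small process on the remote side next to the service (which may not be possible with a third-party provider). It answers `GET /health` with 200 only when a TCP connection to the real service succeeds, and 503 otherwise, so an NLB HTTP health check pointed at it fails whenever the path through the tunnel or the service itself is broken. The upstream address, the `/health` path, and port 8080 are hypothetical placeholders.

```python
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical address of the real service reached through the VPN tunnel.
UPSTREAM_HOST = "10.0.0.10"
UPSTREAM_PORT = 13010


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, reset, unreachable, or timed out
        return False


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        ok = tcp_reachable(UPSTREAM_HOST, UPSTREAM_PORT)
        body = b"ok\n" if ok else b"upstream unreachable\n"
        # 503 is outside the NLB's default HTTP success codes (200-399), so the check fails.
        self.send_response(200 if ok else 503)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Serve the health endpoint on all interfaces until interrupted."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Run it with `serve()` and point the target group's health check at it (HTTP, port 8080, path `/health` in this sketch).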
If you want more information, please refer to the AWS documentation:
https://aws.amazon.com/about-aws/whats-new/2018/09/network-load-balancer-now-supports-aws-vpn/
https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html
Hi! Could you explain a bit more about the target type you're using on the NLB? I assume you register an active IP address, correct? Who owns (device/system/endpoint) the IP address you register as a target?
Also, have you tracked what happens during AWS maintenance? I would assume that the IP stays active/reachable and we keep sending traffic to it, but I'd like you to answer the question above first so we can start working on a solution.
Analysis
NLB Health Checks:
**NLB Health Check Configuration:** The NLB relies on health checks to determine the status of its targets. If the health checks do not detect that a tunnel is down during maintenance (because the tunnel might be up but not forwarding traffic), the NLB will keep routing traffic to the targets behind that tunnel.
Tunnel Maintenance: During AWS VPN tunnel maintenance, the tunnel might remain in a state where it technically isn't down (so the health check sees it as "healthy"), but it's not forwarding traffic properly, causing the connection issues you're seeing.
Health Check Sensitivity:
Health Check Port and Protocol: Ensure that the health check uses a port and protocol that accurately reflect the availability of the VPN tunnel. A TCP check only confirms that the target accepts a connection on the port, while an HTTP/HTTPS check against a specific path also confirms that the service responds; choose whichever better reflects tunnel health in your setup.
Health Check Interval and Unhealthy Threshold: Adjusting these settings might help the NLB detect issues faster, though this also depends on the maintenance behavior. A more aggressive health check could lead to quicker failover.
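To reason about how quickly a more aggressive configuration can react, here is a rough model (an approximation that ignores how long the NLB takes to propagate the state change): a target is marked unhealthy after `UnhealthyThresholdCount` consecutive failed checks sent every `HealthCheckIntervalSeconds`, and a check against a silently unreachable target only fails once `HealthCheckTimeoutSeconds` expires.

```python
from typing import Tuple


def detection_window(interval_s: float, timeout_s: float,
                     unhealthy_threshold: int) -> Tuple[float, float]:
    """Approximate (best, worst) seconds from a target going dark until the NLB
    has recorded `unhealthy_threshold` consecutive failed health checks.

    Best case: the outage begins just before a check is sent.
    Worst case: it begins just after a check has succeeded.
    Each failing check is counted only when its timeout expires.
    """
    best = (unhealthy_threshold - 1) * interval_s + timeout_s
    worst = unhealthy_threshold * interval_s + timeout_s
    return best, worst
```

With the values posted later in this thread (interval 5 s, timeout 2 s, unhealthy threshold 2), `detection_window(5, 2, 2)` returns `(7, 12)`, so the health checks themselves should flag an unreachable target within roughly 7 to 12 seconds. If traffic keeps failing for much longer than that, it is worth checking whether the health checks are actually failing during the maintenance window.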
Static Routes and NLB Behavior:
**Static Route Behavior:** Since you're using static routes without BGP, the NLB doesn't have dynamic feedback on path availability. When AWS performs maintenance, the NLB might not get the immediate feedback it needs to fail over, unlike a generic failure scenario where the tunnel goes completely down.
Potential Solutions
Enhanced Health Check Configuration:
Use a Custom Health Check Endpoint: Consider setting up a custom health check endpoint on the remote service that actively tests the ability to reach the service through the tunnel, rather than just checking tunnel availability.
Increase Health Check Frequency: Increase the frequency of health checks and lower the unhealthy threshold to detect issues more quickly.
Fallback Mechanism in Microservices:
Application-Level Failover: Implement application-level logic in your microservices to detect and handle cases where the NLB is not switching as expected. This could include retries with backoff or even manual re-routing logic.
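A minimal sketch of client-side retries with capped exponential backoff and full jitter, in Python for illustration; the function name and default limits are hypothetical. It retries only on `OSError` (which covers connection refused, reset, and timeout errors from the socket layer) and re-raises the last error once the attempts are exhausted.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    op: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op(), retrying on OSError with capped exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return op()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)  # 0.2, 0.4, 0.8, ... capped at max_delay
            sleep(random.uniform(0, cap))  # full jitter: random delay in [0, cap]
    raise RuntimeError("unreachable")
```

For example, `retry_with_backoff(lambda: socket.create_connection((nlb_dns_name, 13010), timeout=3))` would retry a connection to the NLB, where `nlb_dns_name` stands for your load balancer's DNS name. Full jitter keeps many clients from retrying in lockstep after the same failure.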
Alternative VPN Setup:
**Redundant VPN Setup:** Consider setting up redundant VPN tunnels and configuring your NLB to target both, rather than relying on static routes. This can be more robust but requires your customer to support such a configuration.
Multi-AZ Setup: Ensure your VPN configuration is spread across multiple Availability Zones to mitigate the impact of maintenance on any single zone.
Monitor and Automate Failover:
Monitoring and Alerts: Set up monitoring on the VPN tunnels and the NLB target groups, and configure automated scripts to intervene if a tunnel is in maintenance but not properly failing over.
Custom Routing Logic: If feasible, implement a custom routing mechanism or an additional layer of load balancing that can detect and handle these cases more effectively.
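One way to automate the intervention is a small Lambda function, run on a schedule (for example an Amazon EventBridge rule) or triggered by a CloudWatch alarm on the VPN `TunnelState` metric, that checks tunnel status with EC2 `DescribeVpnConnections` and adjusts the target group through the Elastic Load Balancing v2 API. This is a sketch under assumptions: the mapping from VPN connection ID to target IPs, the IDs, and the target group ARN are hypothetical, and it assumes each target is reachable through exactly one VPN connection.

```python
"""Sketch: deregister NLB targets whose VPN connection has no tunnel UP, re-register them when it recovers."""
from typing import Dict, List, Set, Tuple

# Hypothetical: which NLB target IPs are reachable only through which VPN connection.
VPN_TARGETS: Dict[str, List[str]] = {
    "vpn-0123456789abcdef0": ["172.16.10.5"],
    "vpn-0fedcba9876543210": ["172.16.20.5"],
}
TARGET_PORT = 13010  # port used by the target group posted in this thread
TARGET_GROUP_ARN = "arn:aws:elasticloadbalancing:REGION:ACCOUNT:targetgroup/NAME/ID"  # placeholder


def connection_is_up(vpn_connection: dict) -> bool:
    """Treat a VPN connection as usable if at least one tunnel reports Status == 'UP'."""
    return any(t.get("Status") == "UP" for t in vpn_connection.get("VgwTelemetry", []))


def plan(vpn_connections: List[dict], vpn_targets: Dict[str, List[str]],
         registered: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (to_register, to_deregister) target IPs, skipping changes that are already in effect."""
    to_register, to_deregister = [], []
    for conn in vpn_connections:
        up = connection_is_up(conn)
        for ip in vpn_targets.get(conn["VpnConnectionId"], []):
            if up and ip not in registered:
                to_register.append(ip)
            elif not up and ip in registered:
                to_deregister.append(ip)
    return to_register, to_deregister


def describe(ips: List[str], port: int) -> List[dict]:
    # "all" is the AvailabilityZone value for IP targets outside the target group's VPC,
    # such as addresses reached over the site-to-site VPN.
    return [{"Id": ip, "Port": port, "AvailabilityZone": "all"} for ip in ips]


def reconcile(ec2, elbv2, target_group_arn: str, vpn_targets: Dict[str, List[str]],
              port: int) -> Tuple[List[str], List[str]]:
    conns = ec2.describe_vpn_connections(VpnConnectionIds=list(vpn_targets))["VpnConnections"]
    health = elbv2.describe_target_health(TargetGroupArn=target_group_arn)
    registered = {d["Target"]["Id"] for d in health["TargetHealthDescriptions"]}
    to_register, to_deregister = plan(conns, vpn_targets, registered)
    if to_deregister:
        elbv2.deregister_targets(TargetGroupArn=target_group_arn,
                                 Targets=describe(to_deregister, port))
    if to_register:
        elbv2.register_targets(TargetGroupArn=target_group_arn,
                               Targets=describe(to_register, port))
    return to_register, to_deregister


def lambda_handler(event, context):
    import boto3  # bundled with the AWS Lambda Python runtime

    return reconcile(boto3.client("ec2"), boto3.client("elbv2"),
                     TARGET_GROUP_ARN, VPN_TARGETS, TARGET_PORT)
```

Deregistration stops new connections to a target right away, while existing connections drain according to `deregistration_delay.timeout_seconds` (300 in the target group attributes posted in this thread).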
Next Steps
Review and adjust the NLB health checks to be more aggressive and more sensitive to the actual availability of the service.
Test the application's behavior during a planned window by simulating tunnel maintenance, and confirm that failover works as expected.
Explore application-level failover mechanisms if adjusting the NLB configuration does not resolve the issue.
Hi, the target type is IP address. These IP addresses belong to services exposed from a provider's network, which are reachable through a site-to-site VPN.
{
  "TargetGroups": [
    {
      "TargetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:xxxxxxxxxxx:targetgroup/xxxxxxxxx-target-group/xxxxxxxxx",
      "TargetGroupName": "xxxxxxxxxx-target-group",
      "Protocol": "TCP",
      "Port": 13010,
      "VpcId": "vpc-xxxxxxxxxxxxx",
      "HealthCheckProtocol": "TCP",
      "HealthCheckPort": "13010",
      "HealthCheckEnabled": true,
      "HealthCheckIntervalSeconds": 5,
      "HealthCheckTimeoutSeconds": 2,
      "HealthyThresholdCount": 2,
      "UnhealthyThresholdCount": 2,
      "LoadBalancerArns": [
        "arn:aws:elasticloadbalancing:us-east-1:xxxxxxxxx:loadbalancer/net/xxxxxxxxxxx/xxxxxxxxxxx"
      ],
      "TargetType": "ip",
      "IpAddressType": "ipv4"
    }
  ]
}
No. During AWS maintenance, that IP is unreachable because the VPN tunnel is down. (Remember that the provider's network doesn't support BGP.)
These are the attributes configured in the target group:
ATTRIBUTES proxy_protocol_v2.enabled false
ATTRIBUTES target_group_health.unhealthy_state_routing.minimum_healthy_targets.count 1
ATTRIBUTES preserve_client_ip.enabled false
ATTRIBUTES stickiness.enabled false
ATTRIBUTES target_group_health.unhealthy_state_routing.minimum_healthy_targets.percentage off
ATTRIBUTES deregistration_delay.timeout_seconds 300
ATTRIBUTES target_group_health.dns_failover.minimum_healthy_targets.count 1
ATTRIBUTES stickiness.type source_ip
ATTRIBUTES target_health_state.unhealthy.connection_termination.enabled true
ATTRIBUTES deregistration_delay.connection_termination.enabled true
ATTRIBUTES target_health_state.unhealthy.draining_interval_seconds 0
ATTRIBUTES load_balancing.cross_zone.enabled false
ATTRIBUTES target_group_health.dns_failover.minimum_healthy_targets.percentage off
Hi! From this configuration I can see/deduce the following: