Skip to content

ALB Node SSL Connection Failure - One of Two Nodes Returns SSL_ERROR_SYSCALL in eu-central-1

0

PROBLEM SUMMARY: Application Load Balancer has two nodes across availability zones, but one node consistently fails SSL handshake with SSL_ERROR_SYSCALL while the other works perfectly.

AFFECTED ALB:

  • ALB Name: k8s-lehotels-lehotels-bab2b2f9e3
  • ALB ARN: [will be provided]
  • DNS Name: k8s-lehotels-lehotels-bab2b2f9e3-2015941408.eu-central-1.elb.amazonaws.com
  • Region: eu-central-1

SPECIFIC SYMPTOMS:

  • Working Node IP: 3.120.93.177 (eu-central-1a) - SSL handshake succeeds, returns proper HTTP responses
  • Failing Node IP: 18.193.12.143 (eu-central-1b) - SSL handshake fails with SSL_ERROR_SYSCALL immediately after Client Hello

VERIFICATION COMPLETED: ✅ Target Groups: 2/2 healthy targets across both AZs (172.30.26.176 in 1a, 172.30.22.10 in 1b) ✅ SSL Certificate: Valid ACM certificate (arn:aws:acm:eu-central-1:320902111325:certificate/079c9e1f-d4f0-454f-b46c-8de7b54f4b95) ✅ Security Groups: Same configuration for both AZs ✅ Listeners: HTTPS:443 configured identically ✅ Backend Services: Responding correctly to health checks in both AZs

TROUBLESHOOTING ATTEMPTED:

  • SSL certificate refresh via modify-listener command
  • ALB attribute modifications
  • Verified network connectivity
  • Confirmed issue is not application-side (targets healthy in both AZs)

REPRODUCTION STEPS:

  1. curl -v --resolve le-hotels-k8s.com:443:3.120.93.177 https://le-hotels-k8s.com/ Result: ✅ Works - Full SSL handshake, HTTP/2 302 response

  2. curl -v --resolve le-hotels-k8s.com:443:18.193.12.143 https://le-hotels-k8s.com/ Result: ❌ Fails - SSL_ERROR_SYSCALL after Client Hello

IMPACT:

  • 50% of traffic fails due to random load balancing between nodes
  • Affects production availability
  • Issue persists across multiple days

REQUEST: Please investigate the SSL/TLS configuration on ALB node 18.193.12.143 in eu-central-1b. This appears to be an AWS infrastructure issue where one ALB node has corrupted SSL state while the other functions normally.

PRIORITY: High - Production Impact

asked 10 months ago107 views
1 Answer
0

This issue appears to be related to an SSL/TLS negotiation failure occurring specifically on one of your Application Load Balancer (ALB) nodes. When SSL/TLS negotiation errors occur, they typically don't show up in ALB Access logs because the connection fails before a complete handshake can be established.

To troubleshoot this specific issue where one ALB node is failing with SSL_ERROR_SYSCALL while the other works correctly, I recommend the following approach:

  1. Enable VPC Flow logs on the Elastic Network Interfaces (ENIs) of your ALB nodes. This will allow you to compare traffic patterns between the working node (3.120.93.177) and the failing node (18.193.12.143). The failing node's IP may appear in VPC Flow logs but not in ALB Access logs, confirming an SSL/TLS negotiation failure.

  2. Enable connection logs for your ALB. Connection logs provide more detailed information about the TLS handshake process, including security protocols and cipher suites being used during negotiation.

  3. Capture client-side packet data using tools like tcpdump or Wireshark when connecting to both nodes to determine any differences in the negotiation process. Focus on the "Client Hello" packet and the server's response (or lack thereof).

  4. Check if there are any intermediary network devices or configurations specific to eu-central-1b that might be interfering with the TLS handshake on the failing node.

  5. Compare the TLS security policy and supported protocols/ciphers on both ALB nodes to identify any potential mismatches or configuration issues.

Since this appears to be an infrastructure issue affecting only one node, and you've already verified that targets are healthy in both AZs, security groups are configured correctly, and the SSL certificate is valid, it's likely an issue with the specific ALB node in eu-central-1b. If the above troubleshooting steps don't resolve the issue, you should contact AWS Support with the detailed information you've gathered, as this may require intervention at the AWS infrastructure level.
Sources
Why do SSL/TLS negotiation errors occur when connecting to an Application Load Balancer over HTTPS, and how can I identify the responsible client IP? | AWS re:Post
Troubleshooting ClientTLSNegotiationErrorCount in AWS Network Load Balancer | AWS re:Post

answered 10 months ago
  • I have confirmed thtat the issue is ssl related, because if i try to do a curl -v of the node that is not working and i do a http curl it works , but if i do a https curl it does not work, how can i resolve this issue ?

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.