ELB unhealthy state


Hello one and all

On Friday and again on Monday, we had an outage where 3 EC2 nodes (running Amazon Linux 2) became unhealthy on the ELB.

The only way I could resolve it was to reboot the 3 systems.

2 systems are on ELB A and 1 system is on ELB B.

One is a web server; the other 2 are backend services.

I cannot for the life of me find anything in the ELB logs or the application logs.

Has anyone experienced this before?

No configuration has changed.

2 Answers

Hi, here’s how you can troubleshoot this issue:

Quick Checks:

  1. Health Check Configuration:

    • Verify the health check settings on your ELB (protocol, port, path, timeout, interval, and healthy/unhealthy thresholds). Ensure that the health check is configured correctly for your web server and backend services. Sometimes, a slight misconfiguration can cause instances to be marked as unhealthy.
  2. Check Instance Logs:

    • Review the application and system logs on your EC2 instances, especially around the time when they go unhealthy. Look for any signs of resource exhaustion, network issues, or application errors.
  3. Monitor Instance Performance:

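    • Use CloudWatch to monitor CPU, memory, and network metrics. Note that memory metrics are not published for EC2 by default and require the CloudWatch agent on the instance. Spikes or unusual patterns around the time of the outage could indicate the root cause of the issue.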
  4. Network and Security Groups:

    • Ensure that the security groups and network ACLs are not causing connectivity issues between the ELB and the instances. Check for any recent changes in your VPC settings.
  5. Auto-Recovery:

    • Consider enabling EC2 auto-recovery for critical instances. This can automatically recover an instance if it fails a system status check, which might reduce the downtime you're experiencing.
  6. ELB Logs:

    • Double-check the ELB access logs for any anomalies or repeated patterns that might provide insight into why the instances are being marked unhealthy.
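For item 1, it can help to send the same kind of request the ELB sends, directly from the instance or from another host in the VPC, the next time a node goes unhealthy. Below is a minimal sketch of that, assuming an HTTP health check whose success code is 200 and whose timeout is 5 seconds. Those values, the function names `probe` and `classify_probe`, and the `/health` path in the usage note are illustrative assumptions, not values taken from the question; substitute whatever your ELB is actually configured with.

```python
import time
import urllib.error
import urllib.request


def classify_probe(status_code, elapsed_s, timeout_s=5.0, healthy_codes=(200,)):
    """Return 'healthy' or a short reason string for a single probe result."""
    if status_code is None:
        return "no response (connection failed or timed out)"
    if elapsed_s > timeout_s:
        return f"slow response ({elapsed_s:.2f}s > {timeout_s:.2f}s timeout)"
    if status_code not in healthy_codes:
        return f"unexpected status {status_code}"
    return "healthy"


def probe(url, timeout_s=5.0):
    """Send one GET request and return (status code or None, elapsed seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code  # the server answered, but with a 4xx/5xx status
    except (urllib.error.URLError, OSError):
        status = None  # connection refused, host unreachable, or timed out
    return status, time.monotonic() - start
```

For example, `classify_probe(*probe("http://127.0.0.1/health"))` run on the instance itself shows whether the application answers locally. If it is healthy locally but not from another host, the network path (security groups, NACLs) is the more likely suspect; if it fails locally too, look at the application or the instance's resources.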

I hope this helps! 😁

If anyone else has experienced something similar, any input would help!

Daher
answered 19 days ago

Hello, it seems you first need to check the health check reason codes [1] during the outage. Since you explain that one node is a web server and the other two are backend services, my guess is that all of the nodes become unhealthy once one of them fails to respond to health check requests.
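One way to capture the reason codes is to query the target group while the nodes are unhealthy, since the API reports the current state only. The sketch below assumes the load balancers are Application or Network Load Balancers with target groups (which is what reference [1] covers) and that boto3 and AWS credentials are available. The function names and the ARN you pass in are placeholders. For a Classic Load Balancer, the equivalent call would be `describe_instance_health` on the `elb` client instead.

```python
def summarize_target_health(response):
    """Return (target id, port, state, reason, description) for each non-healthy target."""
    rows = []
    for item in response.get("TargetHealthDescriptions", []):
        health = item.get("TargetHealth", {})
        if health.get("State") == "healthy":
            continue  # only report targets that need attention
        target = item.get("Target", {})
        rows.append((
            target.get("Id"),
            target.get("Port"),
            health.get("State"),
            health.get("Reason", ""),
            health.get("Description", ""),
        ))
    return rows


def fetch_unhealthy_targets(target_group_arn):
    """Query the ELBv2 API for one target group and summarize the result."""
    # Imported here so the summarizer above can be used without boto3 installed.
    import boto3

    client = boto3.client("elbv2")
    response = client.describe_target_health(TargetGroupArn=target_group_arn)
    return summarize_target_health(response)
```

Calling `fetch_unhealthy_targets(...)` with your own target group ARN during the next outage should return one row per failing node. A reason such as `Target.Timeout` points at the instance not answering in time, while `Target.ResponseCodeMismatch` points at the application returning an unexpected status code.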

I also suggest taking a look at instance performance using CloudWatch, monitoring CPU, memory, and network metrics.
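As a rough illustration of that check, the sketch below pulls the last 24 hours of `CPUUtilization` for one instance and lists the 5-minute periods where the maximum exceeded a threshold. The 24-hour window, the function names, and any threshold you pass are assumptions for illustration, and it presumes boto3 and AWS credentials are available. Memory is not covered here because EC2 does not publish it by default; it would come from the CloudWatch agent under a different namespace.

```python
from datetime import datetime, timedelta, timezone


def datapoints_above(datapoints, threshold, stat="Maximum"):
    """Return (timestamp, value) pairs whose statistic exceeds threshold, oldest first."""
    hits = [
        (dp["Timestamp"], dp[stat])
        for dp in datapoints
        if stat in dp and dp[stat] > threshold
    ]
    return sorted(hits)


def fetch_cpu_datapoints(instance_id, hours=24, period_s=300):
    """Fetch maximum CPUUtilization datapoints for one EC2 instance."""
    # Imported here so the filter above can be used without boto3 installed.
    import boto3

    end = datetime.now(timezone.utc)
    client = boto3.client("cloudwatch")
    response = client.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=period_s,
        Statistics=["Maximum"],
    )
    return response["Datapoints"]
```

For example, `datapoints_above(fetch_cpu_datapoints(instance_id), 90.0)` with one of the affected instance IDs would show whether CPU was pinned in the run-up to the outage. The same pattern applies to network metrics such as `NetworkIn` by changing `MetricName`.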

Please share the health check configuration and reason codes for both load balancers here. That would help a lot in resolving this issue.

References:
[1] https://docs.aws.amazon.com/elasticloadbalancing/latest/application/target-group-health-checks.html

AWS
answered 19 days ago
