Greeting
Hi Tye,
Thanks for sharing this detailed explanation of your experience! It’s clear you’ve been thorough in identifying where AWS metrics seem to misalign, and I can understand how this would create challenges in diagnosing backend issues and keeping your dashboards actionable. Let’s break this down and find a way to clarify and improve your monitoring setup. 😊
Clarifying the Issue
You’ve observed that HTTPCode_ELB_5XX_Count includes 5XX status codes returned by Lambda functions when invoked through an ALB. This creates ambiguity in your dashboards: even though the Lambda invocation itself succeeds and deliberately returns a 5XX response, that response appears as if the ALB is experiencing backend availability problems. Because these responses are lumped into the ELB metric rather than the target metric, it’s difficult to trace and diagnose issues effectively.
AWS’s design logic behind this behavior is that the ALB essentially acts as a proxy. Any response returned to the ALB (whether from a backend or directly generated by the ALB itself) contributes to HTTPCode_ELB_5XX_Count. Unfortunately, this doesn’t distinguish between backend-originated errors and ALB-specific issues, creating confusion for users.
Your goal is to have these successful 5XX Lambda responses logged under HTTPCode_Target_5XX_Count, which would provide greater clarity and allow you to build more specific dashboards to monitor and address Lambda-specific errors. This is an excellent goal, as it aligns with the principle of actionable monitoring and reduces noise in your metrics!
Key Terms
- ALB (Application Load Balancer): AWS's load balancer that distributes traffic across targets such as Lambda functions, EC2 instances, and containers.
- HTTPCode_ELB_5XX_Count: A CloudWatch metric recording 5XX errors generated by the ALB itself or forwarded from backend responses.
- HTTPCode_Target_5XX_Count: A CloudWatch metric for 5XX errors generated by backend targets, such as Lambda functions or EC2 instances, allowing isolation of target-specific issues.
- TargetGroup Dimension: A filter for metrics to isolate traffic directed to a specific target group behind an ALB.
The Solution (Our Recipe)
Steps at a Glance:
- Configure custom logging for your Lambda function to explicitly record 5XX responses.
- Use structured logging to push these events to CloudWatch Logs.
- Create a custom metric filter in CloudWatch to capture Lambda 5XX responses.
- Add the custom metric to your Lambda-specific dashboards for monitoring.
- Monitor and manage costs associated with custom metrics and X-Ray.
Step-by-Step Guide:
- Configure Custom Logging for Your Lambda Function

  Update your Lambda function code to explicitly log a structured message whenever a 5XX response is returned. This will help you trace and attribute these responses accurately.

  ```python
  import json

  def lambda_handler(event, context):
      try:
          # Example business logic
          raise Exception("Simulated backend error")
      except Exception as e:
          # Log and return a 5XX status
          log_message = {
              "statusCode": 502,
              "error": str(e),
              "message": "Lambda caught an exception and returned 5XX"
          }
          # Log structured data for CloudWatch Metric Filters
          print(json.dumps(log_message))
          return {
              "statusCode": 502,
              "body": json.dumps({"error": "Backend error"})
          }
  ```
- Use Structured Logging to Push Events to CloudWatch Logs
Ensure that the structured logs are being sent to CloudWatch. If you are using a default Lambda setup, these logs will already appear in your Lambda's CloudWatch Logs group.
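If you want to sanity-check the handler before deploying, you can invoke it locally with a dummy event. This is a minimal standalone copy of the step 1 sketch, for local testing only; in AWS, the print statement is what lands in CloudWatch Logs.

```python
import json

def lambda_handler(event, context):
    """Minimal copy of the step 1 handler sketch, for local testing only."""
    try:
        # Example business logic
        raise Exception("Simulated backend error")
    except Exception as e:
        log_message = {
            "statusCode": 502,
            "error": str(e),
            "message": "Lambda caught an exception and returned 5XX",
        }
        # In AWS, this line is what the metric filter will match
        print(json.dumps(log_message))
        return {
            "statusCode": 502,
            "body": json.dumps({"error": "Backend error"}),
        }

# Invoke locally with a dummy event and no context, Lambda-style.
response = lambda_handler({}, None)
print(response["statusCode"])  # 502
```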
- Create a Custom Metric Filter in CloudWatch

  In the AWS Management Console:
  - Navigate to CloudWatch > Log Groups and select your Lambda’s log group.
  - Create a metric filter with a pattern that matches 5XX responses. Example: { $.statusCode = 502 }
  - Assign this filter to a custom metric, such as AppName_TargetGroupName_5XX_Count.

  Pro Tip: Use consistent naming conventions like AppName_TargetGroupName_5XX_Count to maintain clarity across multiple applications or target groups.
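Creating the filter can also be scripted rather than clicked through. Here is a minimal sketch that builds the parameters for the CloudWatch Logs put_metric_filter API call; the log group, app, and target group names are placeholders to substitute with your own.

```python
import json

def metric_filter_params(log_group, app_name, target_group):
    """Build parameters for CloudWatch Logs put_metric_filter.

    app_name and target_group are placeholder names -- substitute
    your own application and target group identifiers.
    """
    return {
        "logGroupName": log_group,
        "filterName": f"{app_name}-{target_group}-5xx-filter",
        # Matches the structured log line emitted by the Lambda handler
        "filterPattern": "{ $.statusCode = 502 }",
        "metricTransformations": [
            {
                "metricName": f"{app_name}_{target_group}_5XX_Count",
                "metricNamespace": f"Custom/{app_name}",
                "metricValue": "1",   # count 1 per matching log event
                "defaultValue": 0.0,  # report 0 when nothing matches
            }
        ],
    }

params = metric_filter_params("/aws/lambda/my-function",
                              "AppName", "TargetGroupName")
print(json.dumps(params, indent=2))

# With AWS credentials configured, the actual call would be:
#   import boto3
#   boto3.client("logs").put_metric_filter(**params)
```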
- Add the Custom Metric to Your Dashboards
- Navigate to CloudWatch > Dashboards and add the custom metric.
- Filter the metric by Lambda function name or other dimensions to isolate the data for your analysis.
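The dashboard widget can be scripted as well. This is a minimal sketch of a DashboardBody for the CloudWatch put_dashboard API, assuming the custom metric name and a Custom/AppName namespace from the metric filter step; names and region are placeholders.

```python
import json

def dashboard_body(app_name, target_group, region="us-east-1"):
    """Build a minimal DashboardBody JSON with the custom 5XX metric widget."""
    widget = {
        "type": "metric",
        "x": 0, "y": 0, "width": 12, "height": 6,
        "properties": {
            # [namespace, metric name] pairs to plot
            "metrics": [[f"Custom/{app_name}",
                         f"{app_name}_{target_group}_5XX_Count"]],
            "stat": "Sum",
            "period": 300,
            "region": region,
            "title": "Lambda 5XX responses",
        },
    }
    return json.dumps({"widgets": [widget]})

body = dashboard_body("AppName", "TargetGroupName")
print(body)

# With AWS credentials configured, the actual call would be:
#   import boto3
#   boto3.client("cloudwatch").put_dashboard(
#       DashboardName="Lambda5XX", DashboardBody=body)
```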
- Monitor and Manage Costs
- Custom Metrics: Be mindful that custom metrics incur additional charges. For high-throughput applications, consider using filters to track only critical errors or aggregate metrics to reduce costs.
- AWS X-Ray: If using X-Ray, costs can grow with heavy traffic or complex tracing setups. Review the X-Ray pricing guide and fine-tune sampling rates to balance costs with visibility.
Closing Thoughts
By implementing custom metrics for Lambda 5XX responses, you gain greater clarity into backend issues without relying solely on ALB metrics like HTTPCode_ELB_5XX_Count. This approach complements AWS’s existing monitoring tools and helps you build actionable dashboards tailored to your architecture.
For additional guidance:
- AWS Lambda Monitoring
- Using CloudWatch Metric Filters
- CloudWatch Metrics for ALB
- AWS X-Ray Tracing for Lambda
- CloudWatch Pricing Details
Farewell
I hope this clears up the ambiguity in your metrics, Tye! I can see how this is particularly important with your dual ALB setup for Lambda and EKS. Let me know if you have any other questions or need help setting up these custom metrics. Wishing you success with your monitoring dashboards! 🚀😊
Cheers,
Aaron! 😊

Thank you very much for the detailed response.
One comment. You wrote:
| Any response returned to the ALB (whether from a backend or directly generated by the ALB itself) contributes to HTTPCode_ELB_5XX_Count.
This is true for Lambda backends but not for others. It would be good for Lambda backends to use HTTPCode_Target_5XX_Count as the others do. This consistency would spare AWS users from having to build elaborate work-arounds. So it's still hard for me to see this Lambda-only behavior as anything other than a bug.
My work-around was not quite as elaborate as yours. I summed the AWS/ApplicationELB LambdaUserError metric over all TargetGroups and subtracted that from the HTTPCode_ELB_5XX_Count metric in my dashboards and alarms. For dashboards, you can use:

SUM(SEARCH('{AWS/ApplicationELB,LoadBalancer,TargetGroup} MetricName="LambdaUserError" LoadBalancer="$lb"', 'Sum'))

For alarms, you have to use:

SELECT SUM(LambdaUserError) FROM "AWS/ApplicationELB" WHERE LoadBalancer = '$lb' GROUP BY LoadBalancer

(which means the graph of the alarm's history can be inaccurate as recently as 30 minutes ago).
Granted, wanting to get the other HTTPCode_Target_[234]XX_Count metrics for your Lambdas may lead to an approach similar to what you outlined anyway. We will likely get those using Prometheus instead as the cost management is unlikely to become an issue that way.
So it would also be great for AWS customers if the ALB recorded 2XX, 3XX, and 4XX from Lambdas in these metrics like other backends.
Hi Tye,
Thanks for the follow-up! You’re absolutely right: recording Lambda 5XX responses under HTTPCode_Target_5XX_Count, as with other backends, would create much-needed consistency and simplify monitoring. It’s hard not to see this as a bug, or at least a design quirk.

Your workaround using the LambdaUserError metric is both clever and practical; thanks for sharing the exact syntax for dashboards and alarms! Prometheus also sounds like a strong option for detailed metrics without significant cost concerns.

Extending ALB metrics to include HTTPCode_Target_2XX_Count, 3XX, and 4XX for Lambdas would align things further and reduce the need for such workarounds. Raising this through AWS feedback channels might encourage action if enough users highlight the issue.

Let me know if you need help with Prometheus or refining your setup! 😊
Best regards,
Aaron 🚀