ALB logs show target_status_code=502 and elb_status_code=202


I have an Application Load Balancer configured with an ECS Fargate target group, with ALB access logs turned on. Occasionally, I see requests like this in the ALB access logs (queried through Athena):

#	type	time	elb	client_ip	client_port	target_ip	target_port	request_processing_time	target_processing_time	response_processing_time	elb_status_code	target_status_code	received_bytes	sent_bytes	request_verb	request_url	request_proto	user_agent	ssl_cipher	ssl_protocol	target_group_arn	trace_id	domain_name	chosen_cert_arn	matched_rule_priority	request_creation_time	actions_executed	redirect_url	lambda_error_reason	target_port_list	target_status_code_list	classification	classification_reason	day
1	h2	2024-04-02T18:27:26.661881Z	app/prod-alb-api/ebd9f80c43eb056e	[REDACTED-IP]	3500	[REDACTED-TARGET-IP]	8071	0.001	0.004	0.0	202	502	26160	62364	POST	[REDACTED-URL]	HTTP/2.0	[REDACTED-USER_AGENT]	ECDHE-RSA-AES128-GCM-SHA256	TLSv1.2	[REDACTED-ARN]	Root=1-660c4e0e-0caa78f12803086b243258d9	[REDACTED-DOMAIN]	arn:aws:acm:eu-west-1:475615009988:certificate/bd66730b-26a9-4e13-b47a-a50041bb0614	10000	2024-04-02T18:27:26.638000Z	waf,forward	-	-	[REDACTED-IP]:8071	502	-	-	2024/04/02

The unusual thing about this log is that the 'elb_status_code' is 202, but the 'target_status_code' is 502. My internal application logs show that my service is returning an HTTP 202 in its response to the incoming load balancer request.

Even if my service were actually returning a 502, my understanding is that the ALB should never convert that into a 202. What seems to be happening is that the ALB is incorrectly swapping the target_status_code and elb_status_code in the log (and is actually returning a 502, for some reason, in response to a 202 from the target group).

This is also incrementing the CloudWatch HTTPCode_Target_5XX_Count metric, which is causing spurious alarm activation.

The above log entry is representative of all of the log entries I'm seeing with this target_status_code/elb_status_code combination. All of them come from the same external client IP and user agent, which makes me suspect that this client is confusing the ALB with an unusual request. Unfortunately, I'm unable to reproduce this myself, or contact the client causing this.

1 Answer

To troubleshoot this further, I suggest setting up a traffic mirroring session in which your ALB ENIs (elastic network interfaces) are the mirror source and the mirror target is an EC2 instance running tcpdump to capture the traffic to pcap files. Make sure that the EC2 instance has a security group with an inbound rule that allows VXLAN (UDP 4789) traffic.
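On the mirror target, the capture step could look roughly like the sketch below. Mirrored packets arrive VXLAN-encapsulated on UDP port 4789, so filtering on that port keeps unrelated traffic out of the capture. The function name, interface name, output path, and ring-buffer sizes are all assumptions to adjust for your instance.

```shell
# Sketch: capture mirrored (VXLAN-encapsulated) traffic into a bounded ring of pcap files.
# Arguments are placeholders: $1 = capture interface (e.g. eth0), $2 = output file prefix.
capture_mirrored_traffic() {
  iface="$1"
  out_prefix="$2"
  # -n      : do not resolve addresses to names
  # -C 100  : start a new file after roughly 100 MB
  # -W 20   : keep at most 20 files, then overwrite the oldest
  sudo tcpdump -i "$iface" -n -C 100 -W 20 -w "$out_prefix" udp port 4789
}
```

It would be called as, for example, `capture_mirrored_traffic eth0 /var/tmp/alb-mirror.pcap`. Because the ring overwrites its oldest file, disk usage stays bounded, but the relevant files need to be copied off before they are overwritten.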

Then open the pcap files in Wireshark and search around the timestamp of the problematic entry in your ALB access logs to see exactly what the request and response look like before and after the ALB. You can also try to replay the same problematic client request (using curl, for example) to see whether it triggers the same problem.
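A replay with curl might be sketched as follows. The access log entry shows an HTTP/2 POST, so the sketch forces both. The function name and its three arguments are hypothetical, and it assumes the request body has been recovered from a capture and saved to a file.

```shell
# Sketch: replay a captured POST over HTTP/2 and print only the status code the client sees.
# $1 = URL, $2 = file holding the captured request body, $3 = User-Agent string to imitate.
replay_request() {
  url="$1"
  body_file="$2"
  user_agent="$3"
  curl --silent --show-error --http2 \
    --request POST \
    --user-agent "$user_agent" \
    --data-binary "@$body_file" \
    --output /dev/null \
    --write-out '%{http_code}\n' \
    "$url"
}
```

Any other headers seen in the captured request (Content-Type in particular) would need to be added with `--header` so that the replay matches the original as closely as possible.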

AWS EXPERT, answered 5 months ago
EXPERT, reviewed 5 months ago
  • I just noticed from the access log you shared that the traffic is encrypted, so you won't be able to see the payload of the traffic between the client and the ALB. However, you will still be able to see the traffic between the ALB and your backend servers (assuming that leg is not encrypted as well), which should give you some insight into what is happening.

  • If I understand correctly, this would require mirroring all of our production traffic to a single EC2 instance and saving it all to disk with tcpdump (since the source IP address will always be the ALB IP address). Are there any less resource-intensive approaches to debugging this?

  • You can filter the capture on the X-Forwarded-For header value, which will contain the client IP. However, you will only see the request and not the response.

    tshark -i <interface_name> -Y 'http.request and http.x_forwarded_for == "<CLIENT_IP>"' -w output_file.pcap
    

    If you can correlate that with the access logs you can then try to replay the same request and hopefully recreate the same issue to further troubleshoot.
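To find the timestamps to correlate against, one option is to pull the mismatched entries straight from the raw log files in S3 rather than through Athena. The sketch below assumes the standard space-delimited ALB access log layout, in which elb_status_code and target_status_code are the 9th and 10th fields; the function name is made up for illustration.

```shell
# Sketch: read raw ALB access log lines on stdin and print the time, client ip:port,
# elb_status_code and target_status_code for entries where the two codes differ.
# Entries with no target response (target_status_code "-") are skipped.
mismatched_status() {
  awk '$10 != "-" && $9 != $10 { print $2, $4, $9, $10 }'
}
```

With the log files downloaded locally, this could be run as `gunzip -c *.log.gz | mismatched_status`.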
