How do I troubleshoot missing X-Ray traces, segments, or services in the service map?

I want to troubleshoot my missing AWS X-Ray traces, segments, and services in the service map.

Short description

Missing traces cause missing services in the service map. The following are causes that lead to missing traces:

  • Incorrect instrumentation
  • X-Ray SDK can't reach the X-Ray daemon over the daemon address or port number
  • X-Ray daemon can't reach the X-Ray service endpoint
  • Tracing isn't configured at the individual service level
  • Missing AWS Identity and Access Management (IAM) permissions
  • Sampling rules configuration
  • Missing configuration in the OpenTelemetry Collector

Subsegments that don't have an open parent segment can cause missing segments. For more information, see Troubleshooting AWS X-Ray.

Resolution

Incorrect instrumentation

Incorrect instrumentation when you use X-Ray SDK

If your application code isn't correctly instrumented for X-Ray SDK to patch the supported libraries and frameworks, then you can have missing traces. Trace IDs that aren't passed to the downstream services can also cause missing traces.

To troubleshoot this issue, turn on debug-level logging on the SDK to output more detailed logs to your application log file. This allows you to further isolate issues that are related to the instrumentation and trace the flow of the trace ID across the application. The logs also indicate whether a trace is missing because the trace isn't sampled or because of instrumentation misconfiguration. For more information, check the debug logging in X-Ray SDK for Java, Node.js, Python, .NET, or Ruby.
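As a minimal sketch for a Python application, debug-level logging can be turned on with the standard logging module. The sketch assumes the X-Ray SDK for Python, which logs under the aws_xray_sdk logger name. The log format is an arbitrary choice for illustration.

```python
import logging

# Send log records to the application's log output with timestamps.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

# Raise only the X-Ray SDK logger to DEBUG so that other libraries stay at INFO.
xray_logger = logging.getLogger("aws_xray_sdk")
xray_logger.setLevel(logging.DEBUG)
```

Run this configuration before the application handles requests so that the SDK's sampling and segment messages are captured from the start.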

Note: The auto-instrumentation agent for Java doesn't capture traces for asynchronous requests, so applications that use the agent can miss traces. If you have missing traces, then use manual instrumentation. If the X-Ray SDK is missing some features or has issues, then check the OpenTelemetry SDK for the auto-instrumentation agent for Java.

Incorrect instrumentation when you use OpenTelemetry SDK

If you use OpenTelemetry SDK and the application code isn't correctly instrumented to patch the supported libraries and frameworks, then the code can have missing traces. Trace IDs that aren't passed to the downstream services can also cause missing segments.

To troubleshoot this issue, take the following actions:

  • Turn on debug-level logging on the SDK to send more detailed logs to your application log file. This allows you to further isolate issues related to the instrumentation and trace the flow of the trace ID across the application. Logs indicate whether a trace is missing because the trace isn't sampled or because of instrumentation misconfiguration. For more information, check the debug logging in X-Ray SDK for Java, Node.js, Python, .NET, or Ruby.
  • Make sure that your instrumentation isn't turned off. For more information, see Suppressing specific auto-instrumentation on the OpenTelemetry website.

Note: To enrich traces with AWS infrastructure information, make sure that the AWS resource detector is supported for the AWS service in the OpenTelemetry SDK. For more information, see Using the AWS resource detectors.

X-Ray SDK can't reach the X-Ray daemon over the daemon address or port number

If the X-Ray SDK can't reach the X-Ray daemon over the default or configured daemon address and port number, then the SDK misses traces. The X-Ray daemon address is passed to the SDK through the AWS_XRAY_DAEMON_ADDRESS environment variable. By default, the X-Ray daemon listens on UDP port 2000. You can change this port through a command line option or the configuration file that's passed to the X-Ray daemon. For more information, see Using a configuration file.
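To confirm which address the SDK sends UDP traffic to, it can help to resolve the environment variable the same way by hand. The following is a rough sketch, not the SDK's own parser. It assumes the two documented value formats, "address:port" and "tcp:address:port udp:address:port". The function name resolve_udp_daemon_address is hypothetical.

```python
import os

# Default listener of the X-Ray daemon when nothing is configured.
DEFAULT_UDP_ADDRESS = ("127.0.0.1", 2000)


def resolve_udp_daemon_address(value=None):
    """Return the (host, port) pair that UDP segment traffic would target."""
    if value is None:
        value = os.environ.get("AWS_XRAY_DAEMON_ADDRESS", "")
    value = value.strip()
    if not value:
        return DEFAULT_UDP_ADDRESS

    parts = value.split()
    if len(parts) == 1:
        # Single "address:port" value that applies to both TCP and UDP.
        host, _, port = parts[0].rpartition(":")
        return host, int(port)

    # "tcp:address:port udp:address:port" form: pick the udp entry.
    for part in parts:
        protocol, _, rest = part.partition(":")
        if protocol.lower() == "udp":
            host, _, port = rest.rpartition(":")
            return host, int(port)
    raise ValueError(f"no UDP address found in {value!r}")
```

Compare the returned port with the port that the daemon reports in its logs. A mismatch means that the SDK sends segments to a port where the daemon doesn't listen.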

To identify the UDP port that's configured for the X-Ray daemon, run the daemon with debug-level logging:

./xray -l debug

Note: The following example identifies 3000 as the port number.

Example output:

2023-03-28T15:15:43-07:00 [Debug] Listening on UDP 127.0.0.1:3000

If you don't run the X-Ray daemon with debug logs, then turn on debug logging and rerun the daemon. For more information, see Using a configuration file.

Note: If your X-Ray daemon doesn't run on the same machine as your application and X-Ray SDK, then check the inbound and ingress rules. Confirm that the security group inbound rule and network access control list (network ACL) ingress rule allow traffic. You must allow traffic for X-Ray SDK over the listener port of the X-Ray daemon.

X-Ray daemon can't reach the X-Ray service endpoint

If your X-Ray daemon can't reach the X-Ray service endpoint, then take the following actions:

  • If you configured a proxy at the source end, then check whether the proxy blocks traffic to the X-Ray service endpoint.
  • If you use the com.amazonaws.region.xray Amazon Virtual Private Cloud (Amazon VPC) endpoint to connect to the X-Ray service, then check the attached security group. Make sure that the security group has an outbound rule that allows HTTPS traffic so that data can be pushed to the X-Ray service API. Also, make sure that the security group allows inbound traffic on port 443 from the X-Ray daemon.
  • Verify that the network ACL that's associated with the X-Ray Amazon VPC endpoint has an egress rule that allows traffic on port 443 to 0.0.0.0/0. The X-Ray daemon must send data to the X-Ray service endpoint. Also, verify that the network ACL ingress rule allows traffic from the X-Ray daemon to the Amazon VPC endpoint.
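To test these network paths from the host where the X-Ray daemon runs, a TCP connection attempt on port 443 is a quick first check. The following sketch assumes the regional endpoint hostname pattern xray.region.amazonaws.com. The helper names are hypothetical, and a successful connection shows only that the port is reachable, not that IAM permissions are correct.

```python
import socket


def xray_endpoint_host(region):
    """Build the regional X-Ray endpoint hostname (assumed naming pattern)."""
    return f"xray.{region}.amazonaws.com"


def can_connect(host, port=443, timeout=3.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port opens within the timeout.

    The connect parameter exists only so that the check can be exercised
    without real network access.
    """
    try:
        connection = connect((host, port), timeout=timeout)
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
    connection.close()
    return True
```

For example, call can_connect(xray_endpoint_host("us-east-1")) with your own Region. A result of False points to a proxy, security group, network ACL, or DNS issue between the daemon and the endpoint.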

Tracing isn't configured at the individual service level

If you don't turn on X-Ray configuration for certain AWS services, then you can have missing traces. Some AWS services require you to turn on X-Ray configuration at the service level.

Missing IAM permissions

Confirm that you have the required IAM permissions:

  • If you receive Access Denied errors in the X-Ray daemon logs, then check the IAM permissions. To send traces to the X-Ray service endpoint, the X-Ray daemon IAM role requires the permissions in the AWSXRayDaemonWriteAccess managed policy. If you use a tags-based condition in your IAM policy, then make sure that the IAM policy refers to the correct tag. For more information, see Authorization based on X-Ray tags.
  • If the X-Ray daemon runs in Amazon Elastic Container Service (Amazon ECS) as a sidecar container in the task, then check the Amazon ECS task role. Make sure that the AWSXRayDaemonWriteAccess policy is attached to the Amazon ECS task role.
  • If you deploy the X-Ray daemon in Amazon Elastic Kubernetes Service (Amazon EKS), then check the sidecar container in the pod. Make sure that the container uses IAM roles for service accounts (IRSA) and that the AWSXRayDaemonWriteAccess policy is attached to the IAM role.
  • If you configured the X-Ray daemon in Amazon Elastic Compute Cloud (Amazon EC2), then check the associated instance profile. Make sure that the AWSXRayDaemonWriteAccess policy is attached to the instance profile role that's associated with that EC2 instance.
  • If you configured the X-Ray daemon in AWS Lambda, then check the Lambda execution role. Make sure that the AWSXRayDaemonWriteAccess policy is attached to the execution role.
  • If you turned on active tracing in Amazon Simple Notification Service (Amazon SNS) and you still can't see your traces, then configure a resource policy in X-Ray.

Note: To confirm the X-Ray permissions, use the IAM policy simulator to test the IAM role or policy that's attached to the resource. Also, check the service control policies (SCPs) for your organization. SCPs restrict access at the account level for specific API calls.

For a list of IAM managed policies for X-Ray, see IAM managed policies for X-Ray.
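If you use a custom policy instead of the managed policy, then a quick offline comparison can show which actions are absent. The following sketch is a simplified check, not a replacement for the IAM policy simulator: it ignores Deny statements, resources, conditions, and SCPs. The REQUIRED_ACTIONS list is an assumption about the contents of AWSXRayDaemonWriteAccess, so verify it against the current managed policy. The function name is hypothetical.

```python
from fnmatch import fnmatchcase

# Assumed contents of AWSXRayDaemonWriteAccess. Verify in the IAM console.
REQUIRED_ACTIONS = [
    "xray:PutTraceSegments",
    "xray:PutTelemetryRecords",
    "xray:GetSamplingRules",
    "xray:GetSamplingTargets",
    "xray:GetSamplingStatisticSummaries",
]


def _as_list(value):
    """IAM allows a single string or a list for Action and Statement."""
    return [value] if isinstance(value, (str, dict)) else list(value)


def missing_xray_actions(policy_document, required=REQUIRED_ACTIONS):
    """Return the required actions that no Allow statement covers."""
    allowed_patterns = []
    for statement in _as_list(policy_document.get("Statement", [])):
        if statement.get("Effect") == "Allow":
            allowed_patterns.extend(_as_list(statement.get("Action", [])))

    # IAM action names are case-insensitive and support * and ? wildcards.
    return [
        action
        for action in required
        if not any(
            fnmatchcase(action.lower(), pattern.lower())
            for pattern in allowed_patterns
        )
    ]
```

An empty list means that every assumed action is covered by at least one Allow statement. Any returned action is a candidate cause of Access Denied errors in the daemon logs.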

Sampling rules configuration

If the X-Ray SDK or an AWS service that supports active tracing doesn't sample a request, then no trace is recorded for that request. Use sampling rules to determine the requests to record. By default, the X-Ray SDK records the first request each second and 5% of any additional requests. You can change the sampling rate to decrease or increase the number of traces that are recorded.

For example, a reservoir of 1 means that the first request of each second is recorded. A sampling rate of 0.1 means that 10% of the additional requests in each second are recorded.

Because X-Ray SDK limits the number of requests that the SDK records, all requests might not get traced. To record more traces, increase the reservoir and sampling rate settings. For more information, see Sampling rule examples.
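To estimate how many traces to expect, the reservoir and rate can be combined into a simple calculation. The following sketch is a simplified model that treats the reservoir as a fixed number of requests per second and applies the rate to the remainder. It ignores how a centralized reservoir is shared across multiple instances. The function name is hypothetical.

```python
def expected_traces_per_second(requests_per_second, reservoir=1, rate=0.05):
    """Estimate sampled requests per second for one sampling rule."""
    # Requests that the reservoir records regardless of the rate.
    guaranteed = min(requests_per_second, reservoir)
    # Requests beyond the reservoir are sampled at the fixed rate.
    additional = max(requests_per_second - reservoir, 0)
    return guaranteed + additional * rate
```

With the default rule and 100 requests each second, the estimate is 1 + 99 × 0.05 = 5.95 traces each second. With a rate of 0.1, the estimate is 1 + 99 × 0.1 = 10.9. If the number of traces that you see is close to this estimate, then the missing traces are unsampled requests and not an instrumentation issue.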

Note: If you configured multiple sampling rules and the number of sampled traces fluctuates, then check the sampling rule that X-Ray SDK uses. For more information, see Viewing sampling results.

Missing configuration in the OpenTelemetry Collector

If you use OpenTelemetry Collector instead of the X-Ray daemon, then the required receiver and exporter configuration might be missing. If a configuration is missing, then traces aren't sent to the X-Ray service endpoint. Based on your receiver, check the receiver configuration in the config.yaml file to verify that there's no missing configuration.

See these example configurations on the GitHub website:

Note: Make sure that the IAM role that the OpenTelemetry Collector uses has the permissions in the AWSXRayDaemonWriteAccess managed policy.
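The receiver and exporter check can be sketched as a small validation over the parsed contents of config.yaml. The sketch assumes that the file is already loaded into a dictionary with a YAML library of your choice, and that the X-Ray exporter is registered under the awsxray name, as in the OpenTelemetry Collector contrib distribution. The function names are hypothetical.

```python
def component_type(name):
    """Strip an optional "/instance" suffix, for example "awsxray/2"."""
    return name.split("/", 1)[0]


def collector_trace_config_problems(config):
    """List reasons why traces might not reach X-Ray with this config."""
    defined_receivers = set(config.get("receivers") or {})
    defined_exporters = set(config.get("exporters") or {})
    pipelines = (config.get("service") or {}).get("pipelines") or {}
    trace_pipelines = {
        name: pipeline or {}
        for name, pipeline in pipelines.items()
        if component_type(name) == "traces"
    }
    if not trace_pipelines:
        return ["no traces pipeline is defined under service.pipelines"]

    problems = []
    for name, pipeline in trace_pipelines.items():
        receivers = pipeline.get("receivers") or []
        exporters = pipeline.get("exporters") or []
        if not receivers:
            problems.append(f"{name}: no receivers listed")
        for receiver in receivers:
            if receiver not in defined_receivers:
                problems.append(f"{name}: receiver '{receiver}' is not defined")
        if not any(component_type(e) == "awsxray" for e in exporters):
            problems.append(f"{name}: no awsxray exporter listed")
        for exporter in exporters:
            if exporter not in defined_exporters:
                problems.append(f"{name}: exporter '{exporter}' is not defined")
    return problems
```

An empty list means that at least the basic wiring exists: a traces pipeline whose receivers and exporters are defined and that includes an X-Ray exporter. The check doesn't validate receiver protocols, ports, or Region settings.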

AWS OFFICIAL
Updated 6 months ago