Unable to see X-Ray trace for AWS Batch job

0

I am running a batch job that starts an EC2 instance and runs a Java application, and I am trying to generate X-Ray trace for the job. In my docker entrypoint I install X-Ray daemon and start it

curl https://s3.us-west-2.amazonaws.com/aws-xray-assets.us-west-2/xray-daemon/aws-xray-daemon-3.x.rpm -o /tmp/xray.rpm
yum install -v -y /tmp/xray.rpm
xray -o -n us-west-2 &

My Java code is instrumented as such

try {
    Segment segment = AWSXRay.beginSegment("load-data-segment");
    logger.info("Segment started traceID={}, segment", segment.getTraceId(), segment.prettySerialize());
    Subsegment subSegment = AWSXRay.beginSubsegment("load-data-sub-segment");
    logger.info("started subSegment traceId={}, {}", subSegment.getTraceId(), subSegment.prettySerialize());
    subSegment.putAnnotation("job_id", "load-data-job");
    ...
} finally {
    logger.info("Ending subsegment {}, {}", subSegment.getTraceId(), subSegment.prettySerialize());
    AWSXRay.endSubsegment();
    logger.info("Ending segment {}, {}", segment.getTraceId(), segment.prettySerialize());
    AWSXRay.endSegment();
}

And I can see from CloudWatch log that the daemon is installed and started, and the segments are created and eventually terminated normally.

...
EVENTS  1667282262329   Running X-Ray: xray -o -n us-west-2 &   1667282258652
EVENTS  1667282262329   Getting Secret: dev-shared-kc-local.yml 1667282258654
EVENTS  1667282262329   2022-11-01T05:57:38Z [Info] Initializing AWS X-Ray daemon 3.3.5 1667282258658
EVENTS  1667282262329   2022-11-01T05:57:38Z [Info] Using buffer memory limit of 321 MB 1667282258658
EVENTS  1667282262329   2022-11-01T05:57:38Z [Info] 5136 segment buffers allocated  1667282258658
EVENTS  1667282262329   2022-11-01T05:57:38Z [Info] Using region: us-west-2 1667282258679
EVENTS  1667282262329   2022-11-01T05:57:38Z [Info] HTTP Proxy server using X-Ray Endpoint : https://xray.us-west-2.amazonaws.com   1667282258679
EVENTS  1667282262329   2022-11-01T05:57:38Z [Info] Starting proxy http server on 127.0.0.1:2000    1667282258679
...
VENTS  1667282262329   2022-11-01 05:57:41,933 [main] [default] INFO  - Segment started traceID=1-6360b555-5161dc6ff45fcca197606de8, segment   1667282261934
EVENTS  1667282262329   2022-11-01 05:57:41,941 [main] [default] INFO  - started subSegment traceId=null, { 1667282261941
EVENTS  1667282262329     "name" : "load-data-sub-segment", 1667282261941
EVENTS  1667282262329     "id" : "348f5fd5734d2002",    1667282261941
EVENTS  1667282262329     "start_time" : 1.667282261934E9,  1667282261941
EVENTS  1667282262329     "in_progress" : true  1667282261941
EVENTS  1667282262329   }   1667282261941
...
...
EVENTS  1667282596467   2022-11-01 06:03:16,334 [main] [default] INFO  - Ending subsegment null, {  1667282596334
...
EVENTS  1667282596467   2022-11-01 06:03:16,336 [main] [default] INFO  - Ending segment 1-6360b555-5161dc6ff45fcca197606de8

However I am completely unable to find any tracing from X-Ray console. I use the queries like annotation.job_id, but nothing shows up. I think I have all the necessary polices added to my role (AWSXRayDaemonWriteAccess), but I am suspecting if I didn't set it up properly. Where and how can I start debugging this? I feel like completely at wits' end. Thank you for any suggestions!

asked a year ago545 views
2 Answers
0

The code looks good overall, I don't find any problem. Would you mind add log for segment.isSampled() ? XRay SDK does not emit segment if sample flag is 0/false.

AWS
answered a year ago
  • Thank you for suggestion. Here's the output

    2022-11-03 20:11:36,083 [main] [default] INFO  - Segment: isSampled: true, isRecording: true, isEmitted false, isError false, isFault false, inProgress true, isRecording true
    2022-11-03 20:11:36,093 [main] [default] INFO  - subSegment: isFault false, isError false, isEmitted false, inProgress true, isThrottle false        1667506296094
    
0

Still not find any problem. Please print Daemon debug logs by xray -o -n us-west-2 -l dev and check whether X-Ray Daemon has logs like

2022-11-06T20:54:30-08:00 [Debug] Received request on HTTP Proxy server : /GetSamplingRules
2022/11/06 20:54:31 http: proxy error: context canceled
2022-11-06T20:55:00-08:00 [Debug] Skipped telemetry data as no segments found
2022-11-06T20:55:05-08:00 [Debug] Received request on HTTP Proxy server : /GetSamplingRules
2022-11-06T20:55:09-08:00 [Debug] processor: sending partial batch
2022-11-06T20:55:09-08:00 [Debug] processor: segment batch size: 1. capacity: 50
2022-11-06T20:55:10-08:00 [Info] Successfully sent batch of 1 segments (1.501 seconds)
AWS
answered a year ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions