I need to send traces from an AWS Glue job (written as a Python script) to AWS X-Ray. Since X-Ray does not support AWS Glue out of the box, I had to write a little extra code to instrument the script so that it can send traces. I found an article from Chariot Solutions and tried to follow its steps, but it's not working, and it doesn't raise an error either. According to the article, it seems we don't even need to spin up a daemon because we have a custom emitter. Here is the code:

import boto3
import io
import json
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext
import aws_xray_sdk.core

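class DirectEmitter:
    """Custom emitter that sends segments straight to the X-Ray API instead of a daemon."""

    def __init__(self):
        self.xray_client = None  # lazily initialize

    def send_entity(self, entity):
        if not self.xray_client:
            self.xray_client = boto3.client('xray')
        segment_doc = json.dumps(entity.to_dict())
        # Upload the serialized segment; without this call nothing leaves the job
        self.xray_client.put_trace_segments(TraceSegmentDocuments=[segment_doc])

    def set_daemon_address(self, address):
        pass  # no daemon is used, so there is nothing to configure

    def ip(self):
        return None

    def port(self):
        return None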


sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)

zip_file = ''
bucket_name = 'mybucket-dev-etl-1'
output_folder = 'myfolder/obf/output'
raw_folder = 'myfolder/obf/raw'

segment = aws_xray_sdk.core.xray_recorder.begin_segment('segment_name')
s3 = boto3.client('s3')
obj = s3.get_object(Bucket=bucket_name, Key=zip_file)
zip_data = io.BytesIO(obj['Body'].read())
segment.put_metadata('key', 'krish-dict', 'namespace')
subsegment = aws_xray_sdk.core.xray_recorder.begin_subsegment('subsegment_name')
with aws_xray_sdk.core.xray_recorder.capture('subsegment_name'):
    extracted_files = extract_zip(zip_data)  # calls an external library to extract the files; its import is omitted here for security reasons
for file_name, file_content in extracted_files.items():
    subsegment.put_annotation('key', 'krish-value')
    s3.put_object(Bucket=bucket_name, Key=f'{raw_folder}/{file_name}', Body=file_content)

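# Close the open subsegment and segment; the recorder emits a segment only once it has ended
aws_xray_sdk.core.xray_recorder.end_subsegment()
aws_xray_sdk.core.xray_recorder.end_segment()

print('extracting complete')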

Am I missing anything here? It seems the daemon is not working for some reason, maybe because there is no daemon? My understanding, though, is that with the custom emitter we don't need to run a separate daemon explicitly. Any comments or advice would be much appreciated.

asked 3 months ago
1 Answer

To send X-Ray trace segments from your service, you have to run the X-Ray daemon, which receives the segments and forwards them to X-Ray. The article you referred to also ran a daemon to send segments from the SDK to the X-Ray backend.
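
To make the daemon's role a bit more concrete: by default the SDK does not call the X-Ray API itself. It sends each finished segment as a UDP datagram to the daemon (127.0.0.1:2000 unless configured otherwise), and the daemon forwards the segments to X-Ray. Below is a rough, standard-library-only sketch of what such a datagram looks like. The helper names (`new_trace_id`, `build_segment`, `build_datagram`, `send_to_daemon`) are made up for illustration and are not part of the SDK, and the segment contains only the minimal fields.

```python
import json
import os
import socket
import time

# Default daemon UDP endpoint; assumes a daemon is listening locally.
DAEMON_ADDRESS = ("127.0.0.1", 2000)
# Header line the daemon expects before the segment JSON.
DAEMON_HEADER = '{"format": "json", "version": 1}'


def new_trace_id(now=None):
    """Build a trace ID: 1-<8 hex digits of epoch seconds>-<24 random hex digits>."""
    epoch = int(now if now is not None else time.time())
    return "1-{:08x}-{}".format(epoch, os.urandom(12).hex())


def build_segment(name, start, end):
    """Return a minimal segment document (hypothetical helper, not an SDK call)."""
    return {
        "name": name,
        "id": os.urandom(8).hex(),       # 16 hex digits
        "trace_id": new_trace_id(start),
        "start_time": start,             # epoch seconds, float
        "end_time": end,
    }


def build_datagram(segment):
    """Serialize a segment as header line + newline + JSON body."""
    return (DAEMON_HEADER + "\n" + json.dumps(segment)).encode("utf-8")


def send_to_daemon(segment, address=DAEMON_ADDRESS):
    """Fire-and-forget UDP send; nothing is reported if no daemon is listening."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(build_datagram(segment), address)
    finally:
        sock.close()


if __name__ == "__main__":
    start = time.time()
    segment = build_segment("segment_name", start, start + 0.25)
    print(build_datagram(segment).decode("utf-8"))
```

Because UDP is fire-and-forget, calling `send_to_daemon(segment)` when no daemon is listening normally fails silently, which would be consistent with seeing neither traces nor errors.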

answered 3 months ago

