How to find the Spark executor logs in AWS Glue

Hi,

I am running a Spark ETL Glue job and want to see the executors' logs through the Spark UI. However, I don't see the executor logs (stderr and stdout). Can you please advise how to find the executor logs? In CloudWatch, only the driver's log info can be found.

renyang
asked 12 days ago
2 Answers

That is not a "live" Spark UI, so it doesn't have access to logs.
The stdout and stderr are stored in CloudWatch under /aws-glue/jobs/output and /aws-glue/jobs/error respectively. The driver stream is named after the job run ID, and the executor streams have a longer name that appends the executor container, starting with "_g-" (the tricky part is matching a container log stream to its executor ID).
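To illustrate the naming scheme described above, here is a small sketch. The `split_streams` helper and the exact "_g-" suffix format are assumptions based on this answer, not a documented API; verify against your own log group:

```python
# Sketch: split CloudWatch log stream names for one Glue job run into
# driver vs. executor streams. Assumption (from this answer): the driver
# stream is named exactly after the job run id, and executor streams
# append a container suffix beginning with "_g-".
def split_streams(job_run_id, stream_names):
    driver = [s for s in stream_names if s == job_run_id]
    executors = [s for s in stream_names if s.startswith(job_run_id + "_g-")]
    return driver, executors

# Example with made-up stream names:
streams = [
    "jr_0123abcd",           # driver stream
    "jr_0123abcd_g-11aa",    # executor container stream
    "jr_0123abcd_g-22bb",    # executor container stream
]
driver, executors = split_streams("jr_0123abcd", streams)
```

In practice you would obtain the stream names from the CloudWatch Logs API (e.g. boto3's `describe_log_streams` on the /aws-glue/jobs/error log group) rather than from a hard-coded list.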

AWS
EXPERT
answered 12 days ago
AWS
SUPPORT ENGINEER
reviewed 12 days ago
  • Thanks for your answer. After experimenting with the code below (because foreachPartition runs on the executors), I found that the "print" output lands in the "g-" log streams under /aws-glue/jobs/error. This is very hard to find.

    # Sample DataFrame creation
    data = [("Alice", 34), ("Bob", 45), ("Charlie", 25)]
    columns = ["name", "age"]
    df = spark.createDataFrame(data, columns)

    # Custom function to print partition elements
    def print_partition(partition):
        # Iterate over rows in the partition and print each one
        for row in partition:
            print(row)

    # Apply foreachPartition to print each partition
    df.foreachPartition(print_partition)


Hi

CloudWatch Logs: AWS Glue automatically sends continuous logs to CloudWatch every 5 seconds and before each executor terminates. You can view these logs in the CloudWatch console. However, by default, only the driver's logs are surfaced there.
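Continuous logging is controlled by job arguments. A minimal sketch of the relevant parameters (the parameter names are AWS Glue's documented special parameters; how you pass them — console, CLI, or boto3 — is up to you):

```python
# Job arguments that enable continuous logging to CloudWatch for a Glue job.
# "--enable-continuous-log-filter" drops noisy heartbeat/progress-bar
# messages so the useful driver/executor output is easier to find.
continuous_log_args = {
    "--enable-continuous-cloudwatch-log": "true",
    "--enable-continuous-log-filter": "true",
}
```

These can be set under the job's "Job parameters" in the console, or passed as the `Arguments` dict to `glue.start_job_run` in boto3.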

Spark UI: Enable the Spark UI for your AWS Glue job. This lets you view the Spark application logs, including the executor logs (stderr and stdout). You can configure the Spark UI to generate logs in two ways:

  • Legacy mode: This mode stores the logs in an S3 location.
  • Spark History Server: This mode stores the logs in a dedicated Spark History Server.
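Enabling the Spark UI is likewise a matter of job arguments. A sketch with a placeholder bucket (the parameter names come from the Glue docs; the S3 path is an assumption you should replace with a bucket you own):

```python
# Job arguments that turn on Spark UI event logging for a Glue job.
# The S3 path below is a placeholder, not a real bucket.
spark_ui_args = {
    "--enable-spark-ui": "true",
    "--spark-event-logs-path": "s3://my-bucket/sparkHistoryLogs/",
}
```

Once the job has written event logs to that path, you can browse them with a Spark History Server (or the Glue console's Spark UI view) to see per-executor stderr/stdout.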

AWS Documentation source: https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-jobs.html#monitor-spark-ui-jobs-console

GK
answered 12 days ago
AWS
SUPPORT ENGINEER
reviewed 12 days ago
