How to publish Spark (EMR Serverless) job logs to CloudWatch


I have created a Spark job in Scala and I am trying to find a way to get its logs into CloudWatch.

So far I have tried packaging the job as an uber JAR with a CloudWatch appender and passing log4j options like this:

--class Main 
--conf spark.files=s3://fen-x-data-migration-1234/emr-demo/etl-job/conf/log4j.properties#log4j.properties 
--conf spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties 
--conf spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties 
--conf spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory

I also tried adding the appender programmatically.

What am I missing?

Thanks

asked 3 months ago, 53 views
1 Answer
Accepted Answer

Hi, thanks for writing to re:Post.

As I understand it, you are having trouble getting your job logs into CloudWatch.

I would like to inform you that EMR Serverless integration with CloudWatch is not available at the moment. It is currently under development and is expected to be available in late Q3 2022. The best way for customers to track updates is to keep an eye on the EMR Serverless documentation page [1].

Also, please note that, given the volume of logs that Spark and Hive generate, CloudWatch Logs is not always cost-effective at that scale. For this reason, EMR Serverless provides managed storage, where logs are kept for the customer at no additional cost for 30 days [2]. Customers can also choose to store logs in S3.
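As a rough illustration of the S3 option, here is a minimal sketch in Python (boto3) of a job run request that sets an S3 log destination through the monitoring configuration of the StartJobRun API. The application ID, role ARN, JAR path, and log URI shown are hypothetical placeholders, not values from the question, and the helper names are made up for this example; please check the logging documentation [2] for the options available to your release.

```python
import json


def build_job_run_request(application_id, execution_role_arn, entry_point,
                          log_uri, spark_submit_parameters=None):
    """Build keyword arguments for the EMR Serverless StartJobRun call.

    All argument values are supplied by the caller; nothing here is
    specific to the job in the question.
    """
    spark_submit = {"entryPoint": entry_point}
    # Only include the parameters string when one is given.
    if spark_submit_parameters:
        spark_submit["sparkSubmitParameters"] = spark_submit_parameters

    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "jobDriver": {"sparkSubmit": spark_submit},
        "configurationOverrides": {
            "monitoringConfiguration": {
                # Driver and executor logs are written under this S3 prefix.
                "s3MonitoringConfiguration": {"logUri": log_uri},
            }
        },
    }


def submit(request):
    """Submit the request; needs boto3 and AWS credentials at call time."""
    import boto3  # imported here so building a request works without it

    client = boto3.client("emr-serverless")
    return client.start_job_run(**request)["jobRunId"]


if __name__ == "__main__":
    # Hypothetical placeholder values, for illustration only.
    request = build_job_run_request(
        application_id="00example1234567",
        execution_role_arn="arn:aws:iam::123456789012:role/example-job-role",
        entry_point="s3://example-bucket/jobs/etl-job.jar",
        log_uri="s3://example-bucket/logs/",
        spark_submit_parameters="--class Main",
    )
    print(json.dumps(request, indent=2))
    # Call submit(request) to actually start the job run.
```

The execution role used for the job would need write access to the chosen S3 prefix for the logs to appear there.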

Thanks and stay safe!

[1] https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/emr-serverless.html
[2] https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/logging.html

SUPPORT ENGINEER
answered 3 months ago
