How to publish Spark (EMR Serverless) job logs to CloudWatch


I have created a Spark job in Scala and I am trying to find a way to get its logs into CloudWatch.

So far I have tried packaging the job as an uber JAR with a CloudWatch appender and passing log4j options like this:

--class Main 
--conf spark.files=s3://fen-x-data-migration-1234/emr-demo/etl-job/conf/log4j.properties#log4j.properties 
--conf spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties 
--conf spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties 
--conf spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory
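
For reference, the log4j.properties shipped via spark.files looks roughly like this. This is only a sketch: the CloudWatch appender class name and its logGroup property are placeholders for whatever third-party appender library is bundled in the uber JAR.

# Root logger writes to the console and to the (placeholder) CloudWatch appender
log4j.rootLogger=INFO, console, cloudwatch

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n

# Placeholder class name; substitute the CloudWatch appender you bundle
log4j.appender.cloudwatch=com.example.logging.CloudWatchAppender
log4j.appender.cloudwatch.logGroup=/emr-serverless/etl-job
log4j.appender.cloudwatch.layout=org.apache.log4j.PatternLayout
log4j.appender.cloudwatch.layout.ConversionPattern=%d %p %c: %m%n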

I also tried to add the appender programmatically.
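
That attempt was essentially this minimal Scala sketch (again, the CloudWatchAppender class and its setters are placeholders for the bundled third-party implementation; only the log4j 1.x calls around it are standard):

import org.apache.log4j.{Level, Logger, PatternLayout}

// Placeholder appender class; its name and configuration are assumptions
val appender = new com.example.logging.CloudWatchAppender()
appender.setLayout(new PatternLayout("%d %p %c: %m%n"))
appender.activateOptions()  // standard log4j 1.x hook to apply option values

// Attach to the root logger so driver-side log events flow through it
val root = Logger.getRootLogger
root.setLevel(Level.INFO)
root.addAppender(appender)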

What am I missing?

Thanks

Asked 2 years ago · Viewed 1287 times
1 Answer
Accepted Answer

Hi, thanks for writing to re:Post.

As I understand it, you are facing an issue getting logs into CloudWatch.

I would like to inform you that EMR Serverless integration with CloudWatch is not available at the moment. It is currently under development and expected to be available in late Q3 2022. The best way to track updates from the customer side is to keep an eye on the EMR Serverless documentation page [1].

Also, please note that given the volume of logs generated by Spark and Hive, CloudWatch Logs is not always cost-effective at that scale. Hence, EMR Serverless provides managed storage, where it stores the logs for customers at no additional cost for 30 days [2]. Customers can also choose to store logs in S3.
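
For the S3 option, the job run can point its monitoringConfiguration at a bucket when it is started. A minimal sketch with the AWS CLI (the application ID, role ARN, and bucket names are placeholders):

aws emr-serverless start-job-run \
  --application-id <application-id> \
  --execution-role-arn <execution-role-arn> \
  --job-driver '{"sparkSubmit": {"entryPoint": "s3://my-bucket/etl-job.jar"}}' \
  --configuration-overrides '{"monitoringConfiguration": {"s3MonitoringConfiguration": {"logUri": "s3://my-bucket/emr-serverless-logs/"}}}'

Driver and executor logs then land under the configured logUri [2].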

Thanks and stay safe!

[1] https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/emr-serverless.html
[2] https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/logging.html

AWS
Support Engineer
Answered 2 years ago
