How to publish Spark (EMR Serverless) job logs to CloudWatch


I have created a Spark job in Scala and I am trying to find a way to get the logs into CloudWatch.

So far I have tried packaging the job as an uber JAR with a CloudWatch appender and passing Log4j options like this:

--class Main 
--conf spark.files=s3://fen-x-data-migration-1234/emr-demo/etl-job/conf/log4j.properties#log4j.properties 
--conf spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties 
--conf spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties 
--conf spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory
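
A sketch of the log4j.properties I am referring to; the CloudWatch appender class name is a placeholder for the one bundled in the uber JAR:

# Placeholder appender class below; everything else is standard Log4j 1.x syntax.
log4j.rootLogger=INFO, console, cloudwatch

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

log4j.appender.cloudwatch=com.example.logging.CloudWatchAppender
log4j.appender.cloudwatch.logGroupName=/emr-serverless/etl-job
log4j.appender.cloudwatch.logStreamName=driver
log4j.appender.cloudwatch.layout=org.apache.log4j.PatternLayout
log4j.appender.cloudwatch.layout.ConversionPattern=%d %p %c: %m%n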

I also tried to add the appender programmatically.
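
For completeness, here is a minimal sketch of that programmatic approach, using the Log4j 1.x API and AWS SDK v1 for CloudWatch Logs. The log group and stream names are placeholders, and a real appender would batch events rather than issue one PutLogEvents call per message:

import java.util.Collections
import com.amazonaws.services.logs.AWSLogsClientBuilder
import com.amazonaws.services.logs.model.{InputLogEvent, PutLogEventsRequest}
import org.apache.log4j.{AppenderSkeleton, Logger}
import org.apache.log4j.spi.LoggingEvent

// Forwards each log event to CloudWatch Logs. Assumes the log group and
// stream already exist and the job's IAM role allows logs:PutLogEvents.
class CloudWatchAppender(logGroup: String, logStream: String) extends AppenderSkeleton {
  private val client = AWSLogsClientBuilder.defaultClient()
  private var sequenceToken: String = _

  override def append(event: LoggingEvent): Unit = synchronized {
    val request = new PutLogEventsRequest(
      logGroup,
      logStream,
      Collections.singletonList(
        new InputLogEvent()
          .withTimestamp(event.getTimeStamp)
          .withMessage(event.getRenderedMessage)))
      .withSequenceToken(sequenceToken)
    sequenceToken = client.putLogEvents(request).getNextSequenceToken
  }

  override def close(): Unit = client.shutdown()
  override def requiresLayout(): Boolean = false
}

// Attach to the root logger early in the driver, before the job logs anything.
Logger.getRootLogger.addAppender(
  new CloudWatchAppender("/emr-serverless/etl-job", "driver"))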

What am I missing?

Thanks

Asked 2 years ago · 1,287 views
1 Answer

Accepted Answer

Hi, thanks for writing to re:Post.

As I understand it, you are facing an issue with getting logs into CloudWatch.

I would like to inform you that EMR Serverless integration with CloudWatch is not available at the moment. It is currently under development and expected to become available in late Q3 2022. The best way to track updates on the customer side is to keep an eye on the EMR Serverless documentation page [1].

Also, please note that given the volume of logs generated by Spark and Hive, CloudWatch Logs is not always cost-effective at that scale. For this reason, EMR Serverless provides managed storage, where EMR stores logs for the customer at no additional cost for 30 days [2]. Customers can also choose to store logs in S3.
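
For reference, S3 log delivery can be enabled per job run through the monitoring configuration of StartJobRun; a minimal sketch with placeholder IDs and bucket name:

aws emr-serverless start-job-run \
--application-id <application-id> \
--execution-role-arn <execution-role-arn> \
--job-driver file://job-driver.json \
--configuration-overrides '{"monitoringConfiguration": {"s3MonitoringConfiguration": {"logUri": "s3://my-log-bucket/emr-serverless/logs/"}}}'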

Thanks and stay safe!

[1] https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/emr-serverless.html
[2] https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/logging.html

AWS
Support Engineer
Answered 2 years ago
