How to publish Spark (EMR Serverless) job logs to CloudWatch


I have created a Spark job in Scala and I am trying to find a way to get its logs into CloudWatch.

So far I have tried packaging the job as an uber JAR with a CloudWatch appender and passing log4j options like this:

--class Main 
--conf spark.files=s3://fen-x-data-migration-1234/emr-demo/etl-job/conf/log4j.properties#log4j.properties 
--conf spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties 
--conf spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties 
--conf spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory
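
For reference, a `log4j.properties` shipped this way would typically look like the minimal sketch below. The console appender is standard log4j 1.x; `com.example.CloudWatchAppender` and its `logGroup` property are placeholders for whatever appender class the uber JAR actually bundles, since log4j itself does not provide a CloudWatch appender:

```properties
# Root logger: console plus the bundled CloudWatch appender
log4j.rootLogger=INFO, console, cloudwatch

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Placeholder: substitute the appender class shipped in your uber JAR
log4j.appender.cloudwatch=com.example.CloudWatchAppender
log4j.appender.cloudwatch.logGroup=/emr-serverless/etl-job
log4j.appender.cloudwatch.layout=org.apache.log4j.PatternLayout
log4j.appender.cloudwatch.layout.ConversionPattern=%d %p %c: %m%n
```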

I also tried to add the appender programmatically.

What am I missing?

Thanks

Asked 2 years ago · 1287 views
1 Answer

Accepted Answer

Hi, Thanks for writing to re:Post.

As I understand, you are facing an issue getting logs into CloudWatch.

I would like to inform you that EMR Serverless integration with CloudWatch is not available at the moment. It is currently under development and expected to be available in late Q3 2022. The best way to track updates on the customer side is to keep an eye on the EMR Serverless documentation page [1].

Also, please note that given the volume of logs generated by Spark and Hive, CloudWatch Logs is not always cost-effective at that scale. For this reason, EMR Serverless provides managed storage, where EMR stores the logs for you at no additional cost for 30 days [2]. You can also choose to store logs in S3.
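
As a sketch of the S3 option, the S3 log destination is set through the job run's `--configuration-overrides`. The application ID, role ARN, entry point, and bucket below are placeholders you would substitute with your own values:

```shell
aws emr-serverless start-job-run \
  --application-id <application-id> \
  --execution-role-arn <role-arn> \
  --job-driver '{"sparkSubmit": {"entryPoint": "s3://my-bucket/etl-job.jar"}}' \
  --configuration-overrides '{
    "monitoringConfiguration": {
      "s3MonitoringConfiguration": { "logUri": "s3://my-bucket/emr-serverless-logs/" }
    }
  }'
```

Driver and executor stdout/stderr for the run then land under the given `logUri`, organized by job run ID.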

Thanks and stay safe!

[1] https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/emr-serverless.html
[2] https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/logging.html

AWS
SUPPORT ENGINEER
answered 2 years ago
