EMR Serverless JAR job issues

Good day. I'm trying to submit an EMR Serverless job using a custom Docker image for the serverless application, submitting a JAR file to run. All script and Spark arguments are passed correctly. The job starts but fails with this error:

Error: ETL config file 's3:/path/to/file/etl.conf' must exist and be readable

The file exists at the specified location, and the IAM role used for the job has full S3 access. What else could cause the problem?

asked 4 months ago · 222 views
1 answer

Hello Sviat,

I understand that you are trying to run a Spark application on EMR Serverless. Can you confirm whether the EMR Serverless Spark application itself fails to read the file, or whether you are trying to read the file in the driver (in either Python or Scala)?

If you are reading the file in the driver, the file-reading API you are using might not support S3 paths. With that said, I would recommend passing the file to the job using --files or --conf spark.files, and then accessing it with pyspark.SparkFiles.get.
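For the submission side, a minimal sketch of shipping the config file with the job via boto3 is below; the application ID, role ARN, S3 paths, and class name are placeholder assumptions, not values from your setup.

import boto3

client = boto3.client("emr-serverless")

# All IDs, ARNs, paths, and class names below are placeholders.
response = client.start_job_run(
    applicationId="00example123456",
    executionRoleArn="arn:aws:iam::123456789012:role/emr-serverless-job-role",
    jobDriver={
        "sparkSubmit": {
            "entryPoint": "s3://your-bucket/jars/etl-job.jar",
            # Distribute etl.conf to the driver and executors so it can be
            # read as a local file instead of through an S3 URI.
            "sparkSubmitParameters": "--class com.example.EtlMain "
                                     "--files s3://your-bucket/conf/etl.conf",
        }
    },
)
print(response["jobRunId"])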

An example code snippet showing how to read the distributed file in PySpark is below.

from pyspark.sql import SparkSession
from pyspark import SparkFiles

spark = SparkSession.builder \
                    .appName('read-distributed-config') \
                    .enableHiveSupport() \
                    .getOrCreate()

# SparkFiles.get returns the local path where Spark placed the file
# that was distributed with --files (or spark.files).
path = SparkFiles.get("config.json")
with open(path, "r") as f:
    print(f.read())

spark.stop()
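Since your job submits a JAR, note that the same mechanism is available on the JVM side: org.apache.spark.SparkFiles.get("etl.conf") in Scala (or Java) returns the local path of a file distributed with --files, which you can then open with standard file APIs.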

If the above doesn't resolve your use case, may I request that you share the Spark properties you used for the EMR Serverless application, along with the code where you are trying to access the file (if applicable).

AWS
answered 4 months ago
TECHNICAL SUPPORT ENGINEER
verified 4 months ago
