EMR Serverless jobs do not stop and run indefinitely with error: Encountered errors when releasing containers: [{ContainerGroupId: ** ContainerId: **, ErrorCode: INTERNAL_ERROR}, ...]

We have a PySpark job that connects to MongoDB using the mongo-spark-connector. The job runs successfully, with no errors in the stdout log file, but the stderr log file contains the following messages:

INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, [2600:1f13:e3d:5e02:5016:e800:a87c:5cd4], 42357, None)
INFO BlockManagerMaster: Removed 1 successfully in removeExecutor
INFO DAGScheduler: Shuffle files lost for executor: 1 (epoch 1)
WARN DefaultEmrServerlessRMClient: Encountered errors when releasing containers: [{ContainerGroupId: 8cc55db8-8f45-6671-d09e-dd1e0088906d,ContainerId: 04c55db8-8fbb-7666-3f7e-626522a44608,ErrorCode: INTERNAL_ERROR}]
WARN DefaultEmrServerlessRMClient: Encountered errors when releasing containers: [{ContainerGroupId: 8cc55db8-8f45-6671-d09e-dd1e0088906d,ContainerId: 20c55db8-8fab-0c85-b530-2f5b44dc48cd,ErrorCode: INTERNAL_ERROR}, {ContainerGroupId: 8cc55db8-8f45-6671-d09e-dd1e0088906d,ContainerId: 04c55db8-8fbb-7666-3f7e-626522a44608,ErrorCode: INTERNAL_ERROR}]
WARN DefaultEmrServerlessRMClient: Encountered errors when releasing containers: [{ContainerGroupId: 8cc55db8-8f45-6671-d09e-dd1e0088906d,ContainerId: 20c55db8-8fab-0c85-b530-2f5b44dc48cd,ErrorCode: INTERNAL_ERROR}, {ContainerGroupId: 8cc55db8-8f45-6671-d09e-dd1e0088906d,ContainerId: 04c55db8-8fbb-7666-3f7e-626522a44608,ErrorCode: INTERNAL_ERROR}]

Because of this, the PySpark job does not stop and runs indefinitely.

Below is the PySpark snippet we use for the MongoDB connection. It executes successfully, but the job does not terminate afterwards:

from pyspark.sql import SparkSession

# mongo_url, database and readFromCollection are set earlier in the job (not shown).
spark = (
    SparkSession.builder.appName("test-app")
    .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector:10.0.2")
    .config("spark.mongodb.read.connection.uri", mongo_url)
    .config("spark.mongodb.write.connection.uri", mongo_url)
    .getOrCreate()
)

# Load the collection into a DataFrame.
dataDF = (
    spark.read.format("mongodb")
    .option("uri", mongo_url)
    .option("database", database)
    .option("collection", readFromCollection)
    .load()
)

# These actions all complete successfully.
dataDF.printSchema()
dataDF.show()
print(dataDF.count())

Can anyone help us understand why we are getting this error and how to avoid it? Also, is there a way to forcefully exit a PySpark job in EMR Serverless? We have already tried the following options, but neither has worked so far:

  • spark.stop()
  • os._exit(0)
Chinmay
asked a year ago · 615 views
1 Answer

Hello There,

Thank you for your query.

I understand that your MongoDB job using the mongo-spark-connector completes successfully according to the stdout logs, but the stderr log shows INTERNAL_ERROR and the job runs indefinitely. You would like to understand the root cause and how to fix it. To answer this we need to look at the job logs and account information, which are not public. Could you please open a support case with AWS using this link [1]?

Regarding forcefully exiting a PySpark job in EMR Serverless, I would suggest using the executionTimeoutMinutes property of the StartJobRun API [2], or the job run settings on the console. The default is 720 minutes (12 hours) [3]. Please note that setting the property to 0 lets the job run continuously, which is ideal for streaming jobs. For a batch job, you can instead set it to the average time your job takes to complete, plus some extra minutes for contingency.
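As a rough illustration of the suggestion above, here is a minimal Python sketch that derives a timeout from past run times and passes it as executionTimeoutMinutes when starting the job through boto3. The helper names, the 15-minute buffer, and the placeholder application ID, role ARN, and S3 path are all assumptions made for the example, not values from this thread. The cancel helper uses the CancelJobRun API as one possible way to stop a stuck run from outside the job; whether it clears the container release error here has not been verified.

```python
import math
from statistics import mean


def suggest_timeout_minutes(past_runtimes_min, buffer_min=15):
    """Average observed runtime (minutes), rounded up, plus a contingency buffer.

    buffer_min=15 is an arbitrary illustrative default, not an AWS recommendation.
    """
    if not past_runtimes_min:
        raise ValueError("need at least one observed runtime")
    return math.ceil(mean(past_runtimes_min)) + buffer_min


def build_start_job_run_request(application_id, role_arn, entry_point, timeout_min):
    """Build the keyword arguments for the EMR Serverless StartJobRun call."""
    if timeout_min <= 0:
        # 0 means "run continuously", which is what we want to avoid for a batch job.
        raise ValueError("timeout_min must be positive for a batch job")
    return {
        "applicationId": application_id,
        "executionRoleArn": role_arn,
        "jobDriver": {"sparkSubmit": {"entryPoint": entry_point}},
        "executionTimeoutMinutes": timeout_min,
    }


def submit(request):
    """Start the job run and return its ID. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without AWS access

    client = boto3.client("emr-serverless")
    return client.start_job_run(**request)["jobRunId"]


def cancel(application_id, job_run_id):
    """Cancel a job run from outside the job, e.g. one that never terminates."""
    import boto3

    client = boto3.client("emr-serverless")
    client.cancel_job_run(applicationId=application_id, jobRunId=job_run_id)


if __name__ == "__main__":
    # Hypothetical past runtimes (minutes) and placeholder identifiers.
    timeout = suggest_timeout_minutes([20, 25, 30])
    request = build_start_job_run_request(
        application_id="your-application-id",
        role_arn="arn:aws:iam::123456789012:role/your-job-role",
        entry_point="s3://your-bucket/scripts/job.py",
        timeout_min=timeout,
    )
    print(request)  # pass to submit(request) once the placeholders are replaced
```

With a timeout in place, EMR Serverless stops the run once the limit is reached, so a job that hangs after finishing its work no longer runs for the full default of 12 hours.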

Hope the above answers your question. If you need any further information, please get back to me or consider opening a Support ticket with AWS Premium Support.

Hope you have a great day ahead.

References:

[1] https://console.aws.amazon.com/support/home#/case/create
[2] https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_StartJobRun.html
[3] https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/considerations.html

[Image: Runtime setting on EMR Serverless Console]

AWS
SUPPORT ENGINEER
Rajiv_M
answered a year ago
