EMR Serverless jobs do not stop and run indefinitely with error: Encountered errors when releasing containers: [{ContainerGroupId: ** ContainerId: **, ErrorCode: INTERNAL_ERROR}, ...]

We have a PySpark job that connects to MongoDB using the mongo-spark-connector. The job executes successfully with no errors in the stdout log file, but the stderr log file contains the following messages:

INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, [2600:1f13:e3d:5e02:5016:e800:a87c:5cd4], 42357, None)
INFO BlockManagerMaster: Removed 1 successfully in removeExecutor
INFO DAGScheduler: Shuffle files lost for executor: 1 (epoch 1)
WARN DefaultEmrServerlessRMClient: Encountered errors when releasing containers: [{ContainerGroupId: 8cc55db8-8f45-6671-d09e-dd1e0088906d,ContainerId: 04c55db8-8fbb-7666-3f7e-626522a44608,ErrorCode: INTERNAL_ERROR}]
WARN DefaultEmrServerlessRMClient: Encountered errors when releasing containers: [{ContainerGroupId: 8cc55db8-8f45-6671-d09e-dd1e0088906d,ContainerId: 20c55db8-8fab-0c85-b530-2f5b44dc48cd,ErrorCode: INTERNAL_ERROR}, {ContainerGroupId: 8cc55db8-8f45-6671-d09e-dd1e0088906d,ContainerId: 04c55db8-8fbb-7666-3f7e-626522a44608,ErrorCode: INTERNAL_ERROR}]
WARN DefaultEmrServerlessRMClient: Encountered errors when releasing containers: [{ContainerGroupId: 8cc55db8-8f45-6671-d09e-dd1e0088906d,ContainerId: 20c55db8-8fab-0c85-b530-2f5b44dc48cd,ErrorCode: INTERNAL_ERROR}, {ContainerGroupId: 8cc55db8-8f45-6671-d09e-dd1e0088906d,ContainerId: 04c55db8-8fbb-7666-3f7e-626522a44608,ErrorCode: INTERNAL_ERROR}]

Because of this, the PySpark job does not stop and runs indefinitely.

This is the PySpark code snippet we use for the MongoDB connection. It executes successfully, but the job does not terminate afterwards:

from pyspark.sql import SparkSession

# mongo_url, database and readFromCollection are defined elsewhere in the job.
spark = (
    SparkSession.builder.appName("test-app")
    .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector:10.0.2")
    .config("spark.mongodb.read.connection.uri", mongo_url)
    .config("spark.mongodb.write.connection.uri", mongo_url)
    .getOrCreate()
)

# Read the collection through the MongoDB connector.
dataDF = (
    spark.read.format("mongodb")
    .option("uri", mongo_url)
    .option("database", database)
    .option("collection", readFromCollection)
    .load()
)
dataDF.printSchema()
dataDF.show()
print(dataDF.count())

Can anyone please help us understand why we are getting this error and how to avoid it? Also, is there any way to forcefully exit from a PySpark job in EMR Serverless? We have already tried the following options, but neither has worked so far:

  • spark.stop()
  • os._exit(0)
Chinmay
asked 1 year ago · 618 views

1 Answer

Hello There,

Thank you for your query.

I understand that your MongoDB job using the mongo-spark-connector runs successfully according to the stdout logs, but the stderr log shows INTERNAL_ERROR and the job runs indefinitely. You would like to understand the root cause and how to fix it. To answer this, we need to look into the job logs and account information, which are not public. Could you please open a support case with AWS using this link [1]?

Regarding forcefully exiting from a PySpark job in EMR Serverless, I would suggest using the executionTimeoutMinutes property of the StartJobRun API [2] or the job run settings on the console. The default is 720 minutes (12 hours) [3]. Please note, however, that setting the property to 0 lets the job run continuously, which is ideal for streaming jobs. Instead, you can set it to the average time your job takes to complete and add some extra minutes for contingency.
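
As a rough illustration of the suggestion above, here is a minimal sketch that derives a timeout from past runtimes and passes it as executionTimeoutMinutes through boto3's start_job_run call. The helper names, the 15-minute default buffer, and the sample values in the usage note are assumptions made for this example only; they do not come from your job or from the documentation.

```python
import math
from statistics import mean


def timeout_with_buffer(past_runtimes_minutes, buffer_minutes=15):
    """Average past runtime, rounded up, plus a contingency buffer (minutes)."""
    # buffer_minutes=15 is an arbitrary illustrative default, not a recommendation.
    return math.ceil(mean(past_runtimes_minutes)) + buffer_minutes


def build_start_job_run_request(application_id, execution_role_arn,
                                entry_point, timeout_minutes):
    """Build keyword arguments for the EMR Serverless StartJobRun call."""
    if timeout_minutes < 0:
        raise ValueError("timeout_minutes must be >= 0")
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "jobDriver": {"sparkSubmit": {"entryPoint": entry_point}},
        # 0 would disable the timeout, so the job could run continuously.
        "executionTimeoutMinutes": timeout_minutes,
    }


def submit(request):
    """Submit the job run; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("emr-serverless")
    return client.start_job_run(**request)["jobRunId"]
```

For example, if earlier runs took 10, 12 and 14 minutes, timeout_with_buffer([10, 12, 14], buffer_minutes=5) gives 17, and submit(build_start_job_run_request(app_id, role_arn, script_uri, 17)) would start a run that EMR Serverless stops after that limit. Here app_id, role_arn and script_uri stand in for your own application ID, execution role ARN and script location.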

Hope the above answers your question. If you need any further information, please get back to me or consider opening a Support ticket with AWS Premium Support.

Hope you have a great day ahead.

References:

[1] https://console.aws.amazon.com/support/home#/case/create
[2] https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_StartJobRun.html
[3] https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/considerations.html

[Screenshot: Runtime setting on EMR Serverless Console]

Rajiv_M
AWS Support Engineer
answered 1 year ago
