EMR Serverless jobs do not stop and run indefinitely with error: Encountered errors when releasing containers: [{ContainerGroupId: ** ContainerId: **, ErrorCode: INTERNAL_ERROR}, ...]


We have a PySpark job that connects to MongoDB using the mongo-spark-connector. The job executes successfully with no errors in the stdout log file, but in the stderr log file we get the following error:

INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, [2600:1f13:e3d:5e02:5016:e800:a87c:5cd4], 42357, None)
INFO BlockManagerMaster: Removed 1 successfully in removeExecutor
INFO DAGScheduler: Shuffle files lost for executor: 1 (epoch 1)
WARN DefaultEmrServerlessRMClient: Encountered errors when releasing containers: [{ContainerGroupId: 8cc55db8-8f45-6671-d09e-dd1e0088906d,ContainerId: 04c55db8-8fbb-7666-3f7e-626522a44608,ErrorCode: INTERNAL_ERROR}]
WARN DefaultEmrServerlessRMClient: Encountered errors when releasing containers: [{ContainerGroupId: 8cc55db8-8f45-6671-d09e-dd1e0088906d,ContainerId: 20c55db8-8fab-0c85-b530-2f5b44dc48cd,ErrorCode: INTERNAL_ERROR}, {ContainerGroupId: 8cc55db8-8f45-6671-d09e-dd1e0088906d,ContainerId: 04c55db8-8fbb-7666-3f7e-626522a44608,ErrorCode: INTERNAL_ERROR}]
WARN DefaultEmrServerlessRMClient: Encountered errors when releasing containers: [{ContainerGroupId: 8cc55db8-8f45-6671-d09e-dd1e0088906d,ContainerId: 20c55db8-8fab-0c85-b530-2f5b44dc48cd,ErrorCode: INTERNAL_ERROR}, {ContainerGroupId: 8cc55db8-8f45-6671-d09e-dd1e0088906d,ContainerId: 04c55db8-8fbb-7666-3f7e-626522a44608,ErrorCode: INTERNAL_ERROR}]

Because of this, the PySpark job does not stop and runs indefinitely.

This is the PySpark code snippet that we use for the MongoDB connection. It executes successfully, but the PySpark job does not terminate after execution:

from pyspark.sql import SparkSession

# mongo_url, database, and readFromCollection are defined elsewhere in the job
spark = (
    SparkSession.builder.appName("test-app")
    .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector:10.0.2")
    .config("spark.mongodb.read.connection.uri", mongo_url)
    .config("spark.mongodb.write.connection.uri", mongo_url)
    .getOrCreate()
)

dataDF = (
    spark.read.format("mongodb")
    .option("uri", mongo_url)
    .option("database", database)
    .option("collection", readFromCollection)
    .load()
)
dataDF.printSchema()
dataDF.show()
print(dataDF.count())

Can anyone please help me understand why we are getting this error and how to avoid it? Also, is there any way to forcefully exit a PySpark job in EMR Serverless? We have already tried the following options, but none have worked so far:

  • spark.stop()
  • os._exit(0)
Chinmay
asked a year ago, 621 views
1 Answer

Hello There,

Thank you for your query.

I understand that your MongoDB job using the mongo-spark-connector completes successfully according to the stdout log, but the stderr log shows INTERNAL_ERROR and the job runs indefinitely. You would like to understand the root cause and a fix for this. To answer this, we need to look into the job logs and account information, which are non-public. Could you please open a support case with AWS using this link [1]?

Regarding forcefully exiting a PySpark job in EMR Serverless, I suggest using the executionTimeoutMinutes property of the StartJobRun API [2] or the Job run settings on the console. The default is 720 minutes (12 hours) [3]. Note that setting the property to 0 lets the job run continuously, which is ideal for streaming jobs. For a batch job, you can instead set it to the average time your job takes to run, plus some extra minutes for contingency.
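As a rough illustration of the timeout suggestion, here is a minimal sketch using boto3's emr-serverless client. The helper names (suggest_timeout_minutes, build_start_job_run_request, submit), the 10-minute buffer, and all IDs, ARNs, and S3 paths are hypothetical placeholders, not values from this thread. It only caps how long a run may last; it does not address the INTERNAL_ERROR itself.

```python
import math
from typing import Sequence


def suggest_timeout_minutes(past_runtimes_min: Sequence[float], buffer_min: int = 10) -> int:
    """Average observed runtime, rounded up, plus a contingency buffer (assumed: 10 min)."""
    if not past_runtimes_min:
        raise ValueError("need at least one observed runtime")
    average = sum(past_runtimes_min) / len(past_runtimes_min)
    return math.ceil(average) + buffer_min


def build_start_job_run_request(application_id: str, execution_role_arn: str,
                                entry_point: str, timeout_minutes: int) -> dict:
    """Build keyword arguments for StartJobRun with an explicit execution timeout."""
    # 0 means "run continuously", which is what we want to avoid for a batch job.
    if timeout_minutes <= 0:
        raise ValueError("use a positive timeout for batch jobs")
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "jobDriver": {"sparkSubmit": {"entryPoint": entry_point}},
        "executionTimeoutMinutes": timeout_minutes,
    }


def submit(request: dict) -> str:
    """Submit the job run; requires boto3 and AWS credentials in the environment."""
    import boto3  # imported lazily so the helpers above work without boto3 installed

    client = boto3.client("emr-serverless")
    response = client.start_job_run(**request)
    return response["jobRunId"]


if __name__ == "__main__":
    # Hypothetical runtimes (in minutes) from earlier successful runs.
    timeout = suggest_timeout_minutes([12, 15, 18])
    request = build_start_job_run_request(
        application_id="<application-id>",           # placeholder
        execution_role_arn="<execution-role-arn>",   # placeholder
        entry_point="s3://<bucket>/<script>.py",     # placeholder
        timeout_minutes=timeout,
    )
    print(request)
    # submit(request)  # uncomment once the placeholders are filled in
```

With a timeout set this way, EMR Serverless should end the run once the limit is reached, even if the driver never exits on its own.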

Hope the above answers your question. If you need any further information, please get back to me or consider opening a Support ticket with AWS Premium Support.

Hope you have a great day ahead.

References:

[1] https://console.aws.amazon.com/support/home#/case/create
[2] https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_StartJobRun.html
[3] https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/considerations.html

Runtime setting on EMR Serverless Console

AWS
Support Engineer
Rajiv_M
answered a year ago