EMR Serverless jobs do not stop and run indefinitely with error: Encountered errors when releasing containers: [{ContainerGroupId: ** ContainerId: **, ErrorCode: INTERNAL_ERROR}, ...]

We have a PySpark job that connects to MongoDB using the mongo-spark-connector. The job runs successfully with no errors in the stdout log file, but in the stderr log file we get the following error:

INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, [2600:1f13:e3d:5e02:5016:e800:a87c:5cd4], 42357, None)
INFO BlockManagerMaster: Removed 1 successfully in removeExecutor
INFO DAGScheduler: Shuffle files lost for executor: 1 (epoch 1)
WARN DefaultEmrServerlessRMClient: Encountered errors when releasing containers: [{ContainerGroupId: 8cc55db8-8f45-6671-d09e-dd1e0088906d,ContainerId: 04c55db8-8fbb-7666-3f7e-626522a44608,ErrorCode: INTERNAL_ERROR}]
WARN DefaultEmrServerlessRMClient: Encountered errors when releasing containers: [{ContainerGroupId: 8cc55db8-8f45-6671-d09e-dd1e0088906d,ContainerId: 20c55db8-8fab-0c85-b530-2f5b44dc48cd,ErrorCode: INTERNAL_ERROR}, {ContainerGroupId: 8cc55db8-8f45-6671-d09e-dd1e0088906d,ContainerId: 04c55db8-8fbb-7666-3f7e-626522a44608,ErrorCode: INTERNAL_ERROR}]
WARN DefaultEmrServerlessRMClient: Encountered errors when releasing containers: [{ContainerGroupId: 8cc55db8-8f45-6671-d09e-dd1e0088906d,ContainerId: 20c55db8-8fab-0c85-b530-2f5b44dc48cd,ErrorCode: INTERNAL_ERROR}, {ContainerGroupId: 8cc55db8-8f45-6671-d09e-dd1e0088906d,ContainerId: 04c55db8-8fbb-7666-3f7e-626522a44608,ErrorCode: INTERNAL_ERROR}]

Because of this, the PySpark job does not stop and runs indefinitely.

The PySpark code snippet we use for the MongoDB connection is below. It executes successfully, but the PySpark job does not terminate after execution:

from pyspark.sql import SparkSession

# mongo_url, database and readFromCollection are defined elsewhere in the job.
spark = (
    SparkSession.builder.appName("test-app")
    .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector:10.0.2")
    .config("spark.mongodb.read.connection.uri", mongo_url)
    .config("spark.mongodb.write.connection.uri", mongo_url)
    .getOrCreate()
)

# Read the collection, then print its schema, a sample of rows, and the row count.
dataDF = (
    spark.read.format("mongodb")
    .option("uri", mongo_url)
    .option("database", database)
    .option("collection", readFromCollection)
    .load()
)
dataDF.printSchema()
dataDF.show()
print(dataDF.count())

Can anyone please help me understand why we are getting this error and how to avoid it? Also, is there any way to forcefully exit from a PySpark job in EMR Serverless? We have already tried the following options, but none have worked so far:

  • spark.stop()
  • os._exit(0)
Chinmay
asked 8 months ago, 431 views
1 Answer

Hello There,

Thank you for your query.

I understand that your job, which connects to MongoDB using the mongo-spark-connector, completes successfully according to the stdout log, but the stderr log shows INTERNAL_ERROR and the job runs indefinitely. You would like to understand the root cause and how to fix it. To answer this, we need to look at the job logs and account information, which are not public. Could you please open a support case with AWS using this link [1]?

To forcefully exit a PySpark job in EMR Serverless, I suggest using the executionTimeoutMinutes property of the StartJobRun API [2], or the job run settings on the console. The default is 720 minutes (12 hours) [3]. Note that setting the property to 0 lets the job run continuously, which is ideal for streaming jobs. Otherwise, you can set it to the average time your job takes to run, plus some extra minutes for contingency.
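As a rough illustration of the timeout suggestion, here is a minimal sketch that builds a StartJobRun request with executionTimeoutMinutes set to an average runtime plus a margin. The helper names (timeout_with_margin, build_job_run_request, submit_job_run), the 10-minute default margin, and all placeholder IDs, ARNs, and S3 paths are assumptions for illustration only. The client is expected to be a boto3 "emr-serverless" client.

```python
from typing import Any, Dict


def timeout_with_margin(average_minutes: int, margin_minutes: int = 10) -> int:
    """Return the average runtime plus a contingency margin, in minutes.

    The 10-minute default margin is an arbitrary assumption; tune it per job.
    """
    if average_minutes <= 0 or margin_minutes < 0:
        raise ValueError("average_minutes must be > 0 and margin_minutes >= 0")
    return average_minutes + margin_minutes


def build_job_run_request(
    application_id: str,
    execution_role_arn: str,
    entry_point: str,
    timeout_minutes: int,
) -> Dict[str, Any]:
    """Build keyword arguments for the StartJobRun call."""
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "jobDriver": {"sparkSubmit": {"entryPoint": entry_point}},
        # The job run is stopped by the service once this limit is reached.
        "executionTimeoutMinutes": timeout_minutes,
    }


def submit_job_run(client: Any, request: Dict[str, Any]) -> str:
    """Submit the job run and return its ID.

    `client` is assumed to be boto3.client("emr-serverless").
    """
    response = client.start_job_run(**request)
    return response["jobRunId"]


# Hypothetical usage (placeholder values, requires AWS credentials):
#   import boto3
#   client = boto3.client("emr-serverless")
#   request = build_job_run_request(
#       "my-application-id",
#       "arn:aws:iam::123456789012:role/my-job-role",
#       "s3://my-bucket/scripts/job.py",
#       timeout_with_margin(45, 15),
#   )
#   job_run_id = submit_job_run(client, request)
```

With this in place, a job that normally finishes in 45 minutes would be submitted with a 60-minute limit instead of the 12-hour default, so a run that hangs after the Spark work is done does not keep consuming resources for hours.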

Hope the above answers your question. If you need any further information, please get back to me or consider opening a Support ticket with AWS Premium Support.

Hope you have a great day ahead.

References:

[1] https://console.aws.amazon.com/support/home#/case/create
[2] https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_StartJobRun.html
[3] https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/considerations.html

[Image: Runtime setting on EMR Serverless Console]

AWS
SUPPORT ENGINEER
Rajiv_M
answered 8 months ago
