Spark submit is failing in cluster mode for pyspark application


Hello Team,

We are facing issues with a Spark (PySpark) application on EMR 6.10.0. The details are as follows:

  1. spark-submit fails in cluster mode for the PySpark application.
  2. We have set the environment variables, and the application works fine in client mode and local mode when we run it from the master node.
  3. The command is: spark-submit --master yarn --deploy-mode cluster main.py
  4. We tried multiple options at runtime, but our PySpark code still fails in cluster mode with the two errors below:

**Error 1: Running spark-submit from the master node returns the following error:**

INFO Client: Application report for application_1693454247420_0022 (state: FAILED)
23/08/31 11:06:17 INFO Client:
client token: N/A
diagnostics: Application application_1693454247420_0022 failed 2 times due to AM Container for appattempt_1693454247420_0022_000002 exited with exitCode: 13
Failing this attempt.Diagnostics: [2023-08-31 11:06:16.928]Exception from container-launch.
Container id: container_1693454247420_0022_02_000001
Exit code: 13



**Error 2: The NodeManager logs on the data node side show the following errors:**

2023-08-31 11:37:40,223 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl (Node Status Updater): NM node labels {CORE:exclusivity=true} were not accepted by RM and message from RM : null
2023-08-31 11:39:40,250 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl (Node Status Updater): NM node labels {CORE:exclusivity=true} were not accepted by RM and message from RM : null
2023-08-31 11:41:40,282 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl (Node Status Updater): NM node labels {CORE:exclusivity=true} were not accepted by RM and message from RM : null
2023-08-31 11:43:40,305 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl (Node Status Updater): NM node labels {CORE:exclusivity=true} were not accepted by RM and message from RM : null
2023-08-31 11:45:40,327 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl (Node Status Updater): NM node labels {CORE:exclusivity=true} were not accepted by RM and message from RM : null
2023-08-31 11:46:38,095 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService (NM ContainerManager dispatcher): Cache Size Before Clean: 8129896432, Total Deleted: 0, Public Deleted: 0, Private Deleted: 0
2023-08-31 11:46:38,157 INFO org.apache.hadoop.yarn.server.nodemanager.logaggregation.tracker.NMLogAggregationStatusTracker (Timer-0): Rolling over the cached log aggregation status.
2023-08-31 11:47:40,353 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl (Node Status Updater): NM node labels {CORE:exclusivity=true} were not accepted by RM and message from RM : null

2023-08-31 12:01:42,512 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /10.0.100.107:8033
addToClusterNodeLabels: java.io.IOException: Node-label-based scheduling is disabled. Please check yarn.node-labels.enabled
    at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.logAndWrapException(AdminService.java:936)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.addToClusterNodeLabels(AdminService.java:821)
    at org.apache.hadoop.yarn.server.api.impl.pb.service.ResourceManagerAdministrationProtocolPBServiceImpl.addToClusterNodeLabels(ResourceManagerAdministrationProtocolPBServiceImpl.java:270)
    at org.apache.hadoop.yarn.proto.ResourceManagerAdministrationProtocol$ResourceManagerAdministrationProtocolService$2.callBlockingMethod(ResourceManagerAdministrationProtocol.java:311)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:604)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:572)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:556)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1093)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1175)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1099)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3316)
Caused by: java.io.IOException: Node-label-based scheduling is disabled. Please check yarn.node-labels.enabled
    at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.addToCluserNodeLabels(CommonNodeLabelsManager.java:309)
    at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.addToCluserNodeLabels(RMNodeLabelsManager.java:141)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.addToClusterNodeLabels(AdminService.java:816)
    ... 12 more

2023-08-31 12:07:40,677 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl (Node Status Updater): NM node labels {CORE:exclusivity=true} were not accepted by RM and message from RM : null
2023-08-31 12:09:40,695 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl (Node Status Updater): NM node labels {CORE:exclusivity=true} were not accepted by RM and message from RM : null
2023-08-31 12:11:40,724 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl (Node Status Updater): NM node labels {CORE:exclusivity=true} were not accepted by RM and message from RM : null
2023-08-31 12:13:40,744 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl (Node Status Updater): NM node labels {CORE:exclusivity=true} were not accepted by RM and message from RM : null


Team, please suggest any inputs, workarounds, or resolutions for the blockers above.

Thank you

Javali
asked 9 months ago, 830 views
1 Answer

Hello. Your Spark application failed because the Application Master (AM) container could not launch and exited with exit code 13, which is a fairly generic failure. Some possible reasons for the AM failure are described below:

  1. EMR 6.10.0 (your version) officially supports only Spark 3.3.1. If your application uses packages whose versions differ from the component versions that EMR 6.10.0 supports, they may be incompatible. Please check that the packages in your application are compatible with Spark 3.3.1. This kind of mismatch can go unnoticed in local mode.
  2. Please make sure the application modules/dependencies are installed and shipped alongside the driver, which in cluster mode is deployed on a core node. You can check the stderr/stdout files in the application logs to see whether an issue is explicitly reported.
  3. Please also ensure that the Spark context in your Python code does not point to local, and does not set anything else that overrides cluster mode.
  4. Please check the ResourceManager log and all the container logs for clues about the failing container launch.
  5. You may also run a sample application in cluster mode to verify that the cluster configuration itself does not prevent cluster mode from working:
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster /usr/lib/spark/examples/jars/spark-examples.jar 10
yarn logs -applicationId application_XXXXXXXXXXXXX
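For point 3 above, a master hardcoded to local in the driver script is a commonly reported cause of the AM exiting with code 13 in cluster mode. The following is a minimal sketch of a check for that pattern, not an official tool. The helper name `find_hardcoded_local_master` and both sample driver snippets are hypothetical and are not taken from the original application; the regular expression only covers literal `.master("local...")` and `.setMaster("local...")` calls.

```python
import re

# Matches .master("local...") on a SparkSession builder and
# .setMaster("local...") on a SparkConf, with either quote style.
HARDCODED_LOCAL_MASTER = re.compile(
    r"""\.(?:master|setMaster)\(\s*["']local(?:\[[^\]]*\])?["']\s*\)"""
)


def find_hardcoded_local_master(source: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) pairs where a local master is hardcoded."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # ignore lines that are entirely comments
        if HARDCODED_LOCAL_MASTER.search(line):
            hits.append((number, stripped))
    return hits


# Hypothetical driver that pins the master in code (problematic in cluster mode).
BAD_DRIVER = '''
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").appName("demo").getOrCreate()
'''

# Hypothetical driver that leaves master and deploy mode to spark-submit.
GOOD_DRIVER = '''
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("demo").getOrCreate()
'''

if __name__ == "__main__":
    print(find_hardcoded_local_master(BAD_DRIVER))
    print(find_hardcoded_local_master(GOOD_DRIVER))
```

If the check reports a hit in main.py, removing the hardcoded master and letting `--master yarn --deploy-mode cluster` on the spark-submit command line decide is usually the safer choice.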
If you need more assistance specific to your setup, please feel free to contact AWS Premium Support.
AWS
SUPPORT ENGINEER
answered 8 months ago
  • In my case, the issue was a dependency incompatibility, which prevented me from submitting in cluster mode. Thanks
