Running concurrent sessions from SageMaker notebooks on Glue Dev Endpoints.


A customer has created an AWS Glue dev endpoint and wants to run two SageMaker notebooks in parallel against the same single dev endpoint, but it is not working.

Only the notebook that is invoked first is able to run its job, while the other one fails. What are the possible reasons, and how can this be fixed?

AWS
Asked 4 years ago · Viewed 628 times
1 Answer

Accepted Answer

SageMaker notebooks are Jupyter notebooks that use the SparkMagic module to connect to a local Livy setup. The local Livy opens an SSH tunnel to the Livy service on the Glue Spark server. Apache Livy binds to port 8998 and is a RESTful service that can relay commands for multiple Spark sessions at the same time, so port-binding conflicts cannot happen. So yes, you can have multiple sessions, as long as the backend cluster has enough resources to serve them.

You can run the following command in a notebook to check the defaults for Spark sessions:

spark.sparkContext.getConf().getAll()

I see the following defaults in my Spark session. You can easily override them in the config file at ~/.sparkmagic/config.json or with the %%configure magic from within the notebook.

spark.executor.cores 4
spark.executor.memory 5g
spark.driver.memory 5g
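If you want lower defaults applied to every new session, the executor settings above can also go in the session_configs section of ~/.sparkmagic/config.json. A minimal sketch (the keys follow Livy's session-creation parameters; the values shown are only illustrative, not recommended settings):

```json
{
  "session_configs": {
    "executorMemory": "5G",
    "executorCores": 4,
    "numExecutors": 2
  }
}
```

Settings made with %%configure in a notebook take precedence over this file for that session.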

Note that spark.executor.instances is not set and spark.dynamicAllocation.enabled is not overridden, which means it defaults to true. So a demanding Spark job in one notebook can take over all the resources in the cluster and prevent other Spark sessions from starting. The recommendation when sharing a single Glue dev endpoint is to limit each session to a few executors, so that multiple sessions can each acquire resources from the cluster, e.g.:

%%configure -f
{"executorMemory": "5G", "executorCores": 4, "numExecutors": 2}
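To see why capping executors matters, here is a rough capacity estimate. It assumes a hypothetical dev endpoint with 5 DPUs (1 Glue DPU provides 4 vCPUs and 16 GB of memory, per AWS documentation); your endpoint's DPU count may differ, and this ignores YARN/driver overhead:

```python
# Back-of-envelope: how many 4-core / 5 GB executors fit on a
# hypothetical 5-DPU Glue dev endpoint (1 DPU = 4 vCPU, 16 GB RAM).
dpus = 5
total_cores = dpus * 4      # 20 vCPUs
total_mem_gb = dpus * 16    # 80 GB

executor_cores = 4
executor_mem_gb = 5

# The binding constraint is whichever resource runs out first.
max_by_cores = total_cores // executor_cores
max_by_mem = total_mem_gb // executor_mem_gb
max_executors = min(max_by_cores, max_by_mem)

print(max_executors)  # 5
```

With only ~5 executors available cluster-wide, a single session left on dynamic allocation can claim all of them, which matches the symptom in the question: the first notebook runs and the second fails to acquire resources.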

(Note: Tested with multiple SageMaker PySpark notebooks in a single SageMaker notebook instance, as well as across multiple SageMaker notebook instances.)

AWS
Answered 4 years ago
