Running concurrent sessions from SageMaker notebooks on Glue Dev Endpoints.


A customer has created an AWS Glue dev endpoint and wants to run two SageMaker notebooks in parallel against the same single dev endpoint, but it is not working.

Only the notebook that is invoked first is able to run its job, while the other one fails. What are the possible reasons, and how can it be fixed?

1 Answer

Accepted Answer

SageMaker notebooks are Jupyter notebooks that use the SparkMagic module to connect to a local Livy setup. The local Livy opens an SSH tunnel to the Livy service on the Glue Spark server. Apache Livy binds to port 8998 and is a RESTful service that can relay commands for multiple Spark sessions at the same time, so port-binding conflicts cannot occur. So yes, you can have multiple sessions as long as the backend cluster has resources to serve that many sessions.
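To see that Livy multiplexes sessions rather than binding one port per session, you can query its REST API (`GET http://localhost:8998/sessions`) through the tunnel. A minimal sketch of inspecting such a response; the payload below is a hypothetical example of the shape Livy returns, not output captured from a real endpoint:

```python
import json

# Hypothetical example of a Livy GET /sessions response; in a live setup you
# would fetch it with an HTTP client through the SSH tunnel on port 8998.
sample = json.loads("""
{"from": 0, "total": 2,
 "sessions": [
   {"id": 0, "kind": "pyspark", "state": "idle"},
   {"id": 1, "kind": "pyspark", "state": "starting"}
 ]}
""")

# Two notebook sessions coexist behind the single Livy port.
for s in sample["sessions"]:
    print(s["id"], s["kind"], s["state"])
```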

You can run the following command in a notebook to check the defaults for Spark sessions:

spark.sparkContext.getConf().getAll()
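`getAll()` returns every configuration pair, so it helps to filter for the session-sizing keys. A small sketch of that filtering; `conf_pairs` here is a hand-written stand-in for the real call, which is only available inside a live SparkMagic session:

```python
# Stand-in for spark.sparkContext.getConf().getAll(); in a notebook,
# replace this literal with the real call.
conf_pairs = [
    ("spark.executor.cores", "4"),
    ("spark.executor.memory", "5g"),
    ("spark.driver.memory", "5g"),
    ("spark.master", "yarn"),
]

# Keep only the settings that control per-session resource usage.
relevant = {
    k: v for k, v in conf_pairs
    if k.startswith(("spark.executor", "spark.driver", "spark.dynamicAllocation"))
}
print(relevant)
```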

I see the following defaults in my Spark session. You can easily override them from the config file at ~/.sparkmagic/config.json or by using the %%configure magic from within the notebook.

spark.executor.cores 4
spark.executor.memory 5g
spark.driver.memory 5g
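If you want different defaults applied to every new session rather than per notebook, the `session_configs` section of `~/.sparkmagic/config.json` accepts the same Livy session parameters. A minimal sketch (the values are illustrative, not recommendations):

```json
{
  "session_configs": {
    "driverMemory": "5G",
    "executorMemory": "5G",
    "executorCores": 4,
    "numExecutors": 2
  }
}
```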

Note that spark.executor.instances is not set and spark.dynamicAllocation.enabled is not overridden, which means it defaults to true. As a result, a demanding Spark job in one notebook can take over all resources in the cluster and prevent other Spark sessions from starting. When sharing a single Glue dev endpoint, the recommendation is to limit each session to a few executors so that multiple sessions can acquire resources from the cluster, e.g.:

%%configure -f
{"executorMemory": "5G", "executorCores":4,"numExecutors":2}

(Note: Tested with multiple SageMaker PySpark notebooks on a single SageMaker notebook instance as well as across multiple SageMaker notebook instances.)

AWS
answered 4 years ago
