SageMaker notebooks are Jupyter notebooks that use the SparkMagic module to connect to a local Livy setup. The local Livy opens an SSH tunnel to the Livy service on the Glue Spark server. Apache Livy binds to port 8998 and is a RESTful service that can relay commands for multiple Spark sessions at the same time, so port-binding conflicts cannot occur. So yes, you can have multiple sessions as long as the backend cluster has the resources to serve that many sessions.
You can run the following command in a notebook to check the defaults for Spark sessions:
spark.sparkContext.getConf().getAll()
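That call returns a list of (key, value) string pairs. A minimal sketch of picking out the executor-related settings from it (the pairs below are illustrative sample values, not actual output from your cluster):

```python
# Hypothetical sample of the (key, value) pairs returned by
# spark.sparkContext.getConf().getAll(); values are illustrative only.
conf_pairs = [
    ("spark.executor.cores", "4"),
    ("spark.executor.memory", "5g"),
    ("spark.driver.memory", "5g"),
    ("spark.dynamicAllocation.enabled", "true"),
]

# Turn the pairs into a dict and keep only the executor settings.
conf = dict(conf_pairs)
executor_settings = {k: v for k, v in conf.items()
                     if k.startswith("spark.executor.")}
print(executor_settings)
```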
I see the following defaults in my Spark session. You can easily override them from the config file at ~/.sparkmagic/config.json or by using the %%configure magic from within the notebook.
spark.executor.cores 4
spark.executor.memory 5g
spark.driver.memory 5g
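For the config-file route, SparkMagic reads session defaults from the session_configs block of ~/.sparkmagic/config.json. A rough sketch mirroring the defaults above (values here are illustrative, and the rest of the file is omitted):

```json
{
  "session_configs": {
    "driverMemory": "5G",
    "executorMemory": "5G",
    "executorCores": 4
  }
}
```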
Note that spark.executor.instances is not set and spark.dynamicAllocation.enabled is not overridden, which means it defaults to true. As a result, a demanding Spark job in one notebook can take over all the resources in the cluster and prevent other Spark sessions from starting. When sharing a single Glue Dev endpoint, the recommendation is to limit each session to a few executors so that multiple sessions can acquire resources from the cluster, e.g.:
%%configure -f
{"executorMemory": "5G", "executorCores":4,"numExecutors":2}
(Note: tested with multiple SageMaker PySpark notebooks on a single SageMaker notebook instance as well as across multiple SageMaker notebook instances.)