Glue + SageMaker Pip Packages


My customer is looking to use Glue dev endpoints along with a SageMaker notebook. What I've noticed is that a package listed in Glue (scipy in this case, at 1.4.1) may or may not match what you get in a SageMaker notebook, depending on the kernel.

conda_python3:

!pip show scipy
Name: scipy
Version: 1.1.0
Summary: SciPy: Scientific Library for Python
Home-page: https://www.scipy.org
Author: None
Author-email: None
License: BSD
Location: /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages
Requires: 
Required-by: seaborn, scikit-learn, sagemaker

conda_tensorflow_p36:

!pip show scipy
Name: scipy
Version: 1.4.1
Summary: SciPy: Scientific Library for Python
Home-page: https://www.scipy.org
Author: None
Author-email: None
License: BSD
Location: /home/ec2-user/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages
Requires: numpy
Required-by: seaborn, scikit-learn, sagemaker, Keras

Is there some sort of best practice to use a kernel that corresponds directly to what's installed on Glue?
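For reference, one way to check programmatically whether a kernel matches what Glue reports is to parse the `!pip show` output and compare the version against Glue's (a minimal sketch; the helper and the sample text below are illustrative, and 1.4.1 is the scipy version Glue lists):

```python
# Minimal sketch: parse `pip show <pkg>` output into a dict and compare the
# installed version against the one Glue reports (1.4.1 for scipy here).
def parse_pip_show(output):
    """Turn `pip show` output into a {field: value} dict."""
    info = {}
    for line in output.splitlines():
        if ": " in line:
            key, _, value = line.partition(": ")
            info[key.strip()] = value.strip()
    return info

# Sample output as seen in the conda_python3 kernel
SAMPLE = """\
Name: scipy
Version: 1.1.0
Location: /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages
"""

info = parse_pip_show(SAMPLE)
print(info["Version"] == "1.4.1")  # prints False: this kernel does not match Glue
```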

A separate, less important question: I wasn't able to activate the venv that the Jupyter notebooks use via a shell. Is it using a venv? Why can't I find the right activate script?

Asked 4 years ago · 392 views
1 Answer

Accepted Answer

conda_python3 and conda_tensorflow_p36 are local kernels on the SageMaker notebook instance, while the Spark kernels execute remotely in the Glue Spark environment.

Hence you are seeing different versions. The Glue Spark environment ships with scipy 1.4.1, so when you use the PySpark (Python) or Spark (Scala) kernels, you will get scipy 1.4.1.
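To confirm which environment a given kernel actually resolves packages from, you can query the version from inside the kernel itself (a minimal sketch using `importlib.metadata`, which needs Python 3.8+; on the Python 3.6 environments shown above you would use `pkg_resources` instead):

```python
# Minimal sketch: report an installed package's version from inside a kernel,
# returning None instead of raising when the package is absent.
from importlib.metadata import version, PackageNotFoundError  # Python 3.8+

def get_version(package):
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# Run this in each kernel. The PySpark kernel executes it remotely on the
# Glue Dev endpoint, so it reports Glue's scipy, not the local one.
print(get_version("scipy"))
```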

If you use the default lifecycle configuration script that Glue SageMaker notebooks come with, connectivity to the Glue Dev endpoint should already be in place. Note that Glue SageMaker notebooks have a tag called 'aws-glue-dev-endpoint' that identifies which Glue Dev endpoint a particular notebook instance communicates with.

The Spark kernels cannot be replicated via the Python shell, which answers your second question: those kernels relay Spark commands to Spark on the Glue Dev endpoint through the Livy service, using a Jupyter module called Sparkmagic.

Ref: https://github.com/jupyter-incubator/sparkmagic
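To illustrate the relay, this is a sketch of the kind of request Sparkmagic sends to Livy to open a PySpark session and run a statement (field names per the Livy REST API; the endpoint URL below is hypothetical):

```python
# Minimal sketch of Sparkmagic's interaction with Livy: it POSTs JSON payloads
# like these to the Livy server running on the Glue Dev endpoint.
import json

LIVY_URL = "http://dev-endpoint:8998"  # hypothetical Glue Dev endpoint address

# Open a PySpark session, then submit code to run remotely inside it
session_payload = json.dumps({"kind": "pyspark"})
statement_payload = json.dumps({"code": "import scipy; print(scipy.__version__)"})

print("POST", LIVY_URL + "/sessions", session_payload)
print("POST", LIVY_URL + "/sessions/0/statements", statement_payload)
```

This is why the PySpark kernel sees Glue's scipy 1.4.1: the code in the statement runs on the Dev endpoint, not on the notebook instance.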

AWS
Answered 4 years ago