Glue + SageMaker Pip Packages


My customer wants to use Glue dev endpoints together with a SageMaker notebook. I've noticed that Glue lists a package, in this case scipy, as version 1.4.1, but whether a SageMaker notebook reports the same version depends on the kernel.

conda_python3:

!pip show scipy
Name: scipy
Version: 1.1.0
Summary: SciPy: Scientific Library for Python
Home-page: https://www.scipy.org
Author: None
Author-email: None
License: BSD
Location: /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages
Requires: 
Required-by: seaborn, scikit-learn, sagemaker

conda_tensorflow_p36:

!pip show scipy
Name: scipy
Version: 1.4.1
Summary: SciPy: Scientific Library for Python
Home-page: https://www.scipy.org
Author: None
Author-email: None
License: BSD
Location: /home/ec2-user/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages
Requires: numpy
Required-by: seaborn, scikit-learn, sagemaker, Keras

Is there some sort of best practice to use a kernel that corresponds directly to what's installed on Glue?

A separate, less important question: I wasn't able to activate, from a shell, the venv that the Jupyter notebooks use. Are they using a venv at all? Why can't I find the right activate script?

Asked 4 years ago, 396 views
1 Answer
Accepted Answer

conda_python3 and conda_tensorflow_p36 are local kernels on the SageMaker notebook instance while the Spark kernels execute remotely in the Glue Spark environment.

That is why you are seeing different versions. The Glue Spark environment comes with scipy 1.4.1, so when you use the PySpark (Python) or Spark (Scala) kernels, you get scipy 1.4.1.
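To compare kernels directly, a small check like the one below can be run in a cell of each kernel. It is only a sketch: `describe_package` is a hypothetical helper name, not part of Glue or SageMaker, and it assumes the package exposes a `__version__` attribute (scipy does). In a local kernel it reports the notebook instance's interpreter; in a PySpark kernel the code should be relayed to the Glue side, so the result reflects that environment instead.

```python
import importlib
import sys


def describe_package(name):
    """Report how the current kernel sees the package `name`."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        # The package is not importable from this kernel's interpreter.
        return {"name": name, "installed": False, "python": sys.executable}
    return {
        "name": name,
        "installed": True,
        # Fall back to "unknown" for packages without a __version__ attribute.
        "version": getattr(module, "__version__", "unknown"),
        "location": getattr(module, "__file__", "unknown"),
        "python": sys.executable,
    }


# Run this in each kernel and compare the dictionaries.
print(describe_package("scipy"))
```

The `python` and `location` fields show which interpreter and site-packages directory answered, which makes it easier to tell a local conda kernel from the remote Spark environment.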

If you use the default lifecycle script that Glue SageMaker notebooks come with, connectivity to the Glue dev endpoint should already be in place. Note that Glue SageMaker notebooks have a tag called 'aws-glue-dev-endpoint' that identifies which Glue dev endpoint a particular notebook instance communicates with.

The Spark kernels cannot be replicated via the python shell. Those kernels relay Spark commands via the Livy service to Spark on the Glue Dev endpoint using a Jupyter module called Sparkmagic.

Ref: https://github.com/jupyter-incubator/sparkmagic
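On the activation question: the `pip show` output in the question lists locations under /home/ec2-user/anaconda3/envs/, which suggests the local kernels are conda environments rather than venvs, so there would be no venv-style activate script inside them. The sketch below, using a hypothetical helper `conda_env_name`, derives the environment name from the interpreter prefix; it assumes the usual `<conda root>/envs/<name>` layout.

```python
import os
import sys


def conda_env_name(prefix):
    """Return the env name if `prefix` looks like <root>/envs/<name>, else None."""
    parent, name = os.path.split(os.path.normpath(prefix))
    if os.path.basename(parent) == "envs":
        return name
    return None


# In a local kernel, sys.prefix is the root of the environment backing it.
print(sys.prefix)
print(conda_env_name(sys.prefix))
```

If a name comes back (for example python3 for the conda_python3 kernel), running `source activate <name>` or `conda activate <name>` in a terminal on the notebook instance is the usual way to enter that environment, assuming conda is on the shell's PATH.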

AWS
Answered 4 years ago
