Glue + SageMaker Pip Packages


My customer is looking to use Glue dev endpoints along with a SageMaker notebook. What I've noticed is that a package listed in Glue at a given version, in this case scipy 1.4.1, may or may not match what you get in a SageMaker notebook depending on the kernel.

conda_python3:

!pip show scipy
Name: scipy
Version: 1.1.0
Summary: SciPy: Scientific Library for Python
Home-page: https://www.scipy.org
Author: None
Author-email: None
License: BSD
Location: /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages
Requires: 
Required-by: seaborn, scikit-learn, sagemaker

conda_tensorflow_p36:

!pip show scipy
Name: scipy
Version: 1.4.1
Summary: SciPy: Scientific Library for Python
Home-page: https://www.scipy.org
Author: None
Author-email: None
License: BSD
Location: /home/ec2-user/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages
Requires: numpy
Required-by: seaborn, scikit-learn, sagemaker, Keras

Is there a best practice for choosing a kernel that corresponds directly to what's installed on Glue?

Separate, not very useful question: I wasn't able to activate the venv that the Jupyter notebooks use via the shell. Is it using a venv? How come I can't find the right activate script?

asked 4 years ago · 397 views
1 Answer
Accepted Answer

conda_python3 and conda_tensorflow_p36 are local kernels on the SageMaker notebook instance, while the Spark kernels execute remotely in the Glue Spark environment.
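You can see both sets of kernels from any local notebook cell (the exact kernelspec names in the comments below are typical, but may vary with the notebook AMI version):

!jupyter kernelspec list
# local conda kernels, e.g. conda_python3, conda_tensorflow_p36
# plus Sparkmagic kernelspecs, e.g. pysparkkernel and sparkkernel,
# which relay work to the Glue dev endpoint instead of running locally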

Hence you are seeing different versions. The Glue Spark environment ships with scipy 1.4.1, so when you use the PySpark (Python) or Spark (Scala) kernels, you will get scipy 1.4.1.
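For example, running the following in a cell under the PySpark kernel executes on the Glue dev endpoint rather than on the notebook instance, so it should report the Glue-side version:

import scipy
print(scipy.__version__)
# expected: 1.4.1, the version bundled with the Glue Spark environment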

If you use the default lifecycle configuration script that Glue SageMaker notebooks already come with, the connectivity to the Glue dev endpoint should be in place. Note that Glue SageMaker notebooks have a tag called 'aws-glue-dev-endpoint' that identifies which Glue dev endpoint that particular notebook instance communicates with.
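If you want to verify which endpoint a given notebook is tied to, here is a minimal sketch using boto3 (the notebook instance name is a placeholder; substitute your own):

import boto3

sm = boto3.client("sagemaker")

# Look up the notebook instance's ARN (the name below is hypothetical).
arn = sm.describe_notebook_instance(
    NotebookInstanceName="aws-glue-my-notebook"
)["NotebookInstanceArn"]

# Find the tag Glue uses to associate the notebook with a dev endpoint.
tags = sm.list_tags(ResourceArn=arn)["Tags"]
endpoint = next(
    (t["Value"] for t in tags if t["Key"] == "aws-glue-dev-endpoint"), None
)
print(endpoint)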

The Spark kernels cannot be replicated via the Python shell, which is why you can't find an activate script for them. Those kernels relay Spark commands via the Livy service to Spark on the Glue dev endpoint, using a Jupyter module called Sparkmagic.
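Sparkmagic does provide a %%local cell magic, which makes the local/remote split easy to see from within a PySpark notebook (note that the %%local cell runs in the Sparkmagic wrapper kernel's environment on the notebook instance, which is not necessarily the same conda env as conda_python3):

# Cell 1 - a plain PySpark cell, executed on the Glue dev endpoint via Livy
import sys
print(sys.executable)

# Cell 2 - executed locally on the notebook instance
%%local
import sys
print(sys.executable)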

Ref: https://github.com/jupyter-incubator/sparkmagic

AWS
answered 4 years ago
