Do Spark SageMaker Instances have the Spark Snowflake Connector Installed?


Do Spark SageMaker Instances have the Spark Snowflake Connector Installed? If not, how can I ensure my SageMaker instances have it installed every time it is booted up?

My end use case is I want to access Snowflake directly via PySpark using the Spark Snowflake connector.

Julean
Asked 1 year ago · Viewed 319 times

2 Answers

Hi,

Thank you for using AWS SageMaker.

Looking at the above query, I'm assuming this is related to notebook instances. According to this doc: https://docs.snowflake.com/en/user-guide/spark-connector-install , it looks like the connector is not available out of the box; you would need your notebook instance to be backed by a Spark-supporting service such as an EMR cluster [1] or a Glue development endpoint [2] in order to run the driver.

[1] https://aws.amazon.com/blogs/machine-learning/build-amazon-sagemaker-notebooks-backed-by-spark-in-amazon-emr/

[2] https://docs.aws.amazon.com/glue/latest/dg/dev-endpoint-how-it-works.html
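Once the notebook is backed by a Spark environment, a minimal sketch of reading a Snowflake table through the Spark Snowflake connector could look like the following. The account URL, credentials, database objects, and the connector/JDBC package versions are all placeholders; check the Snowflake documentation above for the coordinates matching your Spark and Scala versions.

```python
from pyspark.sql import SparkSession

# Pull the connector and the Snowflake JDBC driver from Maven.
# Versions below are illustrative; match them to your Spark/Scala versions.
spark = (
    SparkSession.builder
    .appName("snowflake-example")
    .config(
        "spark.jars.packages",
        "net.snowflake:spark-snowflake_2.12:2.12.0-spark_3.4,"
        "net.snowflake:snowflake-jdbc:3.13.30",
    )
    .getOrCreate()
)

# Connection options -- every value here is a placeholder.
sf_options = {
    "sfURL": "<account_identifier>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

# Read a table through the connector.
df = (
    spark.read
    .format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "<table>")
    .load()
)
df.show()
```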

Also, for your use case of accessing Snowflake directly via PySpark using the Spark Snowflake connector, I would recommend going through this documentation [3]:

[3] https://aws.amazon.com/blogs/machine-learning/use-snowflake-as-a-data-source-to-train-ml-models-with-amazon-sagemaker/

Also, regarding your question about ensuring notebook instances have packages installed every time they boot up:

The primary reason the libraries don't persist after a stop-start operation is that the storage is not persistent. Only changes made to the ML storage volume are persisted across a stop-start. Generally, for packages and files to persist, they need to be under "/home/ec2-user/SageMaker".

In other words, when you stop a notebook, SageMaker terminates the notebook's Amazon Elastic Compute Cloud (Amazon EC2) instance. Packages that are installed in the Conda environment don't persist between sessions. The /home/ec2-user/SageMaker directory is the only path that persists between notebook instance sessions. This is the directory for the notebook's Amazon Elastic Block Store (Amazon EBS) volume.
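For example, one way to keep a pip-installed package across stop/start cycles is to install it onto the EBS volume and add that path at runtime. This is a minimal sketch; the target directory name is just an illustration:

```python
import subprocess
import sys

# Hypothetical persistent directory on the notebook's EBS volume.
PERSIST_DIR = "/home/ec2-user/SageMaker/persistent-packages"

# Install into the persistent directory instead of the ephemeral Conda env.
subprocess.check_call([
    sys.executable, "-m", "pip", "install",
    "--target", PERSIST_DIR,
    "snowflake-connector-python",  # example package
])

# Make the persisted packages importable in this (and any later) session.
sys.path.insert(0, PERSIST_DIR)
```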

Thus, to achieve this, you will have to use lifecycle configurations, as explained below:
[+] Install External Libraries and Kernels in Notebook Instances - https://docs.aws.amazon.com/sagemaker/latest/dg/nbi-add-external.html
[+] How can I install Python packages to a Conda environment on an Amazon SageMaker notebook instance? - https://aws.amazon.com/premiumsupport/knowledge-center/sagemaker-python-package-conda/
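As a hedged sketch of what those articles describe, the following creates a lifecycle configuration whose on-start script reinstalls a package into the python3 Conda environment on every boot. The configuration name and the package are assumptions for illustration:

```python
import base64
import boto3

# On-start script, run each time the notebook instance boots.
# The Conda env name (python3) and the package are illustrative.
on_start_script = """#!/bin/bash
set -e
sudo -u ec2-user -i <<'EOF'
source /home/ec2-user/anaconda3/bin/activate python3
pip install snowflake-connector-python
source /home/ec2-user/anaconda3/bin/deactivate
EOF
"""

sagemaker = boto3.client("sagemaker")
sagemaker.create_notebook_instance_lifecycle_config(
    NotebookInstanceLifecycleConfigName="install-snowflake-packages",
    OnStart=[{"Content": base64.b64encode(on_start_script.encode()).decode()}],
)
```

The configuration is then attached to the notebook instance via its LifecycleConfigName setting in the console or API.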

If your installations take longer than 5 minutes, which is the maximum run time for a lifecycle configuration script, you can refer to the article below:
[+] How can I be sure that manually installed libraries persist in Amazon SageMaker if my lifecycle configuration times out when I try to install the libraries? - https://aws.amazon.com/premiumsupport/knowledge-center/sagemaker-lifecycle-script-timeout/
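The pattern that article describes is to run the slow install in the background so the lifecycle script itself returns within the five-minute limit. A hedged variant of the on-start script above (the package and log path are placeholders):

```python
# Background the slow install with nohup so the lifecycle script exits quickly.
on_start_script = """#!/bin/bash
set -e
nohup bash -c '
source /home/ec2-user/anaconda3/bin/activate python3
pip install <large-package>   # placeholder for a slow install
source /home/ec2-user/anaconda3/bin/deactivate
' > /home/ec2-user/SageMaker/lifecycle-install.log 2>&1 &
"""
```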

To understand the issue more in depth, as I have limited visibility into your setup, I'd recommend reaching out to AWS Support by creating a support case (see the reference links below) so that an engineer can investigate further and help you overcome the issue.

Reference:

——————

[+] https://aws.amazon.com/premiumsupport/faqs/
[+] Open a support case with AWS using the link: https://console.aws.amazon.com/support/home?#/case/create

AWS
Answered 1 year ago
