Do Spark SageMaker Instances have the Spark Snowflake Connector Installed?


Do Spark SageMaker Instances have the Spark Snowflake Connector Installed? If not, how can I ensure my SageMaker instances have it installed every time it is booted up?

My end use case is I want to access Snowflake directly via PySpark using the Spark Snowflake connector.

Julean
asked a year ago, 303 views
2 Answers

Hi,

Thank you for using AWS SageMaker.

Based on your question, I'm assuming this relates to notebook instances. According to the Snowflake documentation (https://docs.snowflake.com/en/user-guide/spark-connector-install), your notebook instance would need to be backed by a Spark-capable service such as an EMR cluster[1] or a Glue development endpoint[2] in order to run the connector.

[1] https://aws.amazon.com/blogs/machine-learning/build-amazon-sagemaker-notebooks-backed-by-spark-in-amazon-emr/

[2] https://docs.aws.amazon.com/glue/latest/dg/dev-endpoint-how-it-works.html

As for your use case of accessing Snowflake directly via PySpark using the Spark Snowflake connector, please go through the following blog post[3]:

[3] https://aws.amazon.com/blogs/machine-learning/use-snowflake-as-a-data-source-to-train-ml-models-with-amazon-sagemaker/
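For illustration, here is a minimal sketch of what reading a Snowflake table from PySpark could look like once a Spark backend is available. The helper names (connector_packages, snowflake_options, read_table) are my own and not part of any library, and the version strings are placeholders you would need to match to your Spark and Scala versions. The `sf*` option names, the `dbtable` option, and the `net.snowflake.spark.snowflake` source name are the ones described in the Snowflake connector documentation.

```python
from typing import Dict

# Data source name used by the Spark Snowflake connector.
SNOWFLAKE_SOURCE = "net.snowflake.spark.snowflake"


def connector_packages(scala_version: str, connector_version: str, jdbc_version: str) -> str:
    """Build the Maven coordinates string for spark.jars.packages.

    The versions are caller-supplied placeholders (assumption): pick the ones
    that match your Spark/Scala build.
    """
    return ",".join([
        f"net.snowflake:spark-snowflake_{scala_version}:{connector_version}",
        f"net.snowflake:snowflake-jdbc:{jdbc_version}",
    ])


def snowflake_options(account_url: str, user: str, password: str,
                      database: str, schema: str, warehouse: str) -> Dict[str, str]:
    """Collect the connection options expected by the connector."""
    return {
        "sfURL": account_url,
        "sfUser": user,
        "sfPassword": password,
        "sfDatabase": database,
        "sfSchema": schema,
        "sfWarehouse": warehouse,
    }


def read_table(packages: str, options: Dict[str, str], table: str):
    """Read one Snowflake table into a Spark DataFrame (hypothetical helper)."""
    # Imported lazily so the helpers above work without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("snowflake-read")
        .config("spark.jars.packages", packages)  # fetches the JARs at startup
        .getOrCreate()
    )
    return (
        spark.read.format(SNOWFLAKE_SOURCE)
        .options(**options)
        .option("dbtable", table)
        .load()
    )
```

In practice you would load the credentials from a secret store rather than hardcoding them in the notebook.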

Regarding your question about ensuring that notebook instances have packages installed every time they are booted up:

The primary reason libraries don't persist after a stop-start operation is that most of the instance's storage is not persistent. Only changes made to the ML storage volume are persisted across a stop-start. Generally, for packages and files to persist, they need to be under “/home/ec2-user/SageMaker”.

In other words, when you stop a notebook, SageMaker terminates the notebook's Amazon Elastic Compute Cloud (Amazon EC2) instance. Packages that are installed in the Conda environment don't persist between sessions. The /home/ec2-user/SageMaker directory is the only path that persists between notebook instance sessions. This is the directory for the notebook's Amazon Elastic Block Store (Amazon EBS) volume.

Thus, to achieve your use case, you will have to use lifecycle configurations, as explained here:
[+] Install External Libraries and Kernels in Notebook Instances - https://docs.aws.amazon.com/sagemaker/latest/dg/nbi-add-external.html
[+] How can I install Python packages to a Conda environment on an Amazon SageMaker notebook instance? - https://aws.amazon.com/premiumsupport/knowledge-center/sagemaker-python-package-conda/
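As a rough sketch of the lifecycle configuration approach, the snippet below builds an on-start script that installs packages into a Conda environment and registers it through boto3. The helper names and the default environment name "python3" are assumptions for illustration; the script layout follows the pattern in the AWS articles above, and `create_notebook_instance_lifecycle_config` expects the script content to be base64-encoded.

```python
import base64
from typing import List


def build_on_start_script(packages: List[str], conda_env: str = "python3") -> str:
    """Return a bash on-start script that pip-installs packages into conda_env.

    The environment name and package list are assumptions; adjust to your kernel.
    """
    lines = [
        "#!/bin/bash",
        "set -e",
        # Run the install as ec2-user so it lands in the notebook's Conda setup.
        "sudo -u ec2-user -i <<'EOF'",
        f"source /home/ec2-user/anaconda3/bin/activate {conda_env}",
        "pip install " + " ".join(packages),
        "source /home/ec2-user/anaconda3/bin/deactivate",
        "EOF",
    ]
    return "\n".join(lines) + "\n"


def encode_script(script: str) -> str:
    """Base64-encode the script, as the SageMaker API requires."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


def register_lifecycle_config(name: str, script: str) -> None:
    """Create the lifecycle configuration (needs AWS credentials and boto3)."""
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("sagemaker")
    client.create_notebook_instance_lifecycle_config(
        NotebookInstanceLifecycleConfigName=name,
        OnStart=[{"Content": encode_script(script)}],
    )
```

For example, `register_lifecycle_config("install-pyspark", build_on_start_script(["pyspark"]))` would create a configuration (the name here is made up) that you can then attach to the notebook instance while it is stopped.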

If your installations take longer than 5 minutes, which is the maximum run time for a lifecycle configuration script, you can refer to the following article:
[+] How can I be sure that manually installed libraries persist in Amazon SageMaker if my lifecycle configuration times out when I try to install the libraries? - https://aws.amazon.com/premiumsupport/knowledge-center/sagemaker-lifecycle-script-timeout/
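One common way around the time limit is to start the long-running install in the background with nohup so the lifecycle script itself returns quickly. The helper below is only a sketch of how such a line could be generated; the function name and the log file path are hypothetical, so please check the article above for the exact approach AWS recommends.

```python
import shlex


def run_in_background(command: str,
                      log_path: str = "/home/ec2-user/SageMaker/install.log") -> str:
    """Wrap a shell command so it keeps running after the lifecycle script exits.

    The log path is an assumed location on the persistent volume.
    """
    # shlex.quote protects the command and path from shell word splitting.
    return f"nohup bash -c {shlex.quote(command)} > {shlex.quote(log_path)} 2>&1 &"
```

The returned line can be placed in the on-start script in place of the direct install command; the log file then shows whether the background install finished.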

To understand the issue in more depth, since I have limited visibility into your setup, I'd recommend reaching out to AWS Support by creating a support case (see the links under Reference below) so that an engineer can investigate further and help you resolve the issue.

Reference:

——————

[+] https://aws.amazon.com/premiumsupport/faqs/
[+] Open a support case with AWS using the link: https://console.aws.amazon.com/support/home?#/case/create

AWS
answered a year ago
