Do Spark SageMaker Instances have the Spark Snowflake Connector Installed?

Do Spark SageMaker instances have the Spark Snowflake connector installed? If not, how can I ensure my SageMaker instances have it installed every time they boot up?

My end use case is accessing Snowflake directly via PySpark using the Spark Snowflake connector.

Julean
Asked a year ago · 318 views
2 Answers

Hi,

Thank you for using Amazon SageMaker.

Looking at the above query, I'm assuming this is related to notebook instances. According to this doc: https://docs.snowflake.com/en/user-guide/spark-connector-install , your notebook instance would need to be backed by a Spark-supporting service such as an EMR cluster[1] or a Glue development endpoint[2] in order to run the driver. A sketch of attaching the connector to a Spark session follows the links below.

[1] https://aws.amazon.com/blogs/machine-learning/build-amazon-sagemaker-notebooks-backed-by-spark-in-amazon-emr/

[2] https://docs.aws.amazon.com/glue/latest/dg/dev-endpoint-how-it-works.html
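As a rough illustration, once Spark itself is available, the connector can be pulled in via the spark.jars.packages setting. This is only a sketch; the artifact versions below are assumptions and must be matched to your cluster's Spark and Scala versions:

```python
from pyspark.sql import SparkSession

# Assumed Maven coordinates -- check Maven Central for versions that match
# your Spark/Scala versions before using these.
packages = ",".join([
    "net.snowflake:spark-snowflake_2.12:2.12.0-spark_3.3",
    "net.snowflake:snowflake-jdbc:3.13.30",
])

# Spark downloads the connector jars from Maven when the session starts.
spark = (SparkSession.builder
         .appName("snowflake-connector-demo")
         .config("spark.jars.packages", packages)
         .getOrCreate())
```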

Also, as per your use case of accessing Snowflake directly via PySpark using the Spark Snowflake connector, I would request you to go through the following documentation[3]:

[3] https://aws.amazon.com/blogs/machine-learning/use-snowflake-as-a-data-source-to-train-ml-models-with-amazon-sagemaker/
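Once the connector is on the classpath, a read from Snowflake might look like the following. This is a minimal sketch; the connection options and table name are placeholder assumptions, not values from your environment:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("snowflake-read").getOrCreate()

# Placeholder credentials -- replace with your Snowflake account values
# (consider AWS Secrets Manager rather than hard-coding a password).
sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",
    "sfUser": "my_user",
    "sfPassword": "my_password",
    "sfDatabase": "MY_DB",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "MY_WH",
}

# "net.snowflake.spark.snowflake" is the connector's data source name.
df = (spark.read
      .format("net.snowflake.spark.snowflake")
      .options(**sf_options)
      .option("dbtable", "MY_TABLE")  # or .option("query", "SELECT ...")
      .load())
df.show()
```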

Also, in regard to your query about ensuring notebook instances have packages installed every time they boot up:

The primary reason the libraries don't persist after a stop-start operation is that the storage is not persistent. Only changes made to the ML storage volume persist across a stop-start. Generally, for packages and files to persist, they need to be under /home/ec2-user/SageMaker.

In other words, when you stop a notebook, SageMaker terminates the notebook's Amazon Elastic Compute Cloud (Amazon EC2) instance. Packages installed in the Conda environment don't persist between sessions. The /home/ec2-user/SageMaker directory is the only path that persists between notebook instance sessions; it is the mount point for the notebook's Amazon Elastic Block Store (Amazon EBS) volume.
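For example, one workaround (a sketch, not an official recommendation; the subdirectory name is an assumption) is to install packages onto the persistent EBS volume and add that path to sys.path in each session:

```python
import subprocess
import sys

# Any path under /home/ec2-user/SageMaker lives on the EBS volume and
# therefore survives stop/start cycles; the subdirectory name is arbitrary.
PERSIST_DIR = "/home/ec2-user/SageMaker/custom-packages"

# Install into the persistent directory instead of the ephemeral Conda env.
subprocess.check_call([sys.executable, "-m", "pip", "install",
                       "--target", PERSIST_DIR,
                       "snowflake-connector-python"])  # example package

# Make the persisted packages importable in the current session.
sys.path.insert(0, PERSIST_DIR)
```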

Thus, to achieve the use case, you will have to use lifecycle configurations as explained below (a sketch of creating one with boto3 follows this list):
[+] Install External Libraries and Kernels in Notebook Instances - https://docs.aws.amazon.com/sagemaker/latest/dg/nbi-add-external.html
[+] How can I install Python packages to a Conda environment on an Amazon SageMaker notebook instance? - https://aws.amazon.com/premiumsupport/knowledge-center/sagemaker-python-package-conda/
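The following is a minimal sketch of registering an on-start script with boto3; the configuration name, package list, and script body are illustrative assumptions:

```python
import base64
import boto3

# Shell script that runs as root every time the notebook instance starts.
# It installs packages into the python3 Conda environment, following the
# pattern in the AWS documentation linked above.
on_start_script = """#!/bin/bash
set -e
sudo -u ec2-user -i <<'EOF'
source activate python3
pip install snowflake-connector-python
source deactivate
EOF
"""

sm = boto3.client("sagemaker")
sm.create_notebook_instance_lifecycle_config(
    NotebookInstanceLifecycleConfigName="install-snowflake-packages",  # example name
    OnStart=[{"Content": base64.b64encode(on_start_script.encode()).decode()}],
)
```

You would then attach the lifecycle configuration to your notebook instance, for example via the console or by passing LifecycleConfigName when creating or updating the instance.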

If your installation takes longer than 5 minutes, which is the maximum run time for a lifecycle configuration script, you can refer to the article below:
[+] How can I be sure that manually installed libraries persist in Amazon SageMaker if my lifecycle configuration times out when I try to install the libraries? - https://aws.amazon.com/premiumsupport/knowledge-center/sagemaker-lifecycle-script-timeout/
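One pattern from that article is to run the install in the background so the lifecycle script itself returns within the 5-minute limit. Below is a sketch of such an OnStart script body (the log path and package are example assumptions), which could replace on_start_script in the snippet above:

```python
# Backgrounding the install with nohup lets the lifecycle script exit
# immediately; the pip install keeps running and logs to the EBS volume.
on_start_script = """#!/bin/bash
set -e
sudo -u ec2-user -i <<'EOF'
nohup bash -c "source activate python3 && pip install snowflake-connector-python" > /home/ec2-user/SageMaker/lifecycle-install.log 2>&1 &
EOF
"""
```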

To understand the issue in more depth, since I have limited visibility into your setup, I'd recommend reaching out to AWS Support by creating a support case (link below) so that an engineer can investigate further and help you overcome the issue.

Reference:

——————

[+] https://aws.amazon.com/premiumsupport/faqs/
[+] Open a support case with AWS using the link: https://console.aws.amazon.com/support/home?#/case/create

AWS
Answered a year ago
