This may be helpful: https://aws.amazon.com/blogs/machine-learning/use-snowflake-as-a-data-source-to-train-ml-models-with-amazon-sagemaker/
Hi,
Thank you for using Amazon SageMaker.
Looking at the above query, I'm assuming this is related to notebook instances. According to this doc: https://docs.snowflake.com/en/user-guide/spark-connector-install , it looks like your notebook instance would need to be backed by a Spark-capable service such as an EMR cluster[1] or a Glue development endpoint[2] in order to run the driver.
[2] https://docs.aws.amazon.com/glue/latest/dg/dev-endpoint-how-it-works.html
Also, for your use case of accessing Snowflake directly from PySpark with the Spark Snowflake connector, please go through the connector documentation[3].
Regarding your question about ensuring that a notebook instance has its packages installed every time it boots:
The primary reason the libraries don't persist after a stop-start operation is that the instance storage is not persistent. Only changes made to the ML storage volume persist across a stop-start. Generally, for packages and files to persist, they need to be under "/home/ec2-user/SageMaker".
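For illustration, here is a minimal sketch of what reading a Snowflake table from PySpark typically looks like once the connector and JDBC driver JARs are available to the Spark session. The function names, the parameter names, and the placeholder values are assumptions made for this example; the option keys (`sfURL`, `sfUser`, and so on) and the source name `net.snowflake.spark.snowflake` are the ones used by the Spark Snowflake connector. Please verify the details against the connector documentation for your version.

```python
# Sketch only: assumes a Spark session that already has the Snowflake
# Spark connector and the Snowflake JDBC driver on its classpath.

SNOWFLAKE_SOURCE = "net.snowflake.spark.snowflake"


def snowflake_options(account_url, user, password, database, schema, warehouse):
    """Build the option dictionary expected by the Spark Snowflake connector.

    All argument values are supplied by the caller; nothing here is a default
    taken from your environment.
    """
    return {
        "sfURL": account_url,
        "sfUser": user,
        "sfPassword": password,
        "sfDatabase": database,
        "sfSchema": schema,
        "sfWarehouse": warehouse,
    }


def read_table(spark, options, table):
    """Load one Snowflake table as a Spark DataFrame.

    `spark` is an existing SparkSession; `table` is the Snowflake table name.
    """
    return (
        spark.read.format(SNOWFLAKE_SOURCE)
        .options(**options)
        .option("dbtable", table)
        .load()
    )
```

In a real notebook you would avoid hard-coding the password, for example by reading it from AWS Secrets Manager, and then call `read_table(spark, opts, "MY_TABLE")` with your own table name.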
In other words, when you stop a notebook, SageMaker terminates the notebook's Amazon Elastic Compute Cloud (Amazon EC2) instance. Packages that are installed in the Conda environment don't persist between sessions. The /home/ec2-user/SageMaker directory is the only path that persists between notebook instance sessions. This is the directory for the notebook's Amazon Elastic Block Store (Amazon EBS) volume.
To achieve this, you will have to use lifecycle configurations, as explained in the articles below:
[+] Install External Libraries and Kernels in Notebook Instances -
https://docs.aws.amazon.com/sagemaker/latest/dg/nbi-add-external.html
[+] How can I install Python packages to a Conda environment on an Amazon SageMaker notebook instance?
https://aws.amazon.com/premiumsupport/knowledge-center/sagemaker-python-package-conda/
If your installations take longer than 5 minutes, which is the maximum time a lifecycle configuration script can run, you can refer to the article below:
[+] How can I be sure that manually installed libraries persist in Amazon SageMaker if my lifecycle configuration times out when I try to install the libraries?
https://aws.amazon.com/premiumsupport/knowledge-center/sagemaker-lifecycle-script-timeout/
Since I have limited visibility into your setup, I'd recommend reaching out to AWS Support by creating a support case (link under Reference below), so that an engineer can investigate further and help you resolve the issue.
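As a rough sketch of the lifecycle configuration approach, the Python snippet below builds an on-start script that installs packages into a Conda environment and registers it through the SageMaker API. The helper names, the default environment name `python3`, the log file name, and the example package are assumptions for illustration. The install is started with `nohup ... &` so the script returns quickly, which is one common way to stay under the 5-minute limit. The API expects the script content to be base64-encoded.

```python
import base64


def build_on_start_script(packages, conda_env="python3"):
    """Return the text of an on-start shell script (a sketch, not an official template).

    The pip install runs in the background under nohup, and its log is written
    to the persistent /home/ec2-user/SageMaker directory.
    """
    pkg_list = " ".join(packages)
    return f"""#!/bin/bash
set -e
sudo -u ec2-user -i <<'EOF'
source activate {conda_env}
nohup pip install {pkg_list} > /home/ec2-user/SageMaker/pip-install.log 2>&1 &
EOF
"""


def encode_for_api(script):
    """Base64-encode the script, as the lifecycle configuration API requires."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


def create_lifecycle_config(name, script):
    """Register the script as an OnStart lifecycle configuration.

    Requires boto3 and AWS credentials with SageMaker permissions;
    `name` is whatever configuration name you choose.
    """
    import boto3  # imported here so the helpers above work without boto3

    client = boto3.client("sagemaker")
    return client.create_notebook_instance_lifecycle_config(
        NotebookInstanceLifecycleConfigName=name,
        OnStart=[{"Content": encode_for_api(script)}],
    )
```

For example, `create_lifecycle_config("install-snowflake", build_on_start_script(["snowflake-connector-python"]))` would register a configuration that you can then attach to the notebook instance. Because the install runs in the background, the package may not be importable for a short while after the instance starts.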
Reference:
——————
[+] https://aws.amazon.com/premiumsupport/faqs/
[+] Open a support case with AWS using the link: https://console.aws.amazon.com/support/home?#/case/create