For anyone who faces this issue in the future: I was able to make this work by passing an S3 path as the target location for caching.
Either of these should work:
- Set the environment variable before importing the library:
```python
import os
os.environ['SENTENCE_TRANSFORMERS_HOME'] = 's3-path'
```
- Pass `cache_folder` when loading the target model:
```python
EMBEDDING_MODEL = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL_NAME, cache_folder="s3-path")
```
You must ensure that the IAM role attached to the Glue job has read permissions for that bucket in S3.
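Put together, a minimal sketch of the approach described above (the bucket path `s3://my-bucket/model-cache` is a hypothetical placeholder for your own `s3-path`):

```python
import os

# Hypothetical cache location; must be set before sentence_transformers is imported.
os.environ['SENTENCE_TRANSFORMERS_HOME'] = 's3://my-bucket/model-cache'

from langchain.embeddings import HuggingFaceEmbeddings

EMBEDDING_MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"

# Alternatively, pass the same location as cache_folder when loading the model.
EMBEDDING_MODEL = HuggingFaceEmbeddings(
    model_name=EMBEDDING_MODEL_NAME,
    cache_folder='s3://my-bucket/model-cache',
)
```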
After the comments I tried installing it and had no problems; I've copied the details from my configuration below. The only difference at first glance between my environment and the tutorial is that my Glue job role has AdministratorAccess. For testing, you could try a role that has AdministratorAccess, then lower the permissions afterwards.

Configuration (screenshot)

Running (screenshot)

I hope this helps. If you have more details about the error, write to me in the comments.
My IAM role has full access to S3, and the /.cache directory is not an S3 directory, right? I assume it's a directory on the machine where the job is running. Also, my Glue job is a Python script.
I am trying to replicate your problem. In the meantime I tried this tutorial and installed the library without problems using `--additional-python-modules pymysql==1.0.2`: https://repost.aws/knowledge-center/glue-version2-external-python-libraries. My Glue role has full admin access. Now I'm going to try with sentence_transformers.
I have tried with the following configuration and have not had any problems; I am going to edit my answer to show you my results: `--additional-python-modules sentence_transformers`
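If it helps, a minimal sanity check (an editorial suggestion, not from the thread) to confirm the install succeeded before any model is loaded:

```python
# Import the freshly installed module and log its version at job start.
import sentence_transformers
print("sentence_transformers version:", sentence_transformers.__version__)
```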
Thank you for looking into this, MaxCloud. So the caching problem is happening when loading the model; the installation itself seems to succeed normally. Can you try this code in your script and re-run the job?
```python
from langchain.embeddings import HuggingFaceEmbeddings

EMBEDDING_MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"
EMBEDDING_MODEL = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL_NAME)
```
You also need to add a second Python package, langchain, to your `--additional-python-modules`, so it becomes `--additional-python-modules sentence_transformers,langchain` (a comma-separated list).
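If you set the job parameters programmatically rather than in the console, a hedged sketch with boto3 (job name, role, and script location are hypothetical placeholders):

```python
import boto3

glue = boto3.client("glue")

# Hypothetical job definition; Name, Role, and ScriptLocation are placeholders.
glue.create_job(
    Name="embeddings-job",
    Role="MyGlueJobRole",
    Command={
        "Name": "pythonshell",
        "ScriptLocation": "s3://my-bucket/scripts/job.py",
        "PythonVersion": "3.9",
    },
    # Comma-separated list of extra modules to pip-install at job start.
    DefaultArguments={"--additional-python-modules": "sentence_transformers,langchain"},
)
```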
`EMBEDDING_MODEL = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL_NAME)` is the part where the problem is happening. I think the caching happens when loading the model, not at installation.