PermissionError: [Errno 13] Permission denied: '/.cache' while installing Python packages in an AWS Glue job using the --additional-python-modules parameter


I am trying to install the sentence_transformers Python package as part of my AWS Glue Python script job. I am doing this with the job parameter --additional-python-modules, set to the value sentence_transformers.

However, while loading a sentence_transformers model, I consistently get a Permission denied: '/.cache' error. The issue is caused by pip trying to write some package files to /.cache. I tried to disable that using --no-cache-dir, but with no luck, and I'm not sure where to pass this option correctly.
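A possible explanation for the odd path: Hugging Face libraries and sentence-transformers typically default to a cache directory under ~/.cache unless environment variables such as XDG_CACHE_HOME or HF_HOME override it. Assuming the Glue job process runs with HOME set to the filesystem root (an assumption that would match the error message), ~/.cache expands to /.cache, which an unprivileged process cannot create. The standard-library sketch below only demonstrates that expansion; the helper name cache_root_for_home is hypothetical.

```python
import os
import posixpath


def cache_root_for_home(home: str) -> str:
    """Return what '~/.cache' expands to when HOME is `home` (POSIX rules).

    Hypothetical helper for illustration only.
    """
    saved = os.environ.get("HOME")
    os.environ["HOME"] = home
    try:
        # posixpath.expanduser replaces a leading '~' with $HOME
        # (after stripping trailing slashes from it).
        return posixpath.expanduser("~/.cache")
    finally:
        # Restore the caller's HOME so the demo has no side effects.
        if saved is None:
            del os.environ["HOME"]
        else:
            os.environ["HOME"] = saved


if __name__ == "__main__":
    print(cache_root_for_home("/home/glue"))  # a normal user home
    print(cache_root_for_home("/"))           # HOME at the filesystem root
```

With HOME="/", the second call yields "/.cache", the same path as in the error.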

Could you please help me solve this, either by disabling the cache when installing Python packages through the Glue job parameter --additional-python-modules, or by giving my AWS Glue job permission to write to the /.cache directory?

Further details: I am using Python 3.9 and AWS Glue 3.0, and the IAM role attached to my job includes AWSGlueConsoleFullAccess.

Abri
Asked 9 months ago · 1011 views

2 answers

Accepted answer

For anyone who faces this issue in the future: I was able to make this work by passing an S3 path as the target location for the cache.

Either one of these should work:

  1. Set the environment variable before importing the library:

    import os
    os.environ['SENTENCE_TRANSFORMERS_HOME'] = 's3-path'

  2. Pass cache_folder when loading the target model:

    EMBEDDING_MODEL = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL_NAME, cache_folder="s3-path")
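Both options above redirect the cache away from /.cache. Because sentence-transformers and the Hugging Face libraries write the cache with ordinary filesystem calls, the sketch below assumes a local writable directory (the system temp directory, usually /tmp) instead of an S3 path; that choice is an assumption, not what the answer used. The helper name use_writable_model_cache is hypothetical, and setting HF_HOME is an assumption aimed at newer library versions that download models through huggingface_hub.

```python
import os
import tempfile


def use_writable_model_cache(base_dir: str) -> str:
    """Create `base_dir` and point model-cache environment variables at it.

    Hypothetical helper. Call it at the top of the job script, before
    importing sentence_transformers or langchain, because some libraries
    read these variables when they are first imported.
    """
    os.makedirs(base_dir, exist_ok=True)
    # Variable named in the accepted answer (option 1).
    os.environ["SENTENCE_TRANSFORMERS_HOME"] = base_dir
    # Hugging Face Hub cache root; assumed relevant for newer versions.
    os.environ["HF_HOME"] = os.path.join(base_dir, "huggingface")
    return base_dir


if __name__ == "__main__":
    # Assumption: the temp directory is writable inside the Glue job.
    cache_dir = use_writable_model_cache(
        os.path.join(tempfile.gettempdir(), "model_cache")
    )
    print("Model cache:", cache_dir)
    # The imports and model loading from the answer would follow here, e.g.
    # HuggingFaceEmbeddings(model_name=..., cache_folder=cache_dir)
```

The same directory can also be passed explicitly as cache_folder, as in option 2, so the model location does not depend on import order.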

Abri
Answered 9 months ago

You must ensure that the IAM role attached to your Glue job has read permissions on the relevant S3 bucket.


After reading the comments, I tried installing the package myself and had no problems. I copied the details of my configuration below. At first glance, the only difference between my environment and the tutorial is that my Glue job role has admin access. For testing, you could try a role with admin access and then reduce the permissions.

[Screenshot: job configuration]

[Screenshot: job run]

I hope this helps. If you have more details about the error, let me know in the comments.

Expert
Answered 9 months ago
  • My IAM role has full access to S3, and /.cache is not an S3 directory, right? I assume it's a directory on the machine where the job runs. Also, my Glue job is a Python script job.

  • I am trying to replicate your problem. Meanwhile, I followed this tutorial (https://repost.aws/knowledge-center/glue-version2-external-python-libraries) and installed a library with --additional-python-modules pymysql==1.0.2 without problems. My Glue role has full admin access. Next, I'm going to try sentence_transformers.

  • I tried the following configuration and had no problems: --additional-python-modules sentence_transformers. I'm going to edit my answer to show you my results.

  • Thank you for looking into this, MaxCloud. The caching problem happens when loading the model; installation seems to succeed normally. Can you try this code in your script and re-run the job?

        from langchain.embeddings import HuggingFaceEmbeddings
        EMBEDDING_MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"
        EMBEDDING_MODEL = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL_NAME)

    You also need to add a second Python package, langchain, to --additional-python-modules, so the value becomes sentence_transformers,langchain.

  • The line EMBEDDING_MODEL = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL_NAME) is where the problem happens. I think the caching happens when the model is loaded, not at installation.
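The thread passes several packages through --additional-python-modules, whose value is a single comma-separated string. The sketch below builds that argument mapping; the helper name glue_module_args is hypothetical, and stripping spaces around the commas is a precaution rather than a documented requirement. The optional --python-modules-installer-option entry is where pip flags such as --no-cache-dir would go, assuming your Glue job type supports that parameter (check the AWS Glue documentation). Per the comments above, a pip option would not fix the load-time /.cache error.

```python
from typing import Dict, Iterable, Optional


def glue_module_args(
    modules: Iterable[str], installer_options: Optional[str] = None
) -> Dict[str, str]:
    """Build Glue job arguments for installing extra Python modules.

    Hypothetical helper: joins module names into one comma-separated
    value, dropping blanks and stray whitespace.
    """
    names = [m.strip() for m in modules if m.strip()]
    args = {"--additional-python-modules": ",".join(names)}
    if installer_options:
        # Extra pip options (e.g. "--no-cache-dir"); assumed to be
        # supported by your job type.
        args["--python-modules-installer-option"] = installer_options
    return args


if __name__ == "__main__":
    print(glue_module_args(["sentence_transformers", " langchain"]))
```

The resulting dictionary could be used as DefaultArguments when defining the job, or as Arguments to boto3's Glue start_job_run, or its key/value pairs can be entered as job parameters in the console.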
