ModuleNotFoundError when starting a training job on SageMaker


I want to submit a training job on SageMaker. I tried the code in a notebook and it works. When I submit it as a training job as shown below, I get ModuleNotFoundError: No module named 'nltk'

My code is

import sagemaker  
from sagemaker.pytorch import PyTorch

JOB_PREFIX = 'pyt-ic'
FRAMEWORK_VERSION = '1.3.1'

estimator = PyTorch(entry_point='finetune-T5.py',
                    source_dir='../src',
                    train_instance_type='ml.p2.xlarge',
                    train_instance_count=1,
                    role=sagemaker.get_execution_role(),
                    framework_version=FRAMEWORK_VERSION,
                    debugger_hook_config=False,
                    py_version='py3',
                    base_job_name=JOB_PREFIX)

estimator.fit()

finetune-T5.py imports many other libraries that are not installed in the training container. How can I install the missing libraries? Or is there a better way to run the training job?

  • I tried adding nltk to a requirements.txt file in the scripts directory, which worked for another module but not for nltk. What could I be doing wrong?

Asked 4 years ago, viewed 1572 times
1 Answer
Accepted Answer

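Check out this link (the "Using third-party libraries" section) for how to install third-party libraries for training jobs. You need to create a requirements.txt file in the same directory as your training script so that the other dependencies are installed at runtime. With the estimator in the question, that should be the directory passed as source_dir ('../src'), next to finetune-T5.py.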
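As a rough illustration of the file placement, the sketch below adds missing package names to a requirements.txt at the top level of a source directory. The helper name ensure_requirements and the example package list are hypothetical and not part of the SageMaker SDK; it only prepares the file locally and assumes the training container installs from that file when the job starts.

```python
from pathlib import Path
import tempfile


def ensure_requirements(source_dir, packages):
    """Hypothetical helper: make sure each package is listed in
    <source_dir>/requirements.txt, keeping any existing entries."""
    req_path = Path(source_dir) / "requirements.txt"
    existing = req_path.read_text().splitlines() if req_path.exists() else []
    # Compare on the bare package name so "nltk==3.5" counts as "nltk".
    listed = {line.split("==")[0].strip().lower()
              for line in existing if line.strip()}
    for pkg in packages:
        name = pkg.split("==")[0].strip().lower()
        if name not in listed:
            existing.append(pkg)
            listed.add(name)
    req_path.write_text("\n".join(existing) + "\n")
    return req_path


# Demo in a throwaway directory standing in for '../src'.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    src.mkdir()
    (src / "requirements.txt").write_text("transformers\n")
    ensure_requirements(src, ["nltk", "transformers"])
    demo_lines = (src / "requirements.txt").read_text().splitlines()
```

In the setup from the question, the call would presumably be ensure_requirements('../src', ['nltk']) before estimator.fit(), so that the file is uploaded together with finetune-T5.py.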

AWS
Sam
Answered 4 years ago
