AWS re:Post

Questions tagged with Amazon SageMaker Deployment

No such file or directory: '/opt/ml/input/data/test/revenue_train.csv' SageMaker [SM_CHANNEL_TRAIN]

I am trying to deploy my RandomForestClassifier on Amazon SageMaker using the Python SDK. I have been following this example https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-script-mode/sagemaker-script-mode.ipynb but keep getting an error that the training file was not found. I think the files were not uploaded to the correct channel. When I run the script locally as follows, it works fine:

```
! python script_rf.py --model-dir ./ \
                      --train ./ \
                      --test ./ \
```

This is my script:

```
# inference functions ---------------
def model_fn(model_dir):
    clf = joblib.load(os.path.join(model_dir, "model.joblib"))
    return clf

if __name__ =='__main__':
    print('extracting arguments')
    parser = argparse.ArgumentParser()

    # hyperparameters sent by the client are passed as command-line arguments to the script.
    parser.add_argument('--max_depth', type=int, default=2)
    parser.add_argument('--n_estimators', type=int, default=100)
    parser.add_argument('--random_state', type=int, default=0)

    # Data, model, and output directories
    parser.add_argument('--model-dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TEST'))
    parser.add_argument('--test', type=str, default=os.environ.get('SM_CHANNEL_TEST'))
    parser.add_argument('--train-file', type=str, default='revenue_train.csv')
    parser.add_argument('--test-file', type=str, default='revenue_test.csv')

    args, _ = parser.parse_known_args()

    print('reading data')
    train_df = pd.read_csv(os.path.join(args.train, args.train_file))
    test_df = pd.read_csv(os.path.join(args.test, args.test_file))

    if len(train_df) == 0:
        raise ValueError(('There are no files in {}.\n').format(args.train, "train"))

    print('building training and testing datasets')
    attributes = ['available_minutes_100', 'ampido_slots_amount', 'ampido_slots_amount_100',
                  'ampido_slots_amount_200', 'ampido_slots_amount_300', 'min_dist_loc',
                  'count_event', 'min_dist_phouses', 'count_phouses', 'min_dist_stops',
                  'count_stops', 'min_dist_tickets', 'count_tickets', 'min_dist_google',
                  'min_dist_psa', 'count_psa']
    X_train = train_df[attributes]
    X_test = test_df[attributes]
    y_train = train_df['target']
    y_test = test_df['target']

    # train
    print('training model')
    model = RandomForestClassifier(
        max_depth=args.max_depth,
        n_estimators=args.n_estimators)

    model.fit(X_train, y_train)

    # persist model
    path = os.path.join(args.model_dir, "model_rf.joblib")
    joblib.dump(model, path)
    print('model persisted at ' + path)

    # print accuracy and confusion matrix
    print('validating model')
    y_pred = model.predict(X_test)

    print('Confusion Matrix:')
    result = confusion_matrix(y_test, y_pred)
    print(result)
    print('Accuracy:')
    result2 = accuracy_score(y_test, y_pred)
    print(result2)
```

The error is raised at the `train_df` line of the script (`FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/input/data/test/revenue_train.csv'`).
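For comparison, here is a minimal sketch of the same argument parsing with each data directory defaulting to its own channel variable. It assumes the SageMaker training toolkit convention that a channel named `train` is exposed as `SM_CHANNEL_TRAIN` and one named `test` as `SM_CHANNEL_TEST`. The `parse_args` helper, its `env` parameter, `model_path`, and the `MODEL_FILENAME` constant are hypothetical additions for illustration; the shared constant is there because the script above saves `model_rf.joblib` while its `model_fn` loads `model.joblib`.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence

# Hypothetical shared constant so that training and model_fn agree on the file name.
MODEL_FILENAME = "model_rf.joblib"


def parse_args(argv: Optional[Sequence[str]] = None,
               env: Optional[Mapping[str, str]] = None) -> argparse.Namespace:
    """Parse script arguments; each data directory defaults to its own channel."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as command-line arguments.
    parser.add_argument("--max_depth", type=int, default=2)
    parser.add_argument("--n_estimators", type=int, default=100)
    parser.add_argument("--random_state", type=int, default=0)
    # Directories default to the SageMaker environment variables (assumed names).
    parser.add_argument("--model-dir", type=str, default=env.get("SM_MODEL_DIR"))
    parser.add_argument("--train", type=str, default=env.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--test", type=str, default=env.get("SM_CHANNEL_TEST"))
    parser.add_argument("--train-file", type=str, default="revenue_train.csv")
    parser.add_argument("--test-file", type=str, default="revenue_test.csv")
    args, _ = parser.parse_known_args(argv)
    return args


def model_path(model_dir: str) -> str:
    """Single place that decides where the model file lives (save and load)."""
    return os.path.join(model_dir, MODEL_FILENAME)
```

With this layout, `joblib.dump(model, model_path(args.model_dir))` in training and `joblib.load(model_path(model_dir))` in `model_fn` would read and write the same file, and passing `--train ./` locally still overrides the channel default.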
I tried specifying the input parameters:

```
# change channel input dirs
inputs = {
    "train": "ampido-exports/production/revenue_train",
    "test": "ampido-exports/production/revenue_test",
}

from sagemaker.sklearn.estimator import SKLearn

enable_local_mode_training = False

hyperparameters = {"max_depth": 2, 'random_state': 0, "n_estimators": 100}

if enable_local_mode_training:
    train_instance_type = "local"
    inputs = {"train": trainpath, "test": testpath}
else:
    train_instance_type = "ml.c5.xlarge"
    inputs = {"train": trainpath, "test": testpath}

estimator_parameters = {
    "entry_point": "script_rf.py",
    "framework_version": "1.0-1",
    "py_version": "py3",
    "instance_type": train_instance_type,
    "instance_count": 1,
    "hyperparameters": hyperparameters,
    "role": role,
    "base_job_name": "randomforestclassifier-model",
    'channel_input_dirs': inputs
}

estimator = SKLearn(**estimator_parameters)
estimator.fit(inputs)
```

but I still get the error `FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/input/data/test/revenue_train.csv'`.
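A small sketch of how the `inputs` passed to `estimator.fit` are expected to surface inside the training container, assuming the documented SageMaker convention that each channel is downloaded to `/opt/ml/input/data/<channel>` and exposed as `SM_CHANNEL_<CHANNEL>` (channel name upper-cased). The helper name `expected_channel_env` and the bucket `my-bucket` are hypothetical placeholders. The URI check reflects the assumption that channel sources are given as full `s3://` URIs (or `file://` paths in local mode) rather than bare prefixes such as `ampido-exports/production/revenue_train`.

```python
from typing import Dict, Mapping

# Assumed container layout used by SageMaker training jobs.
DATA_ROOT = "/opt/ml/input/data"


def expected_channel_env(inputs: Mapping[str, str]) -> Dict[str, str]:
    """Map each fit() channel to the SM_CHANNEL_* variable and directory it should get."""
    env: Dict[str, str] = {}
    for channel, uri in inputs.items():
        # S3 sources are full URIs; local mode accepts file:// paths.
        if not uri.startswith(("s3://", "file://")):
            raise ValueError(
                f"channel {channel!r}: expected an s3:// or file:// URI, got {uri!r}"
            )
        env[f"SM_CHANNEL_{channel.upper()}"] = f"{DATA_ROOT}/{channel}"
    return env
```

For `{"train": ..., "test": ...}` this yields `SM_CHANNEL_TRAIN=/opt/ml/input/data/train` and `SM_CHANNEL_TEST=/opt/ml/input/data/test`, so a script whose `--train` default reads `SM_CHANNEL_TEST` would look for `revenue_train.csv` under `/opt/ml/input/data/test`, which is the path in the error message.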
1 answer · 0 votes · 29 views · asked a month ago

Deploying a previously trained model with the SageMaker Python SDK (UnexpectedStatusException)

I am using a pretrained Random Forest model and trying to deploy it on Amazon SageMaker using the Python SDK:

```
from sagemaker.sklearn.estimator import SKLearn

sklearn_estimator = SKLearn(
    entry_point='script.py',
    role=get_execution_role(),
    instance_count=1,
    instance_type='ml.m4.xlarge',
    framework_version='0.20.0',
    base_job_name='rf-scikit')

sklearn_estimator.fit({'train': trainpath, 'test': testpath}, wait=False)

sklearn_estimator.latest_training_job.wait(logs='None')
artifact = m_boto3.describe_training_job(
    TrainingJobName=sklearn_estimator.latest_training_job.name)['ModelArtifacts']['S3ModelArtifacts']

print('Model artifact persisted at ' + artifact)
```

I get the following `UnexpectedStatusException`:

```
2022-08-25 12:03:27 Starting - Starting the training job....
2022-08-25 12:03:52 Starting - Preparing the instances for training............
2022-08-25 12:04:55 Downloading - Downloading input data......
2022-08-25 12:05:31 Training - Downloading the training image.........
2022-08-25 12:06:22 Training - Training image download completed. Training in progress..
2022-08-25 12:06:32 Uploading - Uploading generated training model.
2022-08-25 12:06:43 Failed - Training job failed
---------------------------------------------------------------------------
UnexpectedStatusException                 Traceback (most recent call last)
<ipython-input-37-628f942a78d3> in <module>
----> 1 sklearn_estimator.latest_training_job.wait(logs='None')
      2 artifact = m_boto3.describe_training_job(
      3     TrainingJobName=sklearn_estimator.latest_training_job.name)['ModelArtifacts']['S3ModelArtifacts']
      4 
      5 print('Model artifact persisted at ' + artifact)

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/estimator.py in wait(self, logs)
   2109             self.sagemaker_session.logs_for_job(self.job_name, wait=True, log_type=logs)
   2110         else:
-> 2111             self.sagemaker_session.wait_for_job(self.job_name)
   2112 
   2113     def describe(self):

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/session.py in wait_for_job(self, job, poll)
   3226             lambda last_desc: _train_done(self.sagemaker_client, job, last_desc), None, poll
   3227         )
-> 3228         self._check_job_status(job, desc, "TrainingJobStatus")
   3229         return desc
   3230 

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/session.py in _check_job_status(self, job, desc, status_key_name)
   3390                 message=message,
   3391                 allowed_statuses=["Completed", "Stopped"],
-> 3392                 actual_status=status,
   3393             )
   3394 

UnexpectedStatusException: Error for Training job rf-scikit-2022-08-25-12-03-25-931: Failed. Reason: AlgorithmError: framework error:
Traceback (most recent call last):
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_trainer.py", line 84, in train
    entrypoint()
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_sklearn_container/training.py", line 39, in main
    train(environment.Environment())
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_sklearn_container/training.py", line 35, in train
    runner_type=runner.ProcessRunnerType)
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_training/entry_point.py", line 100, in run
    wait, capture_error
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_training/process.py", line 291, in run
    cwd=environment.code_dir,
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_training/process.py", line 208, in check_error
    info=extra_info,
sagemaker_training.errors.ExecuteUserScriptError: ExecuteUserScriptError:
ExitCode 1
ErrorMessage ""
Command "/miniconda3/bin/python script.py"
ExecuteUserScriptErr
```

The pretrained model works fine, and I don't know what the problem is. Please help.
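A hedged sketch of checking the training job's status before reading the artifact location. The response of boto3's `describe_training_job` includes `TrainingJobStatus` and, for failed jobs, `FailureReason`, so the helper below (a hypothetical name, `artifact_or_failure`) raises with that reason instead of going straight to `['ModelArtifacts']['S3ModelArtifacts']`. When the reason is an `ExecuteUserScriptError` with an empty `ErrorMessage`, the script's own traceback is typically only visible in the job's CloudWatch logs (log group `/aws/sagemaker/TrainingJobs`).

```python
from typing import Any, Mapping


def artifact_or_failure(desc: Mapping[str, Any]) -> str:
    """Return the S3 model artifact URI, or raise with the job's FailureReason.

    `desc` is assumed to be the response of
    boto3.client("sagemaker").describe_training_job(TrainingJobName=...).
    """
    status = desc.get("TrainingJobStatus")
    if status != "Completed":
        # Failed or stopped jobs: surface the service-reported reason.
        reason = desc.get("FailureReason", "no FailureReason reported")
        raise RuntimeError(
            f"training job {desc.get('TrainingJobName')} is {status}: {reason}"
        )
    return desc["ModelArtifacts"]["S3ModelArtifacts"]
```

In the notebook above this would wrap the existing lookup, e.g. `artifact_or_failure(m_boto3.describe_training_job(TrainingJobName=sklearn_estimator.latest_training_job.name))`, assuming `m_boto3` is a boto3 SageMaker client as the original code implies.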
1 answer · 0 votes · 19 views · asked a month ago

SageMaker endpoint is not created when deploying a HuggingFace model

I am trying to deploy a HuggingFace model onto SageMaker. Here is the link to the model: https://huggingface.co/dalle-mini/dalle-mini
I am testing in my personal account, and here is the code:

```
from sagemaker.huggingface import HuggingFaceModel
import sagemaker

sess = sagemaker.Session()
# sagemaker session bucket -> used for uploading data, models and logs
# sagemaker will automatically create this bucket if it not exists
sagemaker_session_bucket = 'sagemaker-hugging-face-model-demo'
if sagemaker_session_bucket == 'sagemaker-hugging-face-model-demo' and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

role = sagemaker.get_execution_role()
sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")

hub = {
    'HF_MODEL_ID': 'dalle-mini/dalle-mini',
    'HF_TASK': 'Text-to-image'
}

huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    #image_uri="428136181372.dkr.ecr.ca-central-1.amazonaws.com/sagemaker-hugging-face",
    transformers_version="4.6.1",  # transformers version used
    pytorch_version="1.7",  # pytorch version used
    py_version='py36'
)

# deploy model to Sagemaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.xlarge'
)
```

When I try to create the SageMaker endpoint, I get this error: `ClientError: An error occurred (ValidationException) when calling the CreateModel operation: Requested image 428136181372.dkr.ecr.ca-central-1.amazonaws.com/sagemaker-hugging-face not found.`

I also need to create a Lambda function that invokes the SageMaker endpoint with a text description and receives a generated image in return. For example, the text `Sun is shining` should be transformed into an image after the Lambda function invokes the SageMaker endpoint.
I also need to know what the ContentType should be for the image.
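A minimal sketch of the Lambda side, assuming the endpoint is served by the Hugging Face inference toolkit, which accepts a JSON request body of the form `{"inputs": ...}` sent with `ContentType` `application/json`. The request shape, the `ENDPOINT_NAME` environment variable, the function name `invoke_text_to_image`, and base64-encoding the returned bytes are all assumptions. What the endpoint actually returns for an image (and therefore the right `Accept` value) depends on the model's inference code, so `image/png` here is only a placeholder. The runtime client is passed in so the logic can be exercised without AWS.

```python
import base64
import json
import os
from typing import Any, Dict


def invoke_text_to_image(client: Any, endpoint_name: str, text: str,
                         accept: str = "image/png") -> Dict[str, Any]:
    """Send `text` to a SageMaker endpoint and return the result base64-encoded.

    `client` is expected to be boto3.client("sagemaker-runtime").
    """
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",  # the request body is JSON
        Accept=accept,                    # assumed response type (placeholder)
        Body=json.dumps({"inputs": text}),
    )
    payload = response["Body"].read()     # raw bytes returned by the model
    return {
        "contentType": response.get("ContentType", accept),
        "imageBase64": base64.b64encode(payload).decode("ascii"),
    }


def lambda_handler(event, context, client=None):
    """Lambda entry point; expects an event like {"text": "Sun is shining"}."""
    if client is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        client = boto3.client("sagemaker-runtime")
    endpoint_name = os.environ["ENDPOINT_NAME"]  # hypothetical configuration
    result = invoke_text_to_image(client, endpoint_name, event["text"])
    return {"statusCode": 200, "body": json.dumps(result)}
```

Base64 is used because a Lambda response body is text; a caller would decode `imageBase64` back into the image bytes.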
1 answer · 0 votes · 42 views · asked 2 months ago