
Questions tagged with Amazon SageMaker


How to run a batch transform with JSON Lines data?

I am using my own inference.py file as the entry point for inference. I have tested this PyTorch model served as a real-time endpoint in Amazon SageMaker, and it works. But when I create a batch transform job and put multiple JSON objects in my input file (JSON Lines format), I get the following error from `input_fn`, on the line `data = json.loads(request_body)`, in the CloudWatch logs:

```
data = json.loads(request_body)
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char ..)
```

I am not sure why I am getting an "Extra data" error on line 2, because this is supposed to be a batch job with multiple JSON inputs, one per line.

inference.py:

```
import json

def model_fn(model_dir):
    # load the model
    ...

def input_fn(request_body, request_content_type):
    input_data = json.loads(request_body)
    return input_data

def predict_fn(input_data, model):
    return model.predict(input_data)
```

Setting up the batch job:

```
response = client.create_transform_job(
    TransformJobName='some-job',
    ModelName='mypytorchmodel',
    ModelClientConfig={
        'InvocationsTimeoutInSeconds': 3600,
        'InvocationsMaxRetries': 1
    },
    BatchStrategy='MultiRecord',
    TransformInput={
        'DataSource': {
            'S3DataSource': {
                'S3DataType': 'S3Prefix',
                'S3Uri': 's3://inputpath'
            }
        },
        'ContentType': 'application/json',
        'SplitType': 'Line'
    },
    TransformOutput={
        'S3OutputPath': 's3://outputpath',
        'Accept': 'application/json',
        'AssembleWith': 'Line',
    },
    TransformResources={
        'InstanceType': 'ml.g4dn.xlarge',
        'InstanceCount': 1
    }
)
```

Input file:

```
{"input" : "some text here"}
{"input" : "another"}
...
```
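One likely cause, not confirmed in the question itself: with `BatchStrategy='MultiRecord'` and `SplitType='Line'`, SageMaker can pack several input lines into a single invocation, so `request_body` may contain multiple newline-separated JSON objects and a single `json.loads` fails with "Extra data" at line 2. A minimal sketch of an `input_fn` that parses each line separately (the standalone form here is illustrative):

```python
import json

def input_fn(request_body, request_content_type):
    # With BatchStrategy='MultiRecord' and SplitType='Line', SageMaker may
    # join several input lines into one request body, so the body is
    # newline-delimited JSON rather than a single JSON document.
    if isinstance(request_body, (bytes, bytearray)):
        request_body = request_body.decode('utf-8')
    # Parse each non-empty line as its own JSON object.
    return [json.loads(line) for line in request_body.splitlines() if line.strip()]
```

Alternatively, `BatchStrategy='SingleRecord'` sends one line per invocation, which keeps the original `input_fn` working at the cost of throughput.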
1 answer · 0 votes · 23 views · asked a month ago

No such file or directory: '/opt/ml/input/data/test/revenue_train.csv' SageMaker [SM_CHANNEL_TRAIN]

I am trying to deploy my RandomForestClassifier on Amazon SageMaker using the Python SDK. I have been following this example https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-script-mode/sagemaker-script-mode.ipynb but keep getting an error that the training file was not found. I think the files were not uploaded to the correct channel. When I run the script locally as follows, it works fine:

```
! python script_rf.py --model-dir ./ \
                      --train ./ \
                      --test ./
```

This is my script code:

```
import argparse
import os

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# inference functions ---------------
def model_fn(model_dir):
    clf = joblib.load(os.path.join(model_dir, "model.joblib"))
    return clf

if __name__ == '__main__':
    print('extracting arguments')
    parser = argparse.ArgumentParser()

    # hyperparameters sent by the client are passed as command-line arguments to the script
    parser.add_argument('--max_depth', type=int, default=2)
    parser.add_argument('--n_estimators', type=int, default=100)
    parser.add_argument('--random_state', type=int, default=0)

    # data, model, and output directories
    parser.add_argument('--model-dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TEST'))
    parser.add_argument('--test', type=str, default=os.environ.get('SM_CHANNEL_TEST'))
    parser.add_argument('--train-file', type=str, default='revenue_train.csv')
    parser.add_argument('--test-file', type=str, default='revenue_test.csv')
    args, _ = parser.parse_known_args()

    print('reading data')
    train_df = pd.read_csv(os.path.join(args.train, args.train_file))
    test_df = pd.read_csv(os.path.join(args.test, args.test_file))
    if len(train_df) == 0:
        raise ValueError(('There are no files in {}.\n').format(args.train, "train"))

    print('building training and testing datasets')
    attributes = ['available_minutes_100', 'ampido_slots_amount',
                  'ampido_slots_amount_100', 'ampido_slots_amount_200',
                  'ampido_slots_amount_300', 'min_dist_loc', 'count_event',
                  'min_dist_phouses', 'count_phouses', 'min_dist_stops',
                  'count_stops', 'min_dist_tickets', 'count_tickets',
                  'min_dist_google', 'min_dist_psa', 'count_psa']
    X_train = train_df[attributes]
    X_test = test_df[attributes]
    y_train = train_df['target']
    y_test = test_df['target']

    # train
    print('training model')
    model = RandomForestClassifier(
        max_depth=args.max_depth,
        n_estimators=args.n_estimators)
    model.fit(X_train, y_train)

    # persist model
    path = os.path.join(args.model_dir, "model_rf.joblib")
    joblib.dump(model, path)
    print('model persisted at ' + path)

    # print accuracy and confusion matrix
    print('validating model')
    y_pred = model.predict(X_test)
    print('Confusion Matrix:')
    result = confusion_matrix(y_test, y_pred)
    print(result)
    print('Accuracy:')
    result2 = accuracy_score(y_test, y_pred)
    print(result2)
```

The error is raised on the `train_df` line of the script (`FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/input/data/test/revenue_train.csv'`).
I tried specifying the input parameters:

```
# change channel input dirs
inputs = {
    "train": "ampido-exports/production/revenue_train",
    "test": "ampido-exports/production/revenue_test",
}

from sagemaker.sklearn.estimator import SKLearn

enable_local_mode_training = False
hyperparameters = {"max_depth": 2, "random_state": 0, "n_estimators": 100}

if enable_local_mode_training:
    train_instance_type = "local"
    inputs = {"train": trainpath, "test": testpath}
else:
    train_instance_type = "ml.c5.xlarge"
    inputs = {"train": trainpath, "test": testpath}

estimator_parameters = {
    "entry_point": "script_rf.py",
    "framework_version": "1.0-1",
    "py_version": "py3",
    "instance_type": train_instance_type,
    "instance_count": 1,
    "hyperparameters": hyperparameters,
    "role": role,
    "base_job_name": "randomforestclassifier-model",
    "channel_input_dirs": inputs,
}

estimator = SKLearn(**estimator_parameters)
estimator.fit(inputs)
```

but I still get the error `FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/input/data/test/revenue_train.csv'`.
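One detail worth noting, suggested by the error path itself rather than stated in the question: the script's `--train` argument defaults to `SM_CHANNEL_TEST`, so the training file is looked up under the `test` channel directory (`/opt/ml/input/data/test/`). A minimal sketch of how SageMaker script mode maps channel names to container paths (the helper function is illustrative, not part of the SDK):

```python
import os

def channel_dir(channel_name, base="/opt/ml/input/data"):
    # Each channel passed to estimator.fit({"train": ..., "test": ...})
    # is mounted at /opt/ml/input/data/<channel_name> inside the training
    # container and exposed as the env var SM_CHANNEL_<CHANNEL_NAME>.
    return os.path.join(base, channel_name)

# With --train defaulting to SM_CHANNEL_TEST, the script looks for
# revenue_train.csv in the "test" channel directory:
print(os.path.join(channel_dir("test"), "revenue_train.csv"))
# /opt/ml/input/data/test/revenue_train.csv

# Defaulting --train to SM_CHANNEL_TRAIN instead would resolve to:
print(os.path.join(channel_dir("train"), "revenue_train.csv"))
# /opt/ml/input/data/train/revenue_train.csv
```

Under this reading, the `FileNotFoundError` path in the question matches exactly what the `--train` default would produce when `revenue_train.csv` was uploaded to the `train` channel.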
1 answer · 0 votes · 39 views · asked a month ago

Deploying a previously trained model with the SageMaker Python SDK (UnexpectedStatusException)

I am using a pretrained Random Forest model and trying to deploy it on Amazon SageMaker using the Python SDK:

```
from sagemaker.sklearn.estimator import SKLearn

sklearn_estimator = SKLearn(
    entry_point='script.py',
    role=get_execution_role(),
    instance_count=1,
    instance_type='ml.m4.xlarge',
    framework_version='0.20.0',
    base_job_name='rf-scikit')

sklearn_estimator.fit({'train': trainpath, 'test': testpath}, wait=False)
sklearn_estimator.latest_training_job.wait(logs='None')

artifact = m_boto3.describe_training_job(
    TrainingJobName=sklearn_estimator.latest_training_job.name)['ModelArtifacts']['S3ModelArtifacts']

print('Model artifact persisted at ' + artifact)
```

I get the following UnexpectedStatusException:

```
2022-08-25 12:03:27 Starting - Starting the training job....
2022-08-25 12:03:52 Starting - Preparing the instances for training............
2022-08-25 12:04:55 Downloading - Downloading input data......
2022-08-25 12:05:31 Training - Downloading the training image.........
2022-08-25 12:06:22 Training - Training image download completed. Training in progress..
2022-08-25 12:06:32 Uploading - Uploading generated training model.
2022-08-25 12:06:43 Failed - Training job failed
---------------------------------------------------------------------------
UnexpectedStatusException                 Traceback (most recent call last)
<ipython-input-37-628f942a78d3> in <module>
----> 1 sklearn_estimator.latest_training_job.wait(logs='None')
      2 artifact = m_boto3.describe_training_job(
      3     TrainingJobName=sklearn_estimator.latest_training_job.name)['ModelArtifacts']['S3ModelArtifacts']
      4
      5 print('Model artifact persisted at ' + artifact)

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/estimator.py in wait(self, logs)
   2109             self.sagemaker_session.logs_for_job(self.job_name, wait=True, log_type=logs)
   2110         else:
-> 2111             self.sagemaker_session.wait_for_job(self.job_name)
   2112
   2113     def describe(self):

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/session.py in wait_for_job(self, job, poll)
   3226             lambda last_desc: _train_done(self.sagemaker_client, job, last_desc), None, poll
   3227         )
-> 3228         self._check_job_status(job, desc, "TrainingJobStatus")
   3229         return desc
   3230

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/session.py in _check_job_status(self, job, desc, status_key_name)
   3390                 message=message,
   3391                 allowed_statuses=["Completed", "Stopped"],
-> 3392                 actual_status=status,
   3393             )
   3394

UnexpectedStatusException: Error for Training job rf-scikit-2022-08-25-12-03-25-931: Failed.
Reason: AlgorithmError: framework error:
Traceback (most recent call last):
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_trainer.py", line 84, in train
    entrypoint()
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_sklearn_container/training.py", line 39, in main
    train(environment.Environment())
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_sklearn_container/training.py", line 35, in train
    runner_type=runner.ProcessRunnerType)
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_training/entry_point.py", line 100, in run
    wait, capture_error
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_training/process.py", line 291, in run
    cwd=environment.code_dir,
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_training/process.py", line 208, in check_error
    info=extra_info,
sagemaker_training.errors.ExecuteUserScriptError:
ExecuteUserScriptError:
ExitCode 1
ErrorMessage ""
Command "/miniconda3/bin/python script.py"
```

The pretrained model works fine locally and I don't know what the problem is. Please help.
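The traceback reports only `ExitCode 1` with an empty `ErrorMessage`, so the actual Python error raised inside `script.py` has to be found elsewhere, typically in the training job's CloudWatch log stream, or in the `FailureReason` field returned by `describe_training_job`. As a small illustrative sketch (the helper name is made up, not an SDK function), the failure reason can also be isolated from the long exception message the SDK raises:

```python
def extract_failure_reason(status_message):
    # The SDK's UnexpectedStatusException embeds the training job's
    # FailureReason after the "Reason: " marker; isolating it makes the
    # real error easier to spot than the full SDK traceback.
    marker = "Reason: "
    idx = status_message.find(marker)
    return status_message[idx + len(marker):].strip() if idx != -1 else status_message


msg = ("Error for Training job rf-scikit-2022-08-25-12-03-25-931: Failed. "
       "Reason: AlgorithmError: framework error: ExecuteUserScriptError: ExitCode 1")
print(extract_failure_reason(msg))
# AlgorithmError: framework error: ExecuteUserScriptError: ExitCode 1
```

Since the message here is empty past the exit code, the next diagnostic step would be reading the job's CloudWatch logs (or running `script.py` locally with the same arguments) to see the script's own traceback.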
1 answer · 0 votes · 21 views · asked a month ago