Deploying a Random Forest model on Amazon SageMaker always fails with UnexpectedStatusException (Reason: AlgorithmError)
Hey, I am trying to deploy my RandomForest classifier on Amazon SageMaker, but I get an UnexpectedStatusException even though the script worked fine before.
When I run the script directly, it runs fine and prints the confusion matrix and accuracy as expected. When I use the same script to deploy the model to Amazon SageMaker, it does not work.
! python script.py --n-estimators 100 \
    --max_depth 2 \
    --model-dir ./ \
    --train ./ \
    --test ./
Confusion Matrix:
[[13  8]
 [ 1 17]]
Accuracy: 0.7692307692307693
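For context, here is a minimal sketch of how the flags from the command above could be parsed so that the same script.py works both locally and inside a SageMaker training container. This is not the actual script from the question. The function name `parse_args` and the hyperparameter defaults are assumptions. `SM_MODEL_DIR`, `SM_CHANNEL_TRAIN` and `SM_CHANNEL_TEST` are the environment variables the SageMaker training container sets for the model directory and for the `train` and `test` channels.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the flags from the local run, falling back to SageMaker env vars."""
    parser = argparse.ArgumentParser()

    # Hyperparameters. The defaults are assumptions, chosen so the script still
    # starts when the estimator is created without a hyperparameters dict.
    parser.add_argument("--n-estimators", type=int, default=100)
    parser.add_argument("--max_depth", type=int, default=2)

    # Inside a training container SageMaker sets these variables; locally they
    # are absent, so fall back to the current directory.
    parser.add_argument("--model-dir", type=str,
                        default=os.environ.get("SM_MODEL_DIR", "./"))
    parser.add_argument("--train", type=str,
                        default=os.environ.get("SM_CHANNEL_TRAIN", "./"))
    parser.add_argument("--test", type=str,
                        default=os.environ.get("SM_CHANNEL_TEST", "./"))

    # parse_known_args ignores any extra arguments instead of exiting.
    args, _unknown = parser.parse_known_args(argv)
    return args
```

With defaults like these, the script does not exit during argument parsing when a flag is missing, which is one thing worth ruling out when a script works locally but fails in the container.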
I used the SKLearn estimator from the SageMaker Python SDK:
from sagemaker import get_execution_role
from sagemaker.sklearn.estimator import SKLearn

sklearn_estimator = SKLearn(
    entry_point='script.py',
    role=get_execution_role(),
    instance_count=1,
    instance_type='ml.m4.xlarge',
    framework_version='0.20.0',
    base_job_name='rf-scikit')
I launched the training job as follows:
sklearn_estimator.fit({'train':trainpath, 'test': testpath}, wait=False)
Here I am trying to deploy the model, which leads to the UnexpectedStatusException that I cannot seem to fix:
sklearn_estimator.latest_training_job.wait(logs='None')
artifact = m_boto3.describe_training_job(
    TrainingJobName=sklearn_estimator.latest_training_job.name)['ModelArtifacts']['S3ModelArtifacts']

print('Model artifact persisted at ' + artifact)
2022-08-25 12:03:27 Starting - Starting the training job....
2022-08-25 12:03:52 Starting - Preparing the instances for training............
2022-08-25 12:04:55 Downloading - Downloading input data......
2022-08-25 12:05:31 Training - Downloading the training image.........
2022-08-25 12:06:22 Training - Training image download completed. Training in progress..
2022-08-25 12:06:32 Uploading - Uploading generated training model.
2022-08-25 12:06:43 Failed - Training job failed
UnexpectedStatusException Traceback (most recent call last)
<ipython-input-37-628f942a78d3> in <module>
----> 1 sklearn_estimator.latest_training_job.wait(logs='None')
      2 artifact = m_boto3.describe_training_job(
      3     TrainingJobName=sklearn_estimator.latest_training_job.name)['ModelArtifacts']['S3ModelArtifacts']
      4
      5 print('Model artifact persisted at ' + artifact)

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/estimator.py in wait(self, logs)
   2109             self.sagemaker_session.logs_for_job(self.job_name, wait=True, log_type=logs)
   2110         else:
-> 2111             self.sagemaker_session.wait_for_job(self.job_name)
   2112
   2113     def describe(self):

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/session.py in wait_for_job(self, job, poll)
   3226             lambda last_desc: _train_done(self.sagemaker_client, job, last_desc), None, poll
   3227         )
-> 3228         self._check_job_status(job, desc, "TrainingJobStatus")
   3229         return desc
   3230

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/session.py in _check_job_status(self, job, desc, status_key_name)
   3390                 message=message,
   3391                 allowed_statuses=["Completed", "Stopped"],
-> 3392                 actual_status=status,
   3393             )
   3394

UnexpectedStatusException: Error for Training job rf-scikit-2022-08-25-12-03-25-931: Failed. Reason: AlgorithmError: framework error:
Traceback (most recent call last):
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_trainer.py", line 84, in train
    entrypoint()
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_sklearn_container/training.py", line 39, in main
    train(environment.Environment())
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_sklearn_container/training.py", line 35, in train
    runner_type=runner.ProcessRunnerType)
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_training/entry_point.py", line 100, in run
    wait, capture_error
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_training/process.py", line 291, in run
    cwd=environment.code_dir,
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_training/process.py", line 208, in check_error
    info=extra_info,
sagemaker_training.errors.ExecuteUserScriptError: ExecuteUserScriptError:
ExitCode 1
ErrorMessage ""
Command "/miniconda3/bin/python script.py"
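Since the traceback above only reports `ExitCode 1` with an empty `ErrorMessage`, the real Python error from script.py is most likely only visible in the training job's CloudWatch logs. Below is a rough sketch of how the failure reason and the last log lines could be pulled with boto3. The helper names `summarize_failure` and `tail_training_logs` are made up for illustration. The clients are passed in as arguments, so a SageMaker client (such as the `m_boto3` used above, assuming it is one) can be reused. The log group `/aws/sagemaker/TrainingJobs` is the one SageMaker normally writes training logs to.

```python
TRAINING_LOG_GROUP = "/aws/sagemaker/TrainingJobs"


def summarize_failure(sm_client, job_name):
    """Return the status and failure reason of a training job.

    sm_client is expected to be boto3.client("sagemaker").
    """
    desc = sm_client.describe_training_job(TrainingJobName=job_name)
    return {
        "status": desc["TrainingJobStatus"],
        # FailureReason is only present when the job has failed.
        "reason": desc.get("FailureReason", ""),
    }


def tail_training_logs(logs_client, job_name, limit=50):
    """Collect the most recent log messages of a training job.

    logs_client is expected to be boto3.client("logs"). Log stream names
    of a training job start with the job name.
    """
    streams = logs_client.describe_log_streams(
        logGroupName=TRAINING_LOG_GROUP,
        logStreamNamePrefix=job_name,
    )["logStreams"]

    messages = []
    for stream in streams:
        events = logs_client.get_log_events(
            logGroupName=TRAINING_LOG_GROUP,
            logStreamName=stream["logStreamName"],
            limit=limit,
            startFromHead=False,  # read from the end of the stream
        )["events"]
        messages.extend(event["message"] for event in events)
    return messages


# Possible usage in the notebook (requires AWS credentials):
#   import boto3
#   job = sklearn_estimator.latest_training_job.name
#   print(summarize_failure(boto3.client("sagemaker"), job))
#   print("\n".join(tail_training_logs(boto3.client("logs"), job)))
```

Calling `sklearn_estimator.latest_training_job.wait(logs='All')` instead of `logs='None'` should also stream the container output into the notebook, which may be the quicker way to see the underlying error.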
I would appreciate any help.