Questions tagged with Amazon Machine Images (AMI)

Deploying a Random Forest model on Amazon SageMaker always fails with an UnexpectedStatusException (Reason: AlgorithmError)

Hey, I am trying to deploy my RandomForest classifier on Amazon SageMaker but get an UnexpectedStatusException, even though the script worked fine before.

The script runs fine and prints out the confusion matrix and accuracy as expected. When I try to deploy the model to Amazon SageMaker using the script, it does not work.

>>! python script.py --n-estimators 100 \
    --max_depth 2 \
    --model-dir ./ \
    --train ./ \
    --test ./ \

Confusion Matrix:
[[13  8]
 [ 1 17]]
Accuracy: 0.7692307692307693

I used the Estimator from the SageMaker Python SDK:

>>from sagemaker.sklearn.estimator import SKLearn
>>sklearn_estimator = SKLearn(
    entry_point='script.py',
    role = get_execution_role(),
    instance_count=1,
    instance_type='ml.m4.xlarge',
    framework_version='0.20.0',
    base_job_name='rf-scikit')

I launched the training job as follows:

>>sklearn_estimator.fit({'train':trainpath, 'test': testpath}, wait=False)

Here I am trying to deploy the model, which leads to the UnexpectedStatusException that I cannot seem to fix:

>>sklearn_estimator.latest_training_job.wait(logs='None')
>>artifact = m_boto3.describe_training_job(
    TrainingJobName=sklearn_estimator.latest_training_job.name)['ModelArtifacts']['S3ModelArtifacts']
>>print('Model artifact persisted at ' + artifact)

>>2022-08-25 12:03:27 Starting - Starting the training job....
>>2022-08-25 12:03:52 Starting - Preparing the instances for training............
>>2022-08-25 12:04:55 Downloading - Downloading input data......
>>2022-08-25 12:05:31 Training - Downloading the training image.........
>>2022-08-25 12:06:22 Training - Training image download completed. Training in progress..
>>2022-08-25 12:06:32 Uploading - Uploading generated training model.
>>2022-08-25 12:06:43 Failed - Training job failed

---------------------------------------------------------------------------
UnexpectedStatusException                 Traceback (most recent call last)
<ipython-input-37-628f942a78d3> in <module>
----> 1 sklearn_estimator.latest_training_job.wait(logs='None')
      2 artifact = m_boto3.describe_training_job(
      3     TrainingJobName=sklearn_estimator.latest_training_job.name)['ModelArtifacts']['S3ModelArtifacts']
      4
      5 print('Model artifact persisted at ' + artifact)

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/estimator.py in wait(self, logs)
   2109             self.sagemaker_session.logs_for_job(self.job_name, wait=True, log_type=logs)
   2110         else:
-> 2111             self.sagemaker_session.wait_for_job(self.job_name)
   2112
   2113     def describe(self):

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/session.py in wait_for_job(self, job, poll)
   3226             lambda last_desc: _train_done(self.sagemaker_client, job, last_desc), None, poll
   3227         )
-> 3228         self._check_job_status(job, desc, "TrainingJobStatus")
   3229         return desc
   3230

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/session.py in _check_job_status(self, job, desc, status_key_name)
   3390                 message=message,
   3391                 allowed_statuses=["Completed", "Stopped"],
-> 3392                 actual_status=status,
   3393             )
   3394

UnexpectedStatusException: Error for Training job rf-scikit-2022-08-25-12-03-25-931: Failed. Reason: AlgorithmError: framework error:
Traceback (most recent call last):
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_trainer.py", line 84, in train
    entrypoint()
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_sklearn_container/training.py", line 39, in main
    train(environment.Environment())
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_sklearn_container/training.py", line 35, in train
    runner_type=runner.ProcessRunnerType)
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_training/entry_point.py", line 100, in run
    wait, capture_error
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_training/process.py", line 291, in run
    cwd=environment.code_dir,
  File "/miniconda3/lib/python3.7/site-packages/sagemaker_training/process.py", line 208, in check_error
    info=extra_info,
sagemaker_training.errors.ExecuteUserScriptError: ExecuteUserScriptError:
ExitCode 1
ErrorMessage ""
Command "/miniconda3/bin/python script.py"
ExecuteUserScriptErr

I would be grateful for any help.
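The artifact lookup in the question assumes the job completed, but the log shows it ended in `Failed`. A minimal sketch of a more defensive lookup is below. It assumes that `m_boto3` is a boto3 SageMaker client (the question does not show how it was created); the helper name `summarize_training_job` and the parameter name `sm_client` are hypothetical. It relies only on the `TrainingJobStatus`, `FailureReason`, and `ModelArtifacts` fields of the `describe_training_job` response.

```python
def summarize_training_job(sm_client, job_name):
    """Return (status, detail) for a SageMaker training job.

    sm_client is assumed to be a boto3 SageMaker client, for example
    boto3.client("sagemaker"). The helper name is hypothetical.
    """
    desc = sm_client.describe_training_job(TrainingJobName=job_name)
    status = desc["TrainingJobStatus"]
    if status == "Completed":
        # Treat the artifact path as meaningful only when the job completed.
        return status, desc["ModelArtifacts"]["S3ModelArtifacts"]
    # For any other status, return the service-reported reason, if present.
    return status, desc.get("FailureReason", "")
```

With the names from the question, this could be called as `summarize_training_job(m_boto3, sklearn_estimator.latest_training_job.name)`. Since the reported `ErrorMessage` is empty, the traceback from `script.py` itself is more likely to show up in the job's CloudWatch logs, or in the notebook if `wait` is called with `logs='All'` instead of `logs='None'`.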
0 answers, 0 votes, 11 views, asked a month ago