Questions tagged with Machine Learning & AI


2 answers · 0 votes · 18 views · asked 2 days ago
SageMaker endpoint failing with "An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (413) from primary and could not load the entire response body"

Hello, I have created a SageMaker endpoint by following https://github.com/huggingface/notebooks/blob/main/sagemaker/20_automatic_speech_recognition_inference/sagemaker-notebook.ipynb and it is failing with the error "An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (413) from primary and could not load the entire response body". The predict call returns the traceback below, but the CloudWatch log group for the endpoint does not contain any error details.

```
ModelError                                Traceback (most recent call last)
/tmp/ipykernel_16248/2846183179.py in <module>
      2 # audio_path = "s3://ml-backend-sales-call-audio/sales-call-audio/1279881599154831602.playback.mp3"
      3 audio_path = "/home/ec2-user/SageMaker/finetune-deploy-bert-with-amazon-sagemaker-for-hugging-face/1279881599154831602.playback.mp3"  ## AS OF NOW have stored locally in notebook instance
----> 4 res = predictor.predict(data=audio_path)
      5 print(res)

~/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/sagemaker/predictor.py in predict(self, data, initial_args, target_model, target_variant, inference_id)
    159             data, initial_args, target_model, target_variant, inference_id
    160         )
--> 161         response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
    162         return self._handle_response(response)
    163

~/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    493             )
    494             # The "self" in this scope is referring to the BaseClient.
--> 495             return self._make_api_call(operation_name, kwargs)
    496
    497         _api_call.__name__ = str(py_operation_name)

~/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    912             error_code = parsed_response.get("Error", {}).get("Code")
    913             error_class = self.exceptions.from_code(error_code)
--> 914             raise error_class(parsed_response, operation_name)
    915         else:
    916             return parsed_response

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (413) from primary and could not load the entire response body. See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/asr-facebook-wav2vec2-base-960h-2022-11-25-19-27-19 in account xxxx for more information.
```
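Note: HTTP status 413 means the request or response body was too large for the container to handle, and SageMaker real-time endpoints also cap the invocation payload at roughly 6 MB. A minimal sketch of invoking the endpoint the way the referenced notebook does, with the audio serialized as raw bytes, is shown below; the endpoint name and file name are copied from the question and stand in as placeholders, and this is an illustration rather than a confirmed fix.

```
# Hedged sketch (assumption, not a confirmed fix): send the audio file as binary
# using DataSerializer, as in the referenced Hugging Face notebook.
# The endpoint name and file name below are placeholders taken from the question.
from sagemaker.predictor import Predictor
from sagemaker.serializers import DataSerializer

predictor = Predictor(
    endpoint_name="asr-facebook-wav2vec2-base-960h-2022-11-25-19-27-19",
    serializer=DataSerializer(content_type="audio/x-audio"),  # reads the file and sends raw bytes
)

# Real-time endpoints reject payloads above about 6 MB with a 413, so a long
# recording may still need to be shortened, chunked, or moved to asynchronous inference.
res = predictor.predict(data="1279881599154831602.playback.mp3")
print(res)
```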
1 answer · 0 votes · 54 views · asked 3 days ago
SageMaker Pipelines - Is it possible to use a TransformStep with the CatBoost Estimator?

Hi! I am trying to implement a SageMaker Pipeline including the following steps (among other things):

* **ProcessingStep**: processing script (PySparkProcessor) generating a train, validation and test dataset (csv)
* **TrainingStep**: model training, CatBoost Estimator (https://docs.aws.amazon.com/sagemaker/latest/dg/catboost.html)
* **TransformStep**: batch inference using the model on the test dataset (csv)

The TransformStep returns the following error: **python3: can't open file 'serve': [Errno 2] No such file or directory**

I wonder if I'm using TransformStep in the wrong way or if, at the moment, the use of TransformStep with the CatBoost model has not been implemented yet.

Code:

```
[...]
pyspark_processor = PySparkProcessor(
    base_job_name="sm-spark",
    framework_version="3.1",
    role=role_arn,
    instance_type="ml.m5.xlarge",
    instance_count=12,
    sagemaker_session=pipeline_session,
    max_runtime_in_seconds=2400,
)
step_process_args = pyspark_processor.run(
    submit_app=os.path.join(s3_preprocess_script_dir, "preprocess.py"),  # Hack to fix cache hit
    submit_py_files=[
        os.path.join(s3_preprocess_script_dir, "preprocess_utils.py"),
        os.path.join(s3_preprocess_script_dir, "spark_utils.py"),
    ],
    outputs=[
        ProcessingOutput(
            output_name="datasets",
            source="/opt/ml/processing/output",
            destination=s3_preprocess_output_path,
        )
    ],
    arguments=[
        "--aws_account", AWS_ACCOUNT,
        "--aws_env", AWS_ENV,
        "--project_name", PROJECT_NAME,
        "--mode", "training",
    ],
)
step_process = ProcessingStep(
    name="PySparkPreprocessing",
    step_args=step_process_args,
    cache_config=cache_config,
)

train_model_id = "catboost-classification-model"
train_model_version = "*"
train_scope = "training"
training_instance_type = "ml.m5.xlarge"

# Retrieve the docker image
train_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    model_id=train_model_id,
    model_version=train_model_version,
    image_scope=train_scope,
    instance_type=training_instance_type,
)
# Retrieve the training script
train_source_uri = script_uris.retrieve(
    model_id=train_model_id, model_version=train_model_version, script_scope=train_scope
)
# Retrieve the pre-trained model tarball to further fine-tune
train_model_uri = model_uris.retrieve(
    model_id=train_model_id, model_version=train_model_version, model_scope=train_scope
)
training_job_name = name_from_base(f"jumpstart-{train_model_id}-training")

# Create SageMaker Estimator instance
tabular_estimator = Estimator(
    role=role_arn,
    image_uri=train_image_uri,
    source_dir=train_source_uri,
    model_uri=train_model_uri,
    entry_point="transfer_learning.py",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    max_run=360000,
    hyperparameters=hyperparameters,
    sagemaker_session=pipeline_session,
    output_path=s3_training_output_path,
    # The default profiler rule includes a timestamp which will change each time the
    # pipeline is upserted, causing cache misses. If profiling is not needed, set
    # disable_profiler to True on the estimator.
    disable_profiler=True,
)

# Launch a SageMaker Training job by passing s3 path of the training data
step_train_args = tabular_estimator.fit(
    {
        "training": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                "datasets"
            ].S3Output.S3Uri
        )
    },
    logs=True,
    job_name=training_job_name,
)
step_train = TrainingStep(
    name="CatBoostTraining",
    step_args=step_train_args,
    cache_config=cache_config,
)

script_eval = ScriptProcessor(
    image_uri=[MASKED],
    command=["python3"],
    instance_type="ml.m5.xlarge",
    instance_count=1,
    base_job_name="script-evaluation",
    role=role_arn,
    sagemaker_session=pipeline_session,
)
eval_args = script_eval.run(
    inputs=[
        ProcessingInput(
            source=step_train.properties.ModelArtifacts.S3ModelArtifacts,
            destination="/opt/ml/processing/model",
        ),
        ProcessingInput(
            source=step_process.properties.ProcessingOutputConfig.Outputs[
                "datasets"
            ].S3Output.S3Uri,
            destination="/opt/ml/processing/input",
        ),
    ],
    outputs=[
        ProcessingOutput(
            output_name="evaluation",
            source="/opt/ml/processing/evaluation",
            destination=s3_evaluation_output_path,
        ),
    ],
    code="common/evaluation.py",
)
evaluation_report = PropertyFile(
    name="EvaluationReport", output_name="evaluation", path="evaluation.json"
)
step_eval = ProcessingStep(
    name="Evaluation",
    step_args=eval_args,
    property_files=[evaluation_report],
    cache_config=cache_config,
)

model = Model(
    image_uri="467855596088.dkr.ecr.eu-west-3.amazonaws.com/sagemaker-catboost-image:latest",
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    sagemaker_session=pipeline_session,
    role=role_arn,
)
evaluation_s3_uri = "{}/evaluation.json".format(
    step_eval.arguments["ProcessingOutputConfig"]["Outputs"][0]["S3Output"]["S3Uri"]
)
model_step_args = model.create(
    instance_type="ml.m5.large",
)
create_model = ModelStep(name="CatBoostModel", step_args=model_step_args)

step_fail = FailStep(
    name="FailBranch",
    error_message=Join(on=" ", values=["Execution failed due to F1-score <", 0.8]),
)
cond_lte = ConditionGreaterThanOrEqualTo(
    left=JsonGet(
        step_name=step_eval.name,
        property_file=evaluation_report,
        json_path="classification_metrics.f1-score.value",
    ),
    right=f1_threshold,
)
step_cond = ConditionStep(
    name="F1ScoreCondition",
    conditions=[cond_lte],
    if_steps=[create_model],
    else_steps=[step_fail],
)

# Transform Job
s3_test_transform_input = os.path.join(
    step_process.arguments["ProcessingOutputConfig"]["Outputs"][0]["S3Output"]["S3Uri"],
    "test",
)
transformer = Transformer(
    model_name=create_model.properties.ModelName,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    assemble_with="Line",
    accept="text/csv",
    output_path=s3_test_transform_output_path,
    sagemaker_session=pipeline_session,
)
transform_step_args = transformer.transform(
    data=s3_test_transform_input,
    content_type="text/csv",
    split_type="Line",
)
step_transform = TransformStep(
    name="InferenceTransform",
    step_args=transform_step_args,
)

# Create and execute pipeline
step_transform.add_depends_on([step_process, create_model])
pipeline = Pipeline(
    name=pipeline_name,
    steps=[step_process, step_train, step_eval, step_cond, step_transform],
    sagemaker_session=pipeline_session,
)
pipeline.upsert(role_arn=role_arn, description=[MASKED])
execution = pipeline.start()
execution.wait(delay=60, max_attempts=120)
```
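Note: the message "python3: can't open file 'serve'" suggests the container behind the Model has no serving entry point, so the problem may sit with how the Model is built rather than with TransformStep itself. Under that assumption, the hedged sketch below builds the Model from the JumpStart CatBoost inference image and inference sources instead of the custom ECR image; it reuses train_model_id, train_model_version, step_train, pipeline_session and role_arn from the code above, and the variable names and the inference.py entry point are illustrative, not a confirmed fix.

```
# Hedged sketch (assumption, not a verified fix): reuse the JumpStart CatBoost
# inference image and inference sources so the container knows how to serve.
from sagemaker import image_uris, script_uris
from sagemaker.model import Model

deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    model_id=train_model_id,            # "catboost-classification-model", as above
    model_version=train_model_version,
    image_scope="inference",
    instance_type="ml.m5.xlarge",
)
deploy_source_uri = script_uris.retrieve(
    model_id=train_model_id,
    model_version=train_model_version,
    script_scope="inference",
)

model = Model(
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri,
    entry_point="inference.py",         # serving script shipped with the JumpStart sources
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    sagemaker_session=pipeline_session,
    role=role_arn,
)
```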
2 answers · 0 votes · 42 views · asked 12 days ago by HaPo
1 answer · 0 votes · 41 views · asked 13 days ago