SageMaker endpoint failing with "An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (413) from primary and could not load the entire response body"

Hello, I have created a SageMaker endpoint by following https://github.com/huggingface/notebooks/blob/main/sagemaker/20_automatic_speech_recognition_inference/sagemaker-notebook.ipynb and it is failing with the error "An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (413) from primary and could not load the entire response body".
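For reference, the deployment roughly followed the notebook (a sketch from memory; the model ID, versions, and instance type below come from that notebook and may not match my exact code):

'''
# Sketch of the notebook's deployment flow (versions and instance type
# assumed from the notebook, not verified against my actual code).
import sagemaker
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.serializers import DataSerializer

role = sagemaker.get_execution_role()

hub = {
    'HF_MODEL_ID': 'facebook/wav2vec2-base-960h',
    'HF_TASK': 'automatic-speech-recognition'
}

huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    transformers_version="4.17",
    pytorch_version="1.10",
    py_version="py38",
)

# DataSerializer reads a local file path and sends the raw audio bytes as
# the request body, so the request payload is the full size of the .mp3 file.
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
    serializer=DataSerializer(content_type='audio/x-audio'),
)
'''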

The predict call returns the following error, but the CloudWatch log for the endpoint does not contain any error details.

ModelError Traceback (most recent call last)
/tmp/ipykernel_16248/2846183179.py in <module>
2 # audio_path = "s3://ml-backend-sales-call-audio/sales-call-audio/1279881599154831602.playback.mp3"
3 audio_path = "/home/ec2-user/SageMaker/finetune-deploy-bert-with-amazon-sagemaker-for-hugging-face/1279881599154831602.playback.mp3" ## AS OF NOW have stored locally in notebook instance
----> 4 res = predictor.predict(data=audio_path)
5 print(res)

~/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/sagemaker/predictor.py in predict(self, data, initial_args, target_model, target_variant, inference_id)
159 data, initial_args, target_model, target_variant, inference_id
160 )
--> 161 response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
162 return self._handle_response(response)
163

~/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
493 )
494 # The "self" in this scope is referring to the BaseClient.
--> 495 return self._make_api_call(operation_name, kwargs)
496
497 _api_call.__name__ = str(py_operation_name)

~/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
912 error_code = parsed_response.get("Error", {}).get("Code")
913 error_class = self.exceptions.from_code(error_code)
--> 914 raise error_class(parsed_response, operation_name)
915 else:
916 return parsed_response

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (413) from primary and could not load the entire response body. See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/asr-facebook-wav2vec2-base-960h-2022-11-25-19-27-19 in account xxxx for more information.

asked a year ago · 2,530 views

1 Answer

Hello AmitKayal,

I understand that you have successfully created an Endpoint. However, when you try to invoke this Endpoint, you get the following error:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (413) from primary and could not load the entire response body. See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/asr-facebook-wav2vec2-base-960h-2022-11-25-19-27-19 in account xxxx for more information.

And when you looked at your CloudWatch logs, there was nothing related to this error. Let me know if I have misunderstood anything.

A client error 413 ("Request Entity Too Large") usually occurs when the payload of an endpoint invocation exceeds the 6 MB limit for real-time inference [1,2]; this is most likely why your Endpoint is returning it.
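As a quick check (a minimal sketch; the file path is a placeholder), compare the size of the audio file you pass to predict() against that limit:

'''
import os

# Real-time InvokeEndpoint requests are capped at 6 MB.
MAX_PAYLOAD_BYTES = 6 * 1024 * 1024

# Placeholder: the local audio file passed to predictor.predict()
audio_path = "1279881599154831602.playback.mp3"

size = os.path.getsize(audio_path)
print(f"payload size: {size} bytes")
if size > MAX_PAYLOAD_BYTES:
    print("over 6 MB -- a real-time endpoint will reject this with a 413")
'''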

If your payload size is more than 6 MB, you can work around this by using either Batch Transform [3] or Asynchronous Inference [4]. Batch Transform is suitable if you would like to process requests to your model in batches, and it lets you define your own maximum payload size [5]. Otherwise, if you want an inference result for each individual request, you can use Asynchronous Inference, which accepts payloads of up to 1 GB and processing times of up to about 15 minutes [6]. Asynchronous Inference queues requests to your model and processes them asynchronously; it is ideal for payloads larger than 6 MB but no more than 1 GB.
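For illustration, Asynchronous Inference could be set up along these lines (a sketch, assuming the HuggingFaceModel object from your notebook; the bucket and key names are placeholders):

'''
from sagemaker.async_inference import AsyncInferenceConfig

# Responses are written to S3 rather than returned in the HTTP response.
async_config = AsyncInferenceConfig(
    output_path="s3://my-bucket/async-results/"  # placeholder bucket
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
    async_inference_config=async_config,
)

# The input is referenced by S3 URI instead of being sent inline, so the
# 6 MB real-time payload limit does not apply (up to 1 GB is accepted).
response = predictor.predict_async(
    input_path="s3://my-bucket/audio/input.mp3"  # placeholder key
)
'''

For Batch Transform, the transformer() method on the model similarly accepts a max_payload parameter (in MB) that raises the per-request payload cap [5].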

Should the suggested workarounds not resolve the issue, I recommend that you open a Support case with AWS Technical Support [7]. The Technical Support team will be able to help you troubleshoot this issue further.

I trust this information is helpful. Should you have any further questions, please feel free to reach out.

References:

  1. https://github.com/aws/amazon-sagemaker-examples/issues/245
  2. https://docs.aws.amazon.com/general/latest/gr/sagemaker.html#limits_sagemaker
  3. https://docs.aws.amazon.com/sagemaker/latest/dg/inference-pipeline-batch.html
  4. https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference.html
  5. https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html
  6. https://docs.aws.amazon.com/sagemaker/latest/dg/hosting-faqs.html#hosting-faqs-general
  7. https://support.console.aws.amazon.com/support/home?region=us-east-1#/case/create
AWS · answered a year ago
  • Thanks a lot for your comments. I have now tried with a payload under 6 MB and it works. But Batch Transform is failing with the following error. Can you please guide me on how to fix this?

    '''
    inputs = ffmpeg_read(inputs, self.feature_extractor.sampling_rate)
    File "/opt/conda/lib/python3.8/site-packages/transformers/pipelines/audio_utils.py", line 39, in ffmpeg_read
    raise ValueError("Malformed soundfile")
    '''

    I believe the container instantiated for the transform job is missing a package. I have added my requirements.txt to the model package as below; the package has a code subfolder where my requirements.txt file is available. But it looks like the container is lacking the soundfile package, which is listed in my requirements.txt file.

    '''
    # Hub Model configuration
    hub = {
        'HF_MODEL_ID': 'openai/whisper-tiny',
        'HF_TASK': 'automatic-speech-recognition'
    }

    # create Hugging Face Model Class
    huggingface_model = HuggingFaceModel(
        env=hub,                         # configuration for loading model from Hub
        role=role,                       # IAM role with permissions to create an Endpoint
        model_data=s3_location,
        transformers_version="4.17.0",   # transformers version used
        pytorch_version="1.10.2",        # pytorch version used
        py_version='py38',               # python version used
    )
    '''
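    For what it's worth, the archive layout the container sees can be confirmed locally (a sketch; the local filename is a placeholder for a copy of the archive at s3_location):

    '''
    import tarfile

    # Placeholder: local copy of the model archive uploaded to s3_location
    with tarfile.open("model.tar.gz") as tar:
        for name in tar.getnames():
            print(name)

    # "code/requirements.txt" (listing soundfile) should appear at the
    # archive's top level for the container to pip-install it. Note that
    # ffmpeg_read also needs the ffmpeg system binary, which pip cannot
    # provide.
    '''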
