SageMaker Text data ML.P3.2Xlarge Error

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from primary with message "Your invocation timed out while waiting for a response from container primary. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again.". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/pytorch-inference-2023-08-10-12-34-42-075 in account 962041679118 for more information.

rahul
asked 8 months ago · 280 views
3 Answers

When you invoke an endpoint, the model container must respond to requests within 60 seconds [1]. With your current configuration, the model likely takes longer than 60 seconds on some requests. Using a larger instance type and/or a different instance class (standard/compute/memory/accelerated), with the aim of bringing the response time under 60 seconds, may resolve this problem. Please try again with a different instance type in your endpoint configuration.

To figure out what would fit, identify the instance family that matches your needs: more GPU, more CPU, or more RAM.

[1] InvokeEndpoint https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpoint.html
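
If you go the instance-type route, here is a minimal sketch of switching the existing endpoint to a larger instance with boto3 (the endpoint config name, model name, and the ml.g4dn.2xlarge choice are placeholders/assumptions, not taken from your setup):

import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")

# Create a new endpoint config whose "primary" variant runs on a larger instance.
sm.create_endpoint_config(
    EndpointConfigName="pytorch-inference-larger-config",  # placeholder name
    ProductionVariants=[{
        "VariantName": "primary",
        "ModelName": "your-model-name",     # placeholder: your existing SageMaker model
        "InstanceType": "ml.g4dn.2xlarge",  # example larger / different-family instance
        "InitialInstanceCount": 1,
    }],
)

# Point the existing endpoint at the new config; SageMaker updates it in place.
sm.update_endpoint(
    EndpointName="pytorch-inference-2023-08-10-12-34-42-075",
    EndpointConfigName="pytorch-inference-larger-config",
)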

AWS
answered 8 months ago

Hi,

Look at this similar issue: https://discuss.huggingface.co/t/invokeendpoint-error-predict-function-invocation-timeout/34755

In that case the solution was to change the instance type to a more powerful one: can you try a bigger one than ml.p3.2xlarge?

Update:

To better understand all possible choices: see https://pages.awscloud.com/rs/112-TZM-766/images/AL-ML%20for%20Startups%20-%20Select%20the%20Right%20ML%20Instance.pdf

This page gives you the full list to choose from: https://docs.aws.amazon.com/de_de/AWSCloudFormation/latest/UserGuide/aws-resource-sagemaker-notebookinstance.html#cfn-sagemaker-notebookinstance-instancetype

Allowed values:
ml.c4.xlarge | ml.c4.2xlarge | ml.c4.4xlarge | ml.c4.8xlarge
ml.c5.xlarge | ml.c5.2xlarge | ml.c5.4xlarge | ml.c5.9xlarge | ml.c5.18xlarge
ml.c5d.xlarge | ml.c5d.2xlarge | ml.c5d.4xlarge | ml.c5d.9xlarge | ml.c5d.18xlarge
ml.g4dn.xlarge | ml.g4dn.2xlarge | ml.g4dn.4xlarge | ml.g4dn.8xlarge | ml.g4dn.12xlarge | ml.g4dn.16xlarge
ml.g5.xlarge | ml.g5.2xlarge | ml.g5.4xlarge | ml.g5.8xlarge | ml.g5.12xlarge | ml.g5.16xlarge | ml.g5.24xlarge | ml.g5.48xlarge
ml.inf1.xlarge | ml.inf1.2xlarge | ml.inf1.6xlarge | ml.inf1.24xlarge
ml.m4.xlarge | ml.m4.2xlarge | ml.m4.4xlarge | ml.m4.10xlarge | ml.m4.16xlarge
ml.m5.xlarge | ml.m5.2xlarge | ml.m5.4xlarge | ml.m5.12xlarge | ml.m5.24xlarge
ml.m5d.large | ml.m5d.xlarge | ml.m5d.2xlarge | ml.m5d.4xlarge | ml.m5d.8xlarge | ml.m5d.12xlarge | ml.m5d.16xlarge | ml.m5d.24xlarge
ml.p2.xlarge | ml.p2.8xlarge | ml.p2.16xlarge
ml.p3.2xlarge | ml.p3.8xlarge | ml.p3.16xlarge | ml.p3dn.24xlarge
ml.p4d.24xlarge | ml.p4de.24xlarge
ml.r5.large | ml.r5.xlarge | ml.r5.2xlarge | ml.r5.4xlarge | ml.r5.8xlarge | ml.r5.12xlarge | ml.r5.16xlarge | ml.r5.24xlarge
ml.t2.medium | ml.t2.large | ml.t2.xlarge | ml.t2.2xlarge
ml.t3.medium | ml.t3.large | ml.t3.xlarge | ml.t3.2xlarge

So I'd suggest replacing your current ml.p3.2xlarge with ml.p3.8xlarge to see if that fixes it.
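
For illustration, a minimal sketch of what that deployment call could look like with the SageMaker Python SDK, assuming you already have a trained model or estimator object (the variable name model is a placeholder):

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.p3.8xlarge",  # the larger instance suggested above
)
# Payload format depends on your inference script / serializer.
result = predictor.predict({"inputs": "your text here"})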

Best,

Didier

AWS
EXPERT
answered 8 months ago

Dear Sir,

I have text data consisting of only 42 lines. Despite using multiple instance types, the same error continues to appear. Could you please suggest which instance I should use now?

estimator = PyTorch(
    entry_point="dummy_train.py",
    source_dir=local_source_dir,
    role=role_arn,
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version=framework_version,
    py_version=py_version,
    hyperparameters=hyperparameters,
)
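
For reference, a sketch of how this estimator would then be deployed to the endpoint where the timeout occurs; the instance_type passed to deploy() is the one that serves InvokeEndpoint requests (the ml.g4dn.xlarge value below is only an assumption, per the suggestions above):

estimator.fit()  # training runs on the ml.p3.2xlarge instance configured above

predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",  # assumption: inference instance type to experiment with
)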

rahul
answered 8 months ago
  • Hi, I updated my initial answer: see my proposal and let us know if it works better
