For real-time inference endpoints (InvokeEndpoint), the model container must respond to requests within 60 seconds. In other words, the model has a maximum processing time of 60 seconds before it must respond to an invocation. You can check model latency using CloudWatch metrics.
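One way to check this is to query the endpoint's ModelLatency metric from CloudWatch, which SageMaker reports in microseconds. A minimal sketch follows; the endpoint name "my-endpoint" and variant "AllTraffic" are placeholders, so adjust them to your deployment.

```python
# Sketch: query the ModelLatency metric for a real-time endpoint.
# Assumptions (not from this thread): endpoint name "my-endpoint" and
# variant name "AllTraffic" are placeholders for your deployment.
import datetime

# SageMaker's real-time invocation timeout.
TIMEOUT_SECONDS = 60.0

def latency_exceeds_timeout(latency_microseconds, limit_seconds=TIMEOUT_SECONDS):
    """Return True if a ModelLatency sample (microseconds) is at or above the limit."""
    return latency_microseconds / 1_000_000 >= limit_seconds

def fetch_max_model_latency(endpoint_name, variant_name="AllTraffic", hours=1):
    """Fetch the maximum ModelLatency (microseconds) over the last `hours` hours."""
    import boto3  # imported lazily so the helper above has no AWS dependency
    cloudwatch = boto3.client("cloudwatch")
    now = datetime.datetime.now(datetime.timezone.utc)
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/SageMaker",
        MetricName="ModelLatency",
        Dimensions=[
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        StartTime=now - datetime.timedelta(hours=hours),
        EndTime=now,
        Period=300,
        Statistics=["Maximum"],
    )
    points = stats["Datapoints"]
    return max(p["Maximum"] for p in points) if points else None

if __name__ == "__main__":
    worst = fetch_max_model_latency("my-endpoint")
    if worst is not None and latency_exceeds_timeout(worst):
        print(f"Container too slow: {worst / 1_000_000:.1f} s >= 60 s limit")
```

If the maximum ModelLatency is near or above 60,000,000 microseconds, the container itself is the bottleneck rather than SageMaker overhead.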
Hello Lamia Mohamed
Real-time inference is ideal for online inferences that have low latency or high throughput requirements. Use real-time inference for a persistent and fully managed endpoint (REST API) that can handle sustained traffic, backed by the instance type of your choice. Real-time inference can support payload sizes up to 6 MB and processing times of 60 seconds.
This is also stated in the SageMaker docs.
I can give a few suggestions based on this error: benchmark the model, and test the model and the model container locally.
If the model container produces inferences within the 60 s timeout, then you are good to go for SageMaker.
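The local check above can be sketched by timing an invocation against the container before deploying. This assumes the container is already running locally and serving the standard SageMaker contract on port 8080 (/ping and /invocations); the JSON payload below is a placeholder for your real input.

```python
# Sketch: time a local invocation of the model container before deploying.
# Assumptions (not from this thread): the container is running locally on
# port 8080 per the SageMaker contract; the payload is a placeholder.
import json
import time
import urllib.request

# SageMaker's real-time invocation timeout.
TIMEOUT_SECONDS = 60.0

def within_sagemaker_timeout(elapsed_seconds, limit_seconds=TIMEOUT_SECONDS):
    """SageMaker real-time endpoints require a response within 60 seconds."""
    return elapsed_seconds < limit_seconds

def time_local_invocation(payload, url="http://localhost:8080/invocations"):
    """POST a payload to the local container and return (elapsed_seconds, body)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    start = time.monotonic()
    with urllib.request.urlopen(request, timeout=TIMEOUT_SECONDS) as response:
        body = response.read()
    return time.monotonic() - start, body

if __name__ == "__main__":
    elapsed, _ = time_local_invocation({"instances": [[1.0, 2.0]]})
    print("OK" if within_sagemaker_timeout(elapsed) else f"Too slow: {elapsed:.1f} s")
```

If the local invocation already takes close to 60 seconds, the deployed endpoint will time out, and the fix belongs in the model or container, not the endpoint configuration.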
ModelLatency is helpful because SageMaker requires the container to respond within 60 seconds [1]: if you see ModelLatency at or above 60 seconds, that confirms the container isn't responding fast enough. At that point, you'll need to figure out why your container isn't running quickly enough. If it is a SageMaker-owned model, I would suggest looking into the inference logic and contacting AWS Support.