SageMaker endpoint running but constantly restarting


I have deployed a model to a SageMaker endpoint using BentoML/bentoctl, a tool for building APIs and containerizing models. To test it, I use curl with a JSON payload to make a request. When I run the generated Docker container on my local machine, I can invoke it successfully and get responses back, so I don't think the problem is in the Docker image.

When I deploy to SageMaker, I receive {"message":"Service Unavailable"} as the response to my curl request. I can see the endpoint running in the SageMaker/Endpoints dashboard. Looking at the CloudWatch logs, it appears that the endpoint is constantly restarting: messages that are printed at startup (e.g. TensorFlow loading messages) are written to the log over and over.
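One way to confirm the restart loop and see what is logged just before each restart is to pull the endpoint's log events with boto3 and search them for a startup line. The following is only a rough sketch: the helper names, the marker string, and the endpoint name are placeholders, and it assumes the endpoint logs to the usual `/aws/sagemaker/Endpoints/<endpoint-name>` log group and that the chosen marker is printed once per container start.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def count_restarts(events, startup_marker):
    """Count, per log stream, how many events contain the startup marker."""
    counts = Counter()
    for event in events:
        if startup_marker in event["message"]:
            counts[event["logStreamName"]] += 1
    return dict(counts)


def context_before_restarts(events, startup_marker, context=3):
    """Return the messages logged just before each repeated startup marker."""
    by_stream = {}
    for event in sorted(events, key=lambda e: e["timestamp"]):
        by_stream.setdefault(event["logStreamName"], []).append(event["message"])
    result = []
    for stream, messages in by_stream.items():
        for i, message in enumerate(messages):
            # Skip i == 0: the first startup in a stream is not a restart.
            if startup_marker in message and i > 0:
                result.append((stream, messages[max(0, i - context):i]))
    return result


def fetch_endpoint_events(endpoint_name, minutes=30):
    """Fetch recent log events for an endpoint (needs AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without AWS access

    logs = boto3.client("logs")
    start = datetime.now(timezone.utc) - timedelta(minutes=minutes)
    paginator = logs.get_paginator("filter_log_events")
    pages = paginator.paginate(
        # Assumed log group naming for SageMaker endpoints.
        logGroupName=f"/aws/sagemaker/Endpoints/{endpoint_name}",
        startTime=int(start.timestamp() * 1000),  # milliseconds since epoch
    )
    return [event for page in pages for event in page["events"]]
```

With credentials configured, something like `context_before_restarts(fetch_endpoint_events("my-endpoint"), "TensorFlow")` (both arguments are placeholders) would list the last few lines each container wrote before it started again, which is usually where an error or a kill message shows up if there is one.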

I thought that this might be due to using an instance type with low memory (t2.medium) so I switched to m5.4xlarge as a test, but the result is the same.

What can I do? How can I determine what's causing the endless restarts?

2 Answers

What do you mean by "restart"? Does the endpoint go into the "Updating" status? Do you have an autoscaling policy attached to the endpoint? Do you see any errors in the CloudWatch logs?
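The first two questions can be checked from code as well as from the console. Here is a minimal sketch using boto3, assuming credentials and region are already configured; `summarize_endpoint` and `inspect_endpoint` are illustrative helper names, not part of any library, and the endpoint name you pass in is your own.

```python
def summarize_endpoint(description):
    """Reduce a describe_endpoint response to the fields relevant here."""
    return {
        "status": description.get("EndpointStatus"),
        # FailureReason is only present when SageMaker reports a failure.
        "failure_reason": description.get("FailureReason"),
        "variants": [
            variant["VariantName"]
            for variant in description.get("ProductionVariants", [])
        ],
    }


def inspect_endpoint(endpoint_name):
    """Report endpoint status and any attached scaling policies."""
    import boto3  # imported lazily so summarize_endpoint works without AWS access

    sagemaker = boto3.client("sagemaker")
    autoscaling = boto3.client("application-autoscaling")

    summary = summarize_endpoint(
        sagemaker.describe_endpoint(EndpointName=endpoint_name)
    )

    policies = []
    for variant_name in summary["variants"]:
        # Scalable targets for endpoints are identified per production variant.
        resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
        response = autoscaling.describe_scaling_policies(
            ServiceNamespace="sagemaker", ResourceId=resource_id
        )
        policies.extend(p["PolicyName"] for p in response["ScalingPolicies"])

    summary["scaling_policies"] = policies
    return summary
```

If `status` stays at `InService` and `scaling_policies` is empty, the restarts are more likely happening inside the container than at the endpoint level, and the CloudWatch logs are the place to look next.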

AWS
answered 2 years ago
AWS
answered 2 years ago
