SageMaker endpoint running but constantly restarting

I have deployed a model to a SageMaker endpoint using BentoML/bentoctl, a tool for building APIs and containerizing models. To test, I use curl with a JSON payload to make a request. When I run the generated Docker container on my local machine, I can invoke it successfully and get responses back, so I don't think the problem is in the Docker image.

When I deploy to SageMaker, I receive {"message":"Service Unavailable"} in response to my curl request. I can see the endpoint running in the SageMaker Endpoints dashboard. In the CloudWatch logs, it appears that the endpoint is constantly restarting: messages that are printed at startup (e.g. TensorFlow loading messages) are written to the log over and over.

I thought that this might be due to using an instance type with low memory (t2.medium) so I switched to m5.4xlarge as a test, but the result is the same.

What can I do? How can I determine what's causing the endless restarts?

2 Answers

What do you mean by "restart"? Do you mean the endpoint goes into the "Updating" state? Do you have an autoscaling policy attached to the endpoint? Do you see any errors in the CloudWatch logs?
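
As one way to answer the last question, here is a minimal Python sketch that pulls the endpoint's log events with boto3 and looks for repeated startup messages and error lines. It assumes the default SageMaker endpoint log group name `/aws/sagemaker/Endpoints/<endpoint-name>` and working AWS credentials. The helper names, the startup pattern, and the error keywords are illustrative assumptions, not anything taken from the question; adjust them to whatever your container actually prints.

```python
import re


def find_restarts(events, startup_pattern):
    """Return timestamps (ms) of events whose message matches startup_pattern."""
    rx = re.compile(startup_pattern)
    return [e["timestamp"] for e in events if rx.search(e["message"])]


def restart_intervals_seconds(timestamps_ms):
    """Return the gaps, in seconds, between consecutive startup timestamps."""
    ts = sorted(timestamps_ms)
    return [(b - a) / 1000 for a, b in zip(ts, ts[1:])]


def find_errors(events, error_pattern=r"error|exception|traceback|killed"):
    """Return messages containing one of the (assumed) error keywords."""
    rx = re.compile(error_pattern, re.IGNORECASE)
    return [e["message"] for e in events if rx.search(e["message"])]


def fetch_endpoint_events(endpoint_name, start_ms):
    """Fetch log events for a SageMaker endpoint from CloudWatch Logs.

    Assumes the default log group naming for SageMaker endpoints.
    """
    import boto3  # imported here so the helpers above work without AWS access

    logs = boto3.client("logs")
    paginator = logs.get_paginator("filter_log_events")
    events = []
    for page in paginator.paginate(
        logGroupName=f"/aws/sagemaker/Endpoints/{endpoint_name}",
        startTime=start_ms,
    ):
        events.extend(page["events"])
    return events


# Example usage (hypothetical endpoint name and startup message):
#   events = fetch_endpoint_events("my-endpoint", start_ms=0)
#   starts = find_restarts(events, r"Starting server")
#   print(restart_intervals_seconds(starts))
#   print(find_errors(events))
```

If the gaps between startup messages are short and regular, the lines logged just before each startup message are usually the most useful place to look for the cause.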

AWS
answered 2 years ago
AWS
answered 2 years ago
