How do I restart a SageMaker endpoint?

Sometimes, when a "CUDA error: device-side assert triggered" error occurs because of an invalid input, the SageMaker endpoint stops working, even for later valid inputs. This Stack Overflow answer says that restarting is the only way to recover from this error. However, I can't find a way to simply restart a SageMaker endpoint, so I have been deleting the existing endpoint and creating a new one. This takes a lot of time. Is there a way to restart the machine behind a SageMaker endpoint without recreating the endpoint?

Asked 10 months ago, viewed 987 times
2 Answers
Hi, did you try updating your endpoint with a new endpoint configuration via the update-endpoint CLI command? https://awscli.amazonaws.com/v2/documentation/api/latest/reference/sagemaker/update-endpoint.html

It says:

Deploys the new EndpointConfig specified in the request, switches to using newly created 
endpoint, and then deletes resources provisioned for the endpoint using the previous 
EndpointConfig (there is no availability loss).

You may also want to detect status changes of your endpoint automatically with EventBridge and have them trigger a Lambda function that performs the endpoint update, to minimize your downtime. See https://docs.aws.amazon.com/sagemaker/latest/dg/automating-sagemaker-with-eventbridge.html#eventbridge-deployment-state
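As a rough sketch of the Lambda side of that idea, the function below inspects an EventBridge event and calls a restart routine when the endpoint reports a watched status. The event field names (detail-type, EndpointName, EndpointStatus) are assumed to match the page linked above. The names handle_event, restart, and trigger_statuses are hypothetical, and which statuses should trigger a redeploy is left to the caller. Whether the CUDA failure from the question shows up as a status change at all is not confirmed here.

```python
def handle_event(event, restart, trigger_statuses):
    """Call restart(endpoint_name) if the event reports a watched status.

    event            -- EventBridge event dict passed to the Lambda function
    restart          -- hypothetical callable that redeploys the endpoint
    trigger_statuses -- set of EndpointStatus values that should trigger it
    Returns the endpoint name if restart was called, otherwise None.
    """
    # Ignore anything that is not an endpoint state change (assumed detail-type).
    if event.get("detail-type") != "SageMaker Endpoint State Change":
        return None

    detail = event.get("detail", {})
    name = detail.get("EndpointName")
    status = detail.get("EndpointStatus")

    if name and status in trigger_statuses:
        restart(name)
        return name
    return None
```

In a real Lambda function, lambda_handler(event, context) would forward the event to handle_event together with a restart callable that issues the update-endpoint call with a new endpoint configuration.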

Best, Didier

AWS Expert
Answered 10 months ago
It looks like you can't restart an endpoint by calling update-endpoint unless you specify a different endpoint configuration, because the endpoint's current configuration is "in use".

aws sagemaker update-endpoint --endpoint-name my-endpoint --endpoint-config-name my-endpoint-config

An error occurred (ValidationException) when calling the UpdateEndpoint operation: Cannot update endpoint "my-endpoint" with the currently in use endpoint configuration "my-endpoint-config.
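One possible workaround, sketched below with boto3, is to copy the current endpoint configuration under a new name and pass that copy to update_endpoint, so the request no longer names the configuration that is in use. This is a minimal sketch, not a confirmed fix for the CUDA error: restart_endpoint and the suffix parameter are hypothetical names, and only a few optional settings are copied, so a configuration that uses other options would need them added.

```python
import time


def restart_endpoint(sm, endpoint_name, suffix=None):
    """Redeploy an endpoint using a renamed copy of its current config.

    sm is a SageMaker client, e.g. boto3.client("sagemaker").
    Returns the name of the newly created endpoint configuration.
    """
    suffix = suffix or time.strftime("%Y%m%d%H%M%S")

    # Look up the configuration the endpoint is currently using.
    endpoint = sm.describe_endpoint(EndpointName=endpoint_name)
    current = sm.describe_endpoint_config(
        EndpointConfigName=endpoint["EndpointConfigName"]
    )

    # Assumed naming scheme; the name must stay within SageMaker's length limit.
    new_name = f"{endpoint_name}-{suffix}"
    params = {
        "EndpointConfigName": new_name,
        "ProductionVariants": current["ProductionVariants"],
    }
    # Copy a few optional settings when present (not an exhaustive list).
    for key in ("DataCaptureConfig", "KmsKeyId", "AsyncInferenceConfig"):
        if key in current:
            params[key] = current[key]

    sm.create_endpoint_config(**params)
    sm.update_endpoint(EndpointName=endpoint_name, EndpointConfigName=new_name)
    return new_name


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   restart_endpoint(boto3.client("sagemaker"), "my-endpoint")
```

The previous endpoint configuration is left in place, so it can be deleted separately once the update has finished.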
Answered 8 months ago
