How to restart a SageMaker endpoint?


Sometimes, when a "CUDA error: device-side assert triggered" error occurs because of an invalid input, the SageMaker endpoint stops working, even for subsequent valid inputs. This Stack Overflow answer says that restarting is the only way to recover from this error. However, I can't find a way to simply restart a SageMaker endpoint, so I have been deleting the existing endpoint and creating a new one. This takes a lot of time. Is there any way to simply restart the machine behind the SageMaker endpoint?

Asked 10 months ago, 987 views
2 Answers

Hi, have you tried updating your endpoint with a new endpoint configuration via the update-endpoint CLI command? https://awscli.amazonaws.com/v2/documentation/api/latest/reference/sagemaker/update-endpoint.html

It says:

Deploys the new EndpointConfig specified in the request, switches to using newly created 
endpoint, and then deletes resources provisioned for the endpoint using the previous 
EndpointConfig (there is no availability loss).

You may also want to automate detection of your endpoint's status changes via EventBridge and link it to a Lambda function that performs the endpoint update, to minimize your downtime. See https://docs.aws.amazon.com/sagemaker/latest/dg/automating-sagemaker-with-eventbridge.html#eventbridge-deployment-state
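As a rough illustration of the Lambda side of that idea, here is a minimal sketch in Python. The event field names (`source`, `detail-type`, `detail.EndpointName`, `detail.EndpointStatus`) and the `"FAILED"` status value are assumptions based on the EventBridge page linked above and should be checked against a real event. The helper name `endpoint_needing_redeploy` is hypothetical. Whether a CUDA failure inside the container surfaces as an endpoint status change at all is not confirmed here.

```python
def endpoint_needing_redeploy(event, bad_statuses=("FAILED",)):
    """Return the endpoint name if the event reports a bad status, else None.

    Assumes the event shape of a "SageMaker Endpoint State Change" event;
    verify the field names and status strings against an actual event.
    """
    if event.get("source") != "aws.sagemaker":
        return None
    if event.get("detail-type") != "SageMaker Endpoint State Change":
        return None
    detail = event.get("detail", {})
    if detail.get("EndpointStatus") in bad_statuses:
        return detail.get("EndpointName")
    return None


def lambda_handler(event, context):
    """Lambda entry point: decide whether this event calls for a redeploy."""
    name = endpoint_needing_redeploy(event)
    if name is None:
        return {"action": "ignored"}
    # The actual redeploy (an UpdateEndpoint call with a new endpoint
    # configuration) would go here; it is left out of this sketch.
    return {"action": "redeploy", "endpoint": name}
```

Keeping the decision in a plain function makes it easy to test with hand-written events before wiring up the EventBridge rule.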

Best, Didier

AWS Expert
Answered 10 months ago

Looks like you can't restart an endpoint by calling update-endpoint without specifying a different endpoint configuration, since the endpoint's current configuration is "in use".

aws sagemaker update-endpoint --endpoint-name my-endpoint --endpoint-config-name my-endpoint-config

An error occurred (ValidationException) when calling the UpdateEndpoint operation: Cannot update endpoint "my-endpoint" with the currently in use endpoint configuration "my-endpoint-config.
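
Given that error, one possible workaround is to copy the current endpoint configuration under a new name and pass the copy to UpdateEndpoint. Below is a minimal sketch using boto3's SageMaker client methods `describe_endpoint`, `describe_endpoint_config`, `create_endpoint_config`, and `update_endpoint`. The function name `clone_config_and_update`, the naming scheme for the copy, and the list of copied fields are assumptions for illustration. According to the UpdateEndpoint description quoted in the other answer, this should provision new resources and remove the old ones, but I have not confirmed that it clears the CUDA error.

```python
import time

# Fields returned by describe_endpoint_config that create_endpoint_config
# also accepts. Extend this tuple if your configuration uses other options.
COPYABLE_FIELDS = (
    "ProductionVariants",
    "DataCaptureConfig",
    "KmsKeyId",
    "AsyncInferenceConfig",
    "ExplainerConfig",
    "ShadowProductionVariants",
)


def clone_config_and_update(sm, endpoint_name, suffix=None):
    """Clone the endpoint's current config under a new name and switch to it.

    `sm` is a SageMaker client, e.g. boto3.client("sagemaker").
    Returns the name of the newly created endpoint configuration.
    """
    current = sm.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]
    desc = sm.describe_endpoint_config(EndpointConfigName=current)

    # Hypothetical naming scheme: append a suffix (default: a Unix timestamp).
    # Config names have a length limit, so repeated cloning needs a
    # smarter scheme than appending forever.
    new_name = f"{current}-{suffix or int(time.time())}"

    params = {key: desc[key] for key in COPYABLE_FIELDS if key in desc}
    sm.create_endpoint_config(EndpointConfigName=new_name, **params)

    # UpdateEndpoint rejects the config currently in use, but accepts the copy.
    sm.update_endpoint(EndpointName=endpoint_name, EndpointConfigName=new_name)
    return new_name
```

Usage would be roughly `clone_config_and_update(boto3.client("sagemaker"), "my-endpoint")`. The old endpoint configuration is not deleted by this sketch and can be removed once the update finishes.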
Answered 8 months ago
