How to restart a SageMaker endpoint?

Sometimes a "CUDA error: device-side assert triggered" error occurs, for example because of an invalid input, and after that the SageMaker endpoint stops working, even for valid inputs. A Stack Overflow answer says that restarting is the only way to recover from this error. However, I can't find a way to simply restart a SageMaker endpoint, so I have been deleting the existing endpoint and creating a new one. This takes a lot of time. Is there a way to simply restart the machine behind the SageMaker endpoint?

Asked 10 months ago, 987 views
2 Answers

Hi, did you try updating your endpoint with a new endpoint configuration via the update-endpoint CLI command? See https://awscli.amazonaws.com/v2/documentation/api/latest/reference/sagemaker/update-endpoint.html

It says:

Deploys the new EndpointConfig specified in the request, switches to using newly created 
endpoint, and then deletes resources provisioned for the endpoint using the previous 
EndpointConfig (there is no availability loss).

You may also want to detect status changes of your endpoint automatically via EventBridge and have them trigger a Lambda function that performs the endpoint update, to minimize your downtime. See https://docs.aws.amazon.com/sagemaker/latest/dg/automating-sagemaker-with-eventbridge.html#eventbridge-deployment-state

Best, Didier

AWS Expert
Answered 10 months ago

It looks like you can't restart an endpoint by calling update-endpoint without specifying a different endpoint configuration, because the endpoint's current configuration is already "in use":

aws sagemaker update-endpoint --endpoint-name my-endpoint --endpoint-config-name my-endpoint-config

An error occurred (ValidationException) when calling the UpdateEndpoint operation: Cannot update endpoint "my-endpoint" with the currently in use endpoint configuration "my-endpoint-config.
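
One possible way around this error, as a minimal sketch rather than an official restart mechanism: copy the current endpoint configuration under a new name and pass that copy to update-endpoint. The sketch uses the boto3 SageMaker client calls describe_endpoint, describe_endpoint_config, create_endpoint_config and update_endpoint. The restart_endpoint helper, the "-restart-" naming scheme and the list of copied fields are assumptions made for illustration, and it assumes that switching to an identical copy of the configuration makes SageMaker provision fresh instances, as the update-endpoint description suggests.

```python
import time

# Assumption: these are the fields worth carrying over. Each one is returned by
# describe_endpoint_config and accepted by create_endpoint_config; extend the
# tuple if your configuration uses other settings.
COPYABLE_KEYS = (
    "ProductionVariants",
    "DataCaptureConfig",
    "KmsKeyId",
    "AsyncInferenceConfig",
)


def restart_endpoint(sm, endpoint_name, suffix=None):
    """Clone the endpoint's current config under a new name and switch to it.

    `sm` is a SageMaker client, e.g. boto3.client("sagemaker").
    Returns the name of the newly created endpoint configuration.
    """
    # Find the configuration the endpoint is currently using.
    current = sm.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]
    config = sm.describe_endpoint_config(EndpointConfigName=current)

    # Hypothetical naming scheme: a timestamp suffix keeps the new name unique.
    suffix = suffix or str(int(time.time()))
    new_name = f"{endpoint_name}-restart-{suffix}"

    # Copy only the fields that are present in the existing configuration.
    params = {key: config[key] for key in COPYABLE_KEYS if key in config}
    sm.create_endpoint_config(EndpointConfigName=new_name, **params)

    # A different configuration name avoids the "currently in use" error.
    sm.update_endpoint(EndpointName=endpoint_name, EndpointConfigName=new_name)
    return new_name


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   restart_endpoint(boto3.client("sagemaker"), "my-endpoint")
```

The update runs asynchronously, so the endpoint stays in the Updating status for a while. Once it is back in service, the old configuration can be removed with delete_endpoint_config if it is no longer needed.
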
Answered 8 months ago
