Unable to Pass Health Checks for Deployed SageMaker Endpoint since new SageMaker version

Hi,

I'm having trouble redeploying a SageMaker endpoint that previously worked. About a week ago I deployed the Nous Hermes Llama 2 7B model to a g5.2xlarge endpoint, and it responded to inference requests as expected. I then deleted the endpoint, left it deleted for a week, and have now tried to deploy it again with the exact same configuration. This time the endpoint fails to pass its health checks.

I followed the same deployment steps as before, with the same instance type and configuration settings. The only change since the successful deployment is an update of the SageMaker Python library to version 2.175, which enabled the huggingface-llm 0.9.3 DLC images in place of 0.8.2. I tried reverting to the previous version, but that did not work either.
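For anyone trying to reproduce this, here is a minimal sketch of a deployment that pins the container version explicitly rather than relying on the library default. It is not the exact script from the original deployment. The model ID, the 600-second health check timeout, the `ml.` instance prefix, and the helper names `build_deploy_config` and `deploy` are assumptions for illustration only.

```python
from typing import Any, Dict


def build_deploy_config(llm_image_version: str = "0.8.2") -> Dict[str, Any]:
    """Collect the settings that should stay identical between deployments.

    All values are illustrative assumptions, not the original configuration.
    """
    return {
        # Pin the huggingface-llm DLC version instead of taking the SDK default.
        "image_version": llm_image_version,
        "instance_type": "ml.g5.2xlarge",
        # Assumed timeout; large models may need extra time to load
        # before the container answers health checks.
        "health_check_timeout_s": 600,
        "env": {
            # Assumed Hugging Face Hub ID for Nous Hermes Llama 2 7B.
            "HF_MODEL_ID": "NousResearch/Nous-Hermes-llama-2-7b",
            # g5.2xlarge has a single GPU.
            "SM_NUM_GPUS": "1",
        },
    }


def deploy(config: Dict[str, Any], role_arn: str):
    """Deploy with the pinned image. Requires the sagemaker package and AWS credentials."""
    # Imported here so the config helper can be used without sagemaker installed.
    from sagemaker.huggingface import (
        HuggingFaceModel,
        get_huggingface_llm_image_uri,
    )

    image_uri = get_huggingface_llm_image_uri(
        "huggingface", version=config["image_version"]
    )
    model = HuggingFaceModel(role=role_arn, image_uri=image_uri, env=config["env"])
    return model.deploy(
        initial_instance_count=1,
        instance_type=config["instance_type"],
        container_startup_health_check_timeout=config["health_check_timeout_s"],
    )
```

Calling `deploy(build_deploy_config("0.8.2"), role_arn)` and then repeating with `"0.9.3"` would show whether the failure follows the container version or something else in the setup.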

Has anyone else encountered a similar issue after a recent SageMaker library update? Are there any new considerations or configurations required for deploying the model? Is there a recommended approach to troubleshoot this issue and identify the cause?

I would greatly appreciate any advice, suggestions, or guidance you can provide. Thank you!

Aron
asked 9 months ago, 83 views
No Answers
