Unable to Pass Health Checks for Deployed SageMaker Endpoint since new SageMaker version

Hi,

I'm having trouble redeploying a SageMaker endpoint that previously worked. About a week ago I deployed the Nous Hermes Llama 2 7B model to a g5.2xlarge endpoint, and it responded to inference requests as expected. I then deleted the endpoint, and after about a week I tried to deploy it again with exactly the same configuration. This time the endpoint fails its health checks.

I followed the same deployment steps as before, including the same instance type and configuration settings. The only change since the successful deployment is an update of the SageMaker Python library to version 2.175, which enabled the huggingface-llm 0.9.3 DLC images instead of 0.8.2. I tried reverting to the previous version, and that did not work either.
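
For reference, here is a minimal sketch of a deployment that pins the huggingface-llm container version explicitly rather than taking the library default. It is not the exact script used above. The model ID, the environment values, the 600-second startup timeout, and the helper names `build_llm_env` and `deploy` are assumptions for illustration only.

```python
# Sketch only: pin the huggingface-llm DLC version instead of relying on the
# default picked by the installed SageMaker Python library.
# ASSUMPTIONS: model ID, token limits, and timeout are illustrative values.

ASSUMED_MODEL_ID = "NousResearch/Nous-Hermes-llama-2-7b"  # assumed Hub repo name


def build_llm_env(model_id, num_gpus=1, max_input_length=2048, max_total_tokens=4096):
    """Build the container environment; all values must be strings."""
    if max_input_length >= max_total_tokens:
        # The text-generation container expects input length < total tokens.
        raise ValueError("max_input_length must be smaller than max_total_tokens")
    return {
        "HF_MODEL_ID": model_id,
        "SM_NUM_GPUS": str(num_gpus),  # a g5.2xlarge has a single GPU
        "MAX_INPUT_LENGTH": str(max_input_length),
        "MAX_TOTAL_TOKENS": str(max_total_tokens),
    }


def deploy(role_arn, image_version="0.8.2"):
    """Deploy with an explicitly pinned container version (hypothetical helper)."""
    # Imported here so build_llm_env can be used without the SDK installed.
    from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

    # Resolve the image URI for the requested version in the session's region.
    image_uri = get_huggingface_llm_image_uri("huggingface", version=image_version)

    model = HuggingFaceModel(
        role=role_arn,
        image_uri=image_uri,
        env=build_llm_env(ASSUMED_MODEL_ID),
    )
    return model.deploy(
        initial_instance_count=1,
        instance_type="ml.g5.2xlarge",
        # Give the container time to download and load the weights
        # before health checks are counted as failures (assumed value).
        container_startup_health_check_timeout=600,
    )
```

Calling `deploy(role_arn, image_version="0.9.3")` and then `deploy(role_arn, image_version="0.8.2")` with otherwise identical settings would show whether the container version alone changes the health check result.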

Has anyone else encountered a similar issue after a recent SageMaker library update? Are there any new considerations or configurations required for deploying the model? Is there a recommended approach to troubleshoot this issue and identify the cause?

I would greatly appreciate any advice, suggestions, or guidance you can provide. Thank you!

Aron
asked 9 months ago, 88 views
No answers