Deployed SageMaker Endpoint Fails Health Checks After SageMaker Python Library Update

Hi,

I'm having trouble deploying a SageMaker endpoint that previously worked fine. About a week ago I successfully deployed the Nous Hermes Llama 2 7B model to a g5.2xlarge endpoint, and it responded to inference requests as expected. I then deleted the endpoint and, about a week later, tried to deploy it again with the exact same configuration. The new endpoint fails to pass its health checks.

I followed the same deployment steps as before, including the same instance type and configuration settings. The only change I'm aware of since the successful deployment is an update of the SageMaker Python library to version 2.175, which enabled the huggingface-llm 0.9.3 DLC images instead of 0.8.2. I tried reverting to the previous version, and that did not work either.
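To make the version question concrete, here is a minimal sketch of how the container version could be pinned explicitly instead of relying on the library default. It is not the exact script from the original deployment. The model ID `NousResearch/Nous-Hermes-llama-2-7b`, the token limits, the 600-second startup timeout, and the helper names `build_llm_env` and `deploy_pinned` are all assumptions for illustration; `get_huggingface_llm_image_uri`, `HuggingFaceModel`, and the `container_startup_health_check_timeout` argument come from the SageMaker Python library.

```python
import json


def build_llm_env(model_id, num_gpus=1, max_input_length=2048, max_total_tokens=4096):
    """Build the container environment; every value must be a string.

    The numeric defaults are illustrative assumptions, not values from the post.
    """
    return {
        "HF_MODEL_ID": model_id,
        "SM_NUM_GPUS": json.dumps(num_gpus),  # a g5.2xlarge has a single GPU
        "MAX_INPUT_LENGTH": json.dumps(max_input_length),
        "MAX_TOTAL_TOKENS": json.dumps(max_total_tokens),
    }


def deploy_pinned(role_arn, model_id, image_version="0.8.2",
                  instance_type="ml.g5.2xlarge", startup_timeout_s=600):
    """Hypothetical helper: deploy with an explicitly pinned huggingface-llm image."""
    # Imported here so the env helper above works without the sagemaker package.
    from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

    # Request a specific DLC version rather than the library's current default.
    image_uri = get_huggingface_llm_image_uri("huggingface", version=image_version)

    model = HuggingFaceModel(
        role=role_arn,
        image_uri=image_uri,
        env=build_llm_env(model_id),
    )
    # A longer startup window gives the container time to download and load
    # the weights before health checks count as failed (600 s is an assumption).
    return model.deploy(
        initial_instance_count=1,
        instance_type=instance_type,
        container_startup_health_check_timeout=startup_timeout_s,
    )
```

Calling `deploy_pinned(role_arn, "NousResearch/Nous-Hermes-llama-2-7b")` once with `image_version="0.8.2"` and once with `"0.9.3"` would show whether the failure follows the container image or something else in the setup.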

Has anyone else encountered a similar issue after a recent SageMaker library update? Are there any new considerations or configurations required for deploying the model? Is there a recommended approach to troubleshoot this issue and identify the cause?

I would greatly appreciate any advice, suggestions, or guidance you can provide. Thank you!