Endpoint did not pass the ping health check, but in CloudWatch the /ping status is 200


I have a fine-tuned LLaMa2 70b-chat-hf model whose artifacts are stored on S3 as a tarball archive.

When I deploy the model on SageMaker, the endpoint goes into the Failed state with the following message:

The primary container for production variant <> did not pass the ping health check. Please check CloudWatch logs for this endpoint.

But in CloudWatch I can see that the app is up and running, and there are a bunch of successful /ping responses.


When I deploy the base llama2-70b-chat-hf model, there are no issues.

Can you advise how to resolve the issue?

Igor
asked 4 months ago · 184 views
1 Answer

Hello, the following steps should help you troubleshoot the failing endpoint health check even though the /ping endpoint returns a 200 status:

Perform a thorough analysis of the CloudWatch logs:

Go through the endpoint's CloudWatch logs in detail, looking for failures, warnings, or unusual activity that could prevent the health check from succeeding. Pay particular attention to the entries logged around the time the health check fails. Verify that there are no conflicts inside the container and no resource-exhaustion or dependency errors.
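One way to make that review less manual is to export the log lines from the endpoint's CloudWatch log group and tally them. The sketch below is a minimal illustration, assuming the lines are already available as strings; the regex patterns and the `summarize_log` helper are hypothetical and will need adjusting to whatever your serving container actually logs. It counts /ping responses by status code and collects any line that looks like an error, warning, or out-of-memory message.

```python
import re
from collections import Counter

# Hypothetical keyword list: adjust to your serving stack's log format.
ERROR_PATTERN = re.compile(
    r"(error|exception|traceback|warn|out of memory|killed)", re.IGNORECASE
)
# Assumes access-log lines contain "/ping" followed later by a 3-digit status code.
PING_PATTERN = re.compile(r"/ping\S*\s.*?\b([1-5]\d{2})\b")


def summarize_log(lines):
    """Count /ping responses per status code and collect suspicious lines."""
    ping_statuses = Counter()
    suspicious = []
    for line in lines:
        match = PING_PATTERN.search(line)
        if match:
            # A /ping access-log line: record its HTTP status code.
            ping_statuses[match.group(1)] += 1
        elif ERROR_PATTERN.search(line):
            # Any other line mentioning an error-like keyword.
            suspicious.append(line)
    return ping_statuses, suspicious
```

If the result shows only 200s and no suspicious lines, as in the situation described above, the logs alone are unlikely to explain the failure, and the timing of the first successful ping relative to the startup timeout becomes the more useful thing to check.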

Check the model artifacts:

In the S3 tarball archive, make sure the fine-tuned model artifacts are packaged correctly. Verify that every file and dependency the model needs is present. If any files are missing or corrupted, the model may not load correctly, which can cause the health check to fail.
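A quick way to verify the packaging is to inspect the tarball locally before uploading it. The sketch below assumes the container loads the model from the root of the extracted archive (SageMaker extracts `model.tar.gz` under `/opt/ml/model`), so a required file that only exists inside a subdirectory is reported as missing. The `REQUIRED_FILES` list and the `check_model_tarball` helper are hypothetical; the file names are typical of a Hugging Face-style checkpoint and should be adjusted to what your container expects.

```python
import posixpath
import tarfile

# Hypothetical list of files a Hugging Face-style checkpoint usually needs.
REQUIRED_FILES = ("config.json", "tokenizer.json", "tokenizer_config.json")
WEIGHT_SUFFIXES = (".safetensors", ".bin")


def check_model_tarball(path, required=REQUIRED_FILES):
    """Return (missing_required_files, root_weight_files) for a model tarball."""
    with tarfile.open(path, "r:*") as archive:
        # Normalize "./config.json" to "config.json"; keep regular files only.
        names = {
            posixpath.normpath(member.name)
            for member in archive.getmembers()
            if member.isfile()
        }
    missing = sorted(name for name in required if name not in names)
    # Only weight files at the archive root count; nested ones are likely misplaced.
    weights = sorted(
        name for name in names
        if "/" not in name and name.endswith(WEIGHT_SUFFIXES)
    )
    return missing, weights
```

For example, an archive built from a parent directory (so every entry starts with `model/`) returns all required files as missing and no weights, which is a common packaging mistake worth ruling out.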

answered 4 months ago
  • Unfortunately, nothing suspicious: no errors, tracebacks, or warnings. The application started, and then there were a lot of successful /ping responses. After the process hit the container_startup_health_check_timeout_in_seconds limit, it was terminated.
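For reference, the startup health-check limit mentioned in the comment is configured per production variant in the endpoint configuration (`ContainerStartupHealthCheckTimeoutInSeconds`, alongside `ModelDataDownloadTimeoutInSeconds`). The sketch below builds that variant as a plain dict of the shape boto3's `create_endpoint_config` accepts in its `ProductionVariants` list. The model name, instance type, and timeout values are hypothetical, and the 60 to 3600 second bounds reflect the API reference as I understand it, so check them against the current documentation.

```python
# Assumed bounds for both timeouts (verify against the SageMaker API reference).
MIN_TIMEOUT_S = 60
MAX_TIMEOUT_S = 3600


def build_production_variant(model_name, instance_type, startup_timeout_s,
                             download_timeout_s, variant_name="AllTraffic"):
    """Build one ProductionVariants entry with explicit startup/download timeouts."""
    for label, value in (("startup", startup_timeout_s),
                         ("download", download_timeout_s)):
        if not MIN_TIMEOUT_S <= value <= MAX_TIMEOUT_S:
            raise ValueError(
                f"{label} timeout {value}s outside [{MIN_TIMEOUT_S}, {MAX_TIMEOUT_S}]"
            )
    return {
        "VariantName": variant_name,
        "ModelName": model_name,  # hypothetical: name of an existing SageMaker model
        "InstanceType": instance_type,
        "InitialInstanceCount": 1,
        "ContainerStartupHealthCheckTimeoutInSeconds": startup_timeout_s,
        "ModelDataDownloadTimeoutInSeconds": download_timeout_s,
    }
```

The resulting dict would be passed as `ProductionVariants=[variant]` to `create_endpoint_config` on a boto3 SageMaker client; with the SageMaker Python SDK, the equivalent knob is the `container_startup_health_check_timeout` argument of `Model.deploy`. Comparing the configured value with the time between container start and the first successful /ping in the logs shows whether the limit was actually tight.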
