Creating a SageMaker Endpoint for 2 models (Segment Anything & YOLOv8) and invoking it


We have created a real-time SageMaker endpoint to deploy 2 PyTorch models and invoke them. The endpoint is created successfully, but we receive errors such as "Backend worker died" or "Backend worker error" when we invoke it. We are using an "ml.g4dn.2xlarge" instance along with the following parameters:

framework_version="2.0.1", py_version="py310"
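
For context, the endpoint is deployed with the SageMaker Python SDK roughly as sketched below; the S3 path, IAM role, and source directory are placeholders for illustration, not our actual values.

    # Sketch of the deployment, assuming model.tar.gz contains both checkpoints
    # and an inference.py entry point (paths and role are placeholders).
    from sagemaker.pytorch import PyTorchModel

    pytorch_model = PyTorchModel(
        model_data="s3://my-bucket/model.tar.gz",              # placeholder S3 path
        role="arn:aws:iam::123456789012:role/SageMakerRole",   # placeholder role
        entry_point="inference.py",
        source_dir="code",
        framework_version="2.0.1",
        py_version="py310",
    )

    predictor = pytorch_model.deploy(
        initial_instance_count=1,
        instance_type="ml.g4dn.2xlarge",
    )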

Some of the notable errors in CloudWatch after running the invocation multiple times are:

2024-01-06T11:57:59,036 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.10/site-packages/ts/model_service_worker.py", line 184, in handle_connection

2024-01-06T11:57:59,036 [WARN ] W-9000-model_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: model, error: Worker died.

2024-01-06T11:58:00,960 [ERROR] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error

2024-01-06T11:58:03,486 [ERROR] epollEventLoopGroup-5-2 org.pytorch.serve.wlm.WorkerThread - Unknown exception

2024-01-06T13:46:02,364 [ERROR] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error

We have added many log statements to our inference.py file, but it seems the invocation stops before inference.py even runs, because the backend worker dies first. The 2 models we are using are listed below, followed by a simplified sketch of how inference.py loads them:

  1. sam_vit_l_0b3195.pth (Segment Anything model)
  2. yolov8n.pt
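
The sketch below is simplified for illustration; the actual file contains more logging, and the input/predict/output handlers are omitted. The checkpoint filenames match the two models listed above.

    # Simplified sketch of the model_fn in inference.py that loads both
    # checkpoints from the model directory; not our exact code.
    import os

    import torch
    from segment_anything import SamPredictor, sam_model_registry
    from ultralytics import YOLO

    def model_fn(model_dir):
        device = "cuda" if torch.cuda.is_available() else "cpu"

        # Segment Anything, ViT-L checkpoint
        sam = sam_model_registry["vit_l"](
            checkpoint=os.path.join(model_dir, "sam_vit_l_0b3195.pth")
        )
        sam.to(device)
        sam_predictor = SamPredictor(sam)

        # YOLOv8 nano checkpoint
        yolo = YOLO(os.path.join(model_dir, "yolov8n.pt"))

        return {"sam": sam_predictor, "yolo": yolo}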
asked 4 months ago
1 Answer

Hello,

I understand that you have created a SageMaker endpoint for 2 models (Segment Anything & YOLOv8) and are seeing errors when invoking it, and that you would like more information on this.

Firstly, I would like to mention that this error is usually observed when the worker (instance) that processes the inference request does not respond within the allowed time, which is 60 seconds for real-time endpoints. The worker becomes overwhelmed and ultimately dies.

Further, in order to resolve the issue, kindly try one of the following workarounds (a short sketch of the first one is shown after the list):

  1. Use a larger instance type so that it can handle the load and respond within the 60-second limit.
  2. Reduce the payload size.
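
For example, redeploying on a larger GPU instance only requires changing the instance type at deploy time. The sketch below assumes the same PyTorchModel object from your deployment script, and the instance type shown is only an example:

    # Workaround 1: redeploy the same model on a larger GPU instance.
    # `pytorch_model` is the PyTorchModel from your deployment script;
    # the instance type below is only an example.
    predictor = pytorch_model.deploy(
        initial_instance_count=1,
        instance_type="ml.g4dn.4xlarge",
    )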

Kindly refer to the following documentation for more information on the above: https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference-troubleshooting.html

Additionally, please note that this error means the customer container returned an error. SageMaker does not control the behavior of customer containers; it simply returns the response from the model container and does not retry. If you want, you can configure the invocation to retry on failure. We suggest that you enable container logging and check your container logs to find the root cause of the 500 error from your model.
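
For instance, when invoking the endpoint through the SageMaker runtime API, the container's error message can be surfaced from the exception. This is only a generic sketch with placeholder endpoint name, content type, and payload, not your actual invocation code:

    # Sketch of surfacing the container's error when invoking the endpoint;
    # endpoint name, content type, and payload are placeholders.
    import boto3

    runtime = boto3.client("sagemaker-runtime")

    try:
        response = runtime.invoke_endpoint(
            EndpointName="my-endpoint",        # placeholder
            ContentType="application/json",    # placeholder
            Body=b'{"image": "..."}',          # placeholder payload
        )
        print(response["Body"].read())
    except runtime.exceptions.ModelError as err:
        # SageMaker includes the container's error details in the exception.
        print(err.response["Error"]["Message"])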

I would request that you refer to the aforementioned documentation. If you have any difficulty verifying the above points or still run into issues, please reach out to AWS Support [4] (SageMaker) with your detailed use case, and we would be happy to assist you further.

References:

[1] https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference.html

[2] https://github.com/pytube/pytube/issues/815

[3] https://aws.amazon.com/premiumsupport/

[4] Creating support cases and case management - https://docs.aws.amazon.com/awssupport/latest/user/case-management.html#creating-a-support-case

[5] https://aws.amazon.com/premiumsupport/faqs/

AWS
answered 4 months ago
