YOLOv5 model deployment to a SageMaker endpoint


I have a trained YOLOv5 model that I deployed to a real-time endpoint in SageMaker. I tried almost all of the available GPU instance types, but I could not get the inference time below 2.6 seconds. This is a lightweight model, and my target is an inference time under 1 second. Could you please give me any hints? I converted the model from PyTorch to TensorFlow format. Thank you!

  • This may be too simple, but it is worth checking: what is your input request format? Are you sending compressed images (e.g. 'image/jpeg') and decoding them in the endpoint, or sending raw pixel tensors (which could add significant communication overhead)?
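To make the overhead point concrete, here is a rough sketch that compares the size of one image sent as raw bytes with the same pixels sent as a JSON list. The 640x640x3 input size, the random pixel data, and the flattened `instances` body are assumptions for illustration only; your model's input shape and request format may differ.

```python
import json
import random


def payload_sizes(height=640, width=640, channels=3, seed=0):
    """Return (raw_bytes, json_bytes) for one synthetic uint8 image.

    The image is random noise, used only to count bytes. The pixel
    list is flattened for simplicity; a real request would nest it
    to match the model's input shape.
    """
    rng = random.Random(seed)
    pixels = bytes(rng.randrange(256) for _ in range(height * width * channels))
    raw_size = len(pixels)
    # JSON spells every pixel out as decimal text plus a separator.
    json_body = json.dumps({"instances": [list(pixels)]})
    json_size = len(json_body.encode("utf-8"))
    return raw_size, json_size


raw_size, json_size = payload_sizes()
print(f"raw uint8 bytes: {raw_size / 1e6:.2f} MB")
print(f"JSON pixel list: {json_size / 1e6:.2f} MB")
```

The JSON body comes out several times larger than the raw bytes, and a JPEG of a real photo is usually much smaller than either. If the endpoint currently receives pixel lists, a noticeable share of the 2.6 seconds may be serialization and transfer rather than the model itself.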

1 Answer

Have you read the following two blog posts?

  1. Maximize TensorFlow performance on Amazon SageMaker endpoints for real-time inference
  2. Reduce computer vision inference latency using gRPC with TensorFlow serving on Amazon SageMaker

The first post explores a few parameters that you can use to maximize the performance of a TensorFlow-based SageMaker real-time endpoint. In essence, these parameters overprovision the serving processes and adjust their parallel processing capabilities. As the tables in that post show, this overprovisioning and adjustment leads to better utilization of resources and higher throughput, sometimes by as much as 1,000%.
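As a rough illustration of what tuning those parameters can look like, the sketch below builds a set of environment variables for the SageMaker TensorFlow Serving container. The variable names are given as I recall them from the container documentation, so please verify them against the current README. The helper `build_tfs_env` and its thread-splitting heuristic are hypothetical starting points, not recommendations from the blog post.

```python
def build_tfs_env(num_vcpus: int, tfs_instances: int) -> dict:
    """Build environment variables for a TensorFlow Serving endpoint.

    Hypothetical helper: it splits the available vCPUs evenly across
    the serving processes. Treat the result as a value to benchmark,
    not as a known-good setting.
    """
    if num_vcpus < 1 or tfs_instances < 1:
        raise ValueError("num_vcpus and tfs_instances must be >= 1")
    threads = max(1, num_vcpus // tfs_instances)
    return {
        # Number of TensorFlow Serving processes in the container.
        "SAGEMAKER_TFS_INSTANCE_COUNT": str(tfs_instances),
        # Number of web workers that forward requests to them.
        "SAGEMAKER_GUNICORN_WORKERS": str(tfs_instances),
        # Thread pool sizes used by each serving process.
        "SAGEMAKER_TFS_INTER_OP_PARALLELISM": str(threads),
        "SAGEMAKER_TFS_INTRA_OP_PARALLELISM": str(threads),
    }


env = build_tfs_env(num_vcpus=8, tfs_instances=4)
for name, value in env.items():
    print(f"{name}={value}")
```

A dictionary like this would typically be passed as the `env` argument when creating the model object in the SageMaker Python SDK. Note that these settings mainly raise throughput under concurrent load, so they may not reduce the latency of a single request.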

The second post demonstrates how to reduce model serving latency for TensorFlow computer vision models on SageMaker via in-server gRPC communication, which improved latency by 75% in the example shown.

AWS
MarkRoy
answered 2 years ago
  • Thanks @MarkRoy for the help! I tried the suggestions, but nothing worked, and I still cannot get the inference time below 2.6 seconds. Any other suggestions? Thank you!
