YOLOv5 model deployment to a SageMaker endpoint


I have a trained YOLOv5 model that I deployed to a real-time endpoint in SageMaker. I tried almost all of the available GPU instance types, but I could not get the inference time below 2.6 sec. This is a light model, and my target is an inference time of < 1 sec. Could you please help me with any hints? I converted the model from PyTorch to TF format. Thank you!

  • Probably over-simple, but worth checking: what's your input request format? Are you sending compressed images (e.g. 'image/jpeg') and decoding them in the endpoint, or sending raw pixel tensors (which could add significant communication overhead)?
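
To put a rough number on the overhead that comment asks about, here is a minimal sketch that measures how large a raw pixel tensor becomes when serialized as a TensorFlow Serving style JSON request. The 640x640 RGB input size, the random pixel values, and the helper name `json_tensor_payload_bytes` are assumptions for illustration only; the question does not state the actual input size or request format.

```python
import json
import random


def json_tensor_payload_bytes(height, width, channels=3, seed=0):
    """Return the size in bytes of a JSON request body holding raw uint8 pixels.

    Hypothetical helper: the pixel values are random stand-ins for a real image.
    """
    rng = random.Random(seed)
    image = [
        [[rng.randrange(256) for _ in range(channels)] for _ in range(width)]
        for _ in range(height)
    ]
    # {"instances": [...]} is the row format of the TensorFlow Serving REST API.
    body = json.dumps({"instances": [image]})
    return len(body.encode("utf-8"))


# Assumed input size; adjust to whatever the deployed model expects.
raw_bytes = 640 * 640 * 3
json_bytes = json_tensor_payload_bytes(640, 640)

print(f"raw uint8 pixels: {raw_bytes / 1e6:.2f} MB")
print(f"JSON tensor body: {json_bytes / 1e6:.2f} MB")
```

Every pixel value costs several characters in JSON, so the request body is a multiple of the raw pixel count, while a JPEG of the same image is typically far smaller than the raw pixels. Sending the JPEG bytes with content type 'image/jpeg' only helps if the endpoint's input handler decodes them, which the question does not confirm.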

1 Answer

Have you read the following two blog posts?

  1. Maximize TensorFlow performance on Amazon SageMaker endpoints for real-time inference
  2. Reduce computer vision inference latency using gRPC with TensorFlow serving on Amazon SageMaker

The first explores a few parameters that you can use to maximize the performance of a TensorFlow-based SageMaker real-time endpoint. In essence, these parameters overprovision serving processes and adjust their parallel processing capabilities. According to the tables in that post, this overprovisioning and adjustment leads to better utilization of resources and higher throughput, sometimes an increase of as much as 1,000%.
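
As a rough sketch of what such tuning might look like, the snippet below builds an environment-variable dictionary for the SageMaker TensorFlow Serving container. The variable names are the ones I understand that container to read; the helper name `tfs_tuning_env` and the numeric values are placeholders assumed for illustration, not recommendations from the answer or the blog post, and suitable values would have to come from load testing on the chosen instance type.

```python
def tfs_tuning_env(tfs_instances, gunicorn_workers, inter_op_threads, intra_op_threads):
    """Build container environment variables for a TensorFlow Serving endpoint.

    Hypothetical helper: it only validates the counts and formats them as strings.
    """
    settings = {
        "SAGEMAKER_TFS_INSTANCE_COUNT": tfs_instances,          # TF Serving processes
        "SAGEMAKER_GUNICORN_WORKERS": gunicorn_workers,         # HTTP worker processes
        "SAGEMAKER_TFS_INTER_OP_PARALLELISM": inter_op_threads,  # threads across ops
        "SAGEMAKER_TFS_INTRA_OP_PARALLELISM": intra_op_threads,  # threads within an op
    }
    for name, value in settings.items():
        if value < 1:
            raise ValueError(f"{name} must be at least 1, got {value}")
    # Environment variable values must be strings.
    return {name: str(value) for name, value in settings.items()}


# Placeholder values, assumed only for illustration.
env = tfs_tuning_env(tfs_instances=2, gunicorn_workers=4,
                     inter_op_threads=1, intra_op_threads=1)
print(env)
```

A dictionary like this would typically be passed as the `env` argument when creating the model object in the SageMaker Python SDK. Note that these settings mainly target throughput under concurrent load, so they may not reduce the latency of a single request.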

The second demonstrates how to reduce model serving latency for TensorFlow computer vision models on SageMaker via in-server gRPC communication, with a latency improvement of 75% in the example shown.

AWS
Answered 2 years ago
  • Thanks @MarkRoy for the help! I tried the suggestions, but nothing worked, and I still cannot get the inference time below 2.6 sec. Any other suggestions? Thank you!
