Have you read the following two blog posts?
- Maximize TensorFlow performance on Amazon SageMaker endpoints for real-time inference
- Reduce computer vision inference latency using gRPC with TensorFlow serving on Amazon SageMaker
The first explores a few parameters that you can use to maximize the performance of a TensorFlow-based SageMaker real-time endpoint. In essence, these parameters overprovision the serving processes and adjust their parallel processing capabilities. As the tables in that post show, this overprovisioning and adjustment leads to better resource utilization and higher throughput, in some cases an increase of as much as 1,000%.
The second demonstrates how to reduce model serving latency for TensorFlow computer vision models on SageMaker via in-server gRPC communication, giving a 75% latency improvement in the example shown.
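To make the first post's idea a bit more concrete, here is a minimal sketch of how those tuning parameters might be assembled as container environment variables. The variable names are the ones I recall from the SageMaker TensorFlow Serving container, so please verify them against the container documentation for your version. The helper `build_tfs_env`, the rule of splitting vCPUs evenly across serving instances, and the example values are my own assumptions for illustration, not recommendations from the posts.

```python
def build_tfs_env(num_vcpus, tfs_instances, gunicorn_workers):
    """Build a dict of environment variables for a TensorFlow Serving endpoint.

    Hypothetical helper: the thread split below (vCPUs divided evenly across
    TFS instances) is an illustrative assumption, not a tuned setting.
    """
    threads_per_instance = max(1, num_vcpus // tfs_instances)
    return {
        # Number of TensorFlow Serving processes to run in the container
        "SAGEMAKER_TFS_INSTANCE_COUNT": str(tfs_instances),
        # Number of Gunicorn worker processes handling incoming requests
        "SAGEMAKER_GUNICORN_WORKERS": str(gunicorn_workers),
        # Threads used to run independent ops in parallel
        "SAGEMAKER_TFS_INTER_OP_PARALLELISM": str(threads_per_instance),
        # Threads used inside a single op
        "SAGEMAKER_TFS_INTRA_OP_PARALLELISM": str(threads_per_instance),
        # OpenMP thread count used by some CPU kernels
        "OMP_NUM_THREADS": str(threads_per_instance),
    }


# Example values only; the right numbers depend on your instance type and model.
env = build_tfs_env(num_vcpus=4, tfs_instances=2, gunicorn_workers=4)
for name, value in sorted(env.items()):
    print(f"{name}={value}")
```

A dict like this would typically be passed as the `env` argument when creating the model in the SageMaker Python SDK. You would then load test a few combinations rather than trusting any single setting.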
Thanks @MarkRoy for the help! I tried the suggestions, but nothing worked and I still can't get inference time below 2.6 seconds. Any other suggestions? Thank you!
Probably too simple, but worth checking: what's your input request format? Are you sending compressed images (e.g. 'image/jpeg') and decoding them in the endpoint, or sending raw pixel tensors (which could add significant communication overhead)?
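To get a rough feel for that overhead, here is a small sketch that compares the size of one image sent as a JSON pixel tensor against its raw byte count. The 224x224x3 shape and the random pixel values are assumptions for illustration. The `{"instances": [...]}` wrapper is the TensorFlow Serving REST request shape. No JPEG is encoded here, so the compressed size is not measured.

```python
import json
import random

# Assumed image shape for illustration; substitute your model's input size.
HEIGHT, WIDTH, CHANNELS = 224, 224, 3

random.seed(0)  # make the synthetic image reproducible

# Synthetic 8-bit image as nested Python lists (height x width x channels)
pixels = [
    [[random.randint(0, 255) for _ in range(CHANNELS)] for _ in range(WIDTH)]
    for _ in range(HEIGHT)
]

# Request body if the raw tensor is sent as JSON text
json_payload = json.dumps({"instances": [pixels]}).encode("utf-8")

# Size of the same pixels as uncompressed bytes (one byte per value)
raw_bytes = HEIGHT * WIDTH * CHANNELS

print(f"JSON tensor payload: {len(json_payload)} bytes")
print(f"Raw uint8 pixels:    {raw_bytes} bytes")
print(f"JSON / raw ratio:    {len(json_payload) / raw_bytes:.1f}x")
```

A JPEG of the same image is usually smaller again than the raw bytes, so if your requests look like the JSON case, switching to compressed images and decoding inside the endpoint may be worth testing.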