I apologize, but there seems to be a misunderstanding in your approach. While it's true that you can enable gRPC for Triton Inference Server in SageMaker by setting the SAGEMAKER_TRITON_ALLOW_GRPC environment variable to true, SageMaker itself does not currently support gRPC for endpoint invocation.
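For reference, this is a minimal sketch of how that variable is typically passed when creating the model with boto3. The model name, role ARN, image URI, and S3 path below are placeholders, not values from your setup:

```python
import boto3

sm = boto3.client("sagemaker")

# All names, ARNs, and URIs below are placeholders for illustration.
sm.create_model(
    ModelName="triton-model",
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    PrimaryContainer={
        "Image": "<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-tritonserver:<tag>",
        "ModelDataUrl": "s3://my-bucket/triton-models/model.tar.gz",
        "Environment": {
            # Starts Triton's gRPC listener inside the container only;
            # it does not make the SageMaker endpoint reachable over gRPC.
            "SAGEMAKER_TRITON_ALLOW_GRPC": "true",
        },
    },
)
```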
The SAGEMAKER_TRITON_ALLOW_GRPC setting is primarily intended for scenarios where you're using Triton in front of a custom reverse proxy, not for direct invocation through SageMaker endpoints.
SageMaker endpoints are designed to be invoked using HTTP/HTTPS protocols, typically through the InvokeEndpoint API or its equivalent in SDKs like boto3's invoke_endpoint() method. These methods use SageMaker's standard HTTP-based inference interface.
You cannot directly access the gRPC port (8001) of the Triton server running inside a SageMaker endpoint. SageMaker manages the networking and security for endpoints, and does not expose internal ports like this for direct access.
To invoke your SageMaker endpoint, continue to use the standard SageMaker invocation methods, such as boto3's invoke_endpoint(). These send requests to the SageMaker endpoint over HTTPS, and SageMaker forwards them to the HTTP port (usually 8080) of the Triton server running inside the container.
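As a rough sketch, assuming a model that takes a single FP32 tensor, an invocation might look like the following. The endpoint name, tensor names, shape, and content type are placeholders you would adapt to your own model configuration:

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# Triton-style (KServe v2) JSON inference request; names, shape, and
# datatype are placeholders for your own model's configuration.
payload = {
    "inputs": [
        {
            "name": "INPUT__0",
            "shape": [1, 3],
            "datatype": "FP32",
            "data": [[0.1, 0.2, 0.3]],
        }
    ]
}

response = runtime.invoke_endpoint(
    EndpointName="my-triton-endpoint",       # placeholder endpoint name
    ContentType="application/octet-stream",  # some setups use application/json instead
    Body=json.dumps(payload),
)

result = json.loads(response["Body"].read())
print(result["outputs"])
```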
If you need to use gRPC for your inference requests, you might need to consider alternative deployment options outside of SageMaker endpoints, such as deploying Triton on EC2 instances where you have more control over the networking configuration.
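If you do go that route, requests would typically be sent with Triton's own gRPC client rather than boto3. Below is a minimal sketch, assuming a self-managed Triton server reachable on port 8001 and a model with one FP32 input and one output; the host, model name, and tensor names are placeholders:

```python
import numpy as np
import tritonclient.grpc as grpcclient

# Self-managed Triton server (e.g., on EC2) with port 8001 reachable;
# host, model name, and tensor names are placeholders.
client = grpcclient.InferenceServerClient(url="your-ec2-host:8001")

inputs = [grpcclient.InferInput("INPUT__0", [1, 3], "FP32")]
inputs[0].set_data_from_numpy(np.array([[0.1, 0.2, 0.3]], dtype=np.float32))

outputs = [grpcclient.InferRequestedOutput("OUTPUT__0")]

result = client.infer(model_name="my_model", inputs=inputs, outputs=outputs)
print(result.as_numpy("OUTPUT__0"))
```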