
How to use gRPC in Triton Inference Server using SageMaker Endpoint


Hello,

I am trying to serve a model using a SageMaker endpoint.

I am using Triton Inference Server as the serving framework.

I know that I can enable Triton's gRPC protocol by setting the SAGEMAKER_TRITON_ALLOW_GRPC environment variable, as described in the guide below.

I launched the endpoint with SAGEMAKER_TRITON_ALLOW_GRPC = true, and confirmed through the CloudWatch logs that the gRPC setting was enabled.
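For reference, this is roughly how I set the variable in the container definition when creating the model (the image URI, S3 path, and names are placeholders):

```python
def build_container_def(image_uri, model_data_url):
    """Container definition that enables Triton's gRPC listener via the env var."""
    return {
        "Image": image_uri,
        "ModelDataUrl": model_data_url,
        "Environment": {
            "SAGEMAKER_TRITON_ALLOW_GRPC": "true",
        },
    }

container = build_container_def(
    "<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-tritonserver:latest",  # placeholder
    "s3://my-bucket/model.tar.gz",  # placeholder
)

# With a live AWS session this would be passed to create_model:
# import boto3
# sm = boto3.client("sagemaker")
# sm.create_model(
#     ModelName="triton-grpc-model",                      # placeholder
#     PrimaryContainer=container,
#     ExecutionRoleArn="arn:aws:iam::<account>:role/<role>",  # placeholder
# )
```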

I know that invoke_endpoint() in the AWS Python SDK (boto3) sends requests to the SageMaker endpoint's /invocations route.

Is there a way to make a gRPC request through the Python SDK, or directly to the SageMaker endpoint?

I think I could reach it through Triton's gRPC port (default 8001) if I at least knew the domain of the SageMaker endpoint. Is that correct?

1 Answer
Accepted Answer

I apologize, but there seems to be a misunderstanding in your approach. While it's true that you can enable gRPC for Triton Inference Server in SageMaker by setting the SAGEMAKER_TRITON_ALLOW_GRPC environment variable to true, SageMaker itself does not currently support gRPC for endpoint invocation.

The SAGEMAKER_TRITON_ALLOW_GRPC setting is primarily intended for scenarios where you're using Triton in front of a custom reverse proxy, not for direct invocation through SageMaker endpoints.

SageMaker endpoints are designed to be invoked using HTTP/HTTPS protocols, typically through the InvokeEndpoint API or its equivalent in SDKs like boto3's invoke_endpoint() method. These methods use SageMaker's standard HTTP-based inference interface.

You cannot directly access the gRPC port (8001) of the Triton server running inside a SageMaker endpoint. SageMaker manages the networking and security for endpoints, and does not expose internal ports like this for direct access.

To invoke your SageMaker endpoint, you should continue to use the standard SageMaker invocation methods, such as boto3's invoke_endpoint(). These methods will send requests to the HTTP endpoint (usually on port 8080) that SageMaker exposes, which then forwards the request to your Triton server running inside the container.
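A minimal sketch of that standard path, assuming a deployed endpoint named "triton-endpoint" and a model with a single FP32 input named "INPUT0" (all names here are placeholders). Triton in SageMaker accepts the KServe v2 JSON request format:

```python
import json

def build_triton_payload(input_name, shape, data):
    """Build a KServe-v2-style JSON inference request as Triton expects it."""
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": shape,
                "datatype": "FP32",
                "data": data,
            }
        ]
    }

payload = build_triton_payload("INPUT0", [1, 3], [0.1, 0.2, 0.3])
body = json.dumps(payload)

# With a live endpoint this would be sent over SageMaker's HTTPS API:
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(
#     EndpointName="triton-endpoint",   # placeholder
#     ContentType="application/json",
#     Body=body,
# )
# result = json.loads(response["Body"].read())
```

SageMaker terminates the HTTPS request and forwards it to Triton's HTTP port inside the container; the caller never talks to port 8001.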

If you need to use gRPC for your inference requests, you might need to consider alternative deployment options outside of SageMaker endpoints, such as deploying Triton on EC2 instances where you have more control over the networking configuration.
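On EC2, where you control the networking, a direct gRPC call to Triton could look roughly like this. It assumes the tritonclient package (pip install tritonclient[grpc]) and numpy, a reachable instance, and a model with placeholder tensor names:

```python
def triton_grpc_url(host, port=8001):
    """Triton's gRPC endpoint is a bare host:port pair (no scheme prefix)."""
    return f"{host}:{port}"

url = triton_grpc_url("ec2-198-51-100-1.compute.amazonaws.com")  # placeholder host

# With a reachable Triton server:
# import numpy as np
# import tritonclient.grpc as grpcclient
# client = grpcclient.InferenceServerClient(url=url)
# inp = grpcclient.InferInput("INPUT0", [1, 3], "FP32")       # placeholder names
# inp.set_data_from_numpy(np.array([[0.1, 0.2, 0.3]], dtype=np.float32))
# result = client.infer(model_name="my_model", inputs=[inp])
# print(result.as_numpy("OUTPUT0"))
```

You would also need the instance's security group to allow inbound traffic on port 8001.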
Sources
Model deployment with Triton Inference Server - Amazon SageMaker
Invoke the endpoint - Amazon SageMaker
invoke_endpoint_with_response_stream - Boto3 1.35.16 documentation

answered 20 days ago
AWS EXPERT
