How can I run SageMaker Serverless Inference on a GPU instance?


I want to run an ML model with SageMaker Serverless Inference on a GPU instance. There is no option to select the instance type. Is it possible to run on a GPU instance?

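Unfortunately, GPU-based inference isn't currently supported on SageMaker Serverless Inference. That is also why there is no instance type option: a serverless endpoint is configured with a memory size and a maximum concurrency rather than an instance type. From the feature exclusions section of the serverless endpoints documentation: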

Some of the features currently available for SageMaker Real-time Inference are not supported for Serverless Inference, including GPUs, AWS marketplace model packages, private Docker registries, Multi-Model Endpoints, VPC configuration, network isolation, data capture, multiple production variants, Model Monitor, and inference pipelines.
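
If you need a GPU, the usual alternative is a real-time endpoint with a GPU instance type. The following is a minimal sketch of how the two endpoint configurations differ, assuming you use boto3 and already have a SageMaker model created. The helper names, the variant name "AllTraffic", the model name, and the default values (4096 MB, concurrency 5, ml.g4dn.xlarge) are illustrative assumptions, not values from the question.

```python
def serverless_variant(model_name, memory_mb=4096, max_concurrency=5):
    """Production variant for a serverless endpoint (CPU only).

    There is no InstanceType field here: capacity is expressed through
    ServerlessConfig, so a GPU cannot be requested.
    """
    return {
        "VariantName": "AllTraffic",  # illustrative name
        "ModelName": model_name,
        "ServerlessConfig": {
            "MemorySizeInMB": memory_mb,
            "MaxConcurrency": max_concurrency,
        },
    }


def gpu_realtime_variant(model_name, instance_type="ml.g4dn.xlarge", instance_count=1):
    """Production variant for a real-time endpoint on a GPU instance."""
    return {
        "VariantName": "AllTraffic",  # illustrative name
        "ModelName": model_name,
        "InstanceType": instance_type,  # assumed GPU instance type
        "InitialInstanceCount": instance_count,
    }


def create_endpoint_config(config_name, variant):
    """Send the variant to SageMaker. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the builders above work without boto3

    client = boto3.client("sagemaker")
    return client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[variant],
    )


if __name__ == "__main__":
    # "my-model" is a hypothetical model name; nothing is sent to AWS here.
    print(serverless_variant("my-model"))
    print(gpu_realtime_variant("my-model"))
```

Keep in mind that a real-time endpoint is billed for as long as the instance is running, unlike a serverless endpoint, so delete it when it is no longer needed.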

