Yes, Amazon SageMaker real-time inference can benefit from capacity reservation through two main approaches:
- SageMaker AI's Flexible Training Plans (FTP): This feature now supports inference endpoints, allowing you to reserve specific GPU instance types for planned evaluations and production peaks. With FTP, you can choose your preferred instance types, compute requirements, reservation length, and start date for your inference workload. When creating the endpoint, you reference the reservation ARN, and SageMaker AI automatically provisions and runs the endpoint on that guaranteed capacity for the duration of your plan. This helps address the "InsufficientInstanceCapacity" error you're experiencing by ensuring GPU availability when you need it.
When configuring your endpoint with FTP, set CapacityReservationPreference (part of the variant's ProductionVariantCapacityReservationConfig) to "capacity-reservations-only", which ensures SageMaker launches instances only into your ML capacity reservation.
- On-Demand Capacity Reservations (ODCRs): These allow you to reserve compute capacity for specific instance types in a specific Availability Zone. ODCRs provide uninterrupted access to accelerated instances (GPU, Trainium, or Inferentia) that you reserve, which is particularly valuable for ML workloads with strict capacity requirements.
Either option helps you avoid the "InsufficientInstanceCapacity" error for your GPU-based inference endpoint, especially during periods of high demand. FTP is a SageMaker AI feature that now covers inference endpoints as well as training, while ODCRs are a more general EC2 construct that can also benefit SageMaker deployments.
FTP for inference endpoints is currently available in the US East (N. Virginia), US West (Oregon), and US East (Ohio) Regions.
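As a rough illustration of the FTP setup described above, here is a minimal sketch that builds the request for an endpoint configuration pinned to a reservation. It assumes the variant-level field is named CapacityReservationConfig and that the reservation ARN goes in MlReservationArn, which is my reading of the ProductionVariantCapacityReservationConfig reference; please verify both against the current API docs. The helper name `build_endpoint_config_request`, the variant name, and all example values are hypothetical placeholders.

```python
from typing import Any, Dict


def build_endpoint_config_request(
    config_name: str,
    model_name: str,
    instance_type: str,
    instance_count: int,
    ml_reservation_arn: str,
) -> Dict[str, Any]:
    """Build keyword arguments for a CreateEndpointConfig call (hypothetical helper)."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",  # placeholder variant name
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": instance_count,
                # Assumed field names; check the API reference before use.
                "CapacityReservationConfig": {
                    # Launch instances only into the reserved capacity.
                    "CapacityReservationPreference": "capacity-reservations-only",
                    "MlReservationArn": ml_reservation_arn,
                },
            }
        ],
    }


# Example values are placeholders, not real resources.
request = build_endpoint_config_request(
    config_name="my-endpoint-config",
    model_name="my-model",
    instance_type="ml.p5.48xlarge",
    instance_count=1,
    ml_reservation_arn="<your-reservation-arn>",
)
```

With boto3 installed and credentials configured, the resulting dictionary could then be passed along the lines of `boto3.client("sagemaker").create_endpoint_config(**request)`. The instance type and count should match what the plan reserves.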
Sources
Amazon SageMaker AI now supports Flexible Training Plans capacity for Inference - AWS
ProductionVariantCapacityReservationConfig - Amazon SageMaker
Optimizing cost for building AI models with Amazon EC2 and SageMaker AI | AWS Cloud Financial Management
Real-time inference - Amazon SageMaker AI