Background
I want to build an ML Inference pipeline that will use SageMaker Asynchronous Inference.
To reduce costs, I want to scale the SageMaker Async Inference endpoint's instances down to zero whenever no jobs are queued (for example, outside business hours, or during working hours when there happen to be no requests from my users).
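For context on the scale-down part: SageMaker async endpoints can scale to zero via Application Auto Scaling by registering the variant with `MinCapacity=0` and attaching a target-tracking policy on the `ApproximateBacklogSizePerInstance` CloudWatch metric. Below is a minimal sketch of the request payloads; the endpoint name, variant name, capacities, and cooldowns are placeholder assumptions, and in practice these dicts would be passed to boto3's `application-autoscaling` client.

```python
# Sketch: scale an async inference endpoint in to zero instances when idle.
# Endpoint/variant names and numeric values are hypothetical placeholders.

ENDPOINT_NAME = "my-async-endpoint"  # assumed endpoint name
RESOURCE_ID = f"endpoint/{ENDPOINT_NAME}/variant/AllTraffic"

# 1) Register the endpoint variant as a scalable target with MinCapacity=0,
#    which is what permits scale-in all the way to zero instances.
#    (Payload for application-autoscaling's register_scalable_target.)
scalable_target = {
    "ServiceNamespace": "sagemaker",
    "ResourceId": RESOURCE_ID,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "MinCapacity": 0,  # allow scale-to-zero
    "MaxCapacity": 2,
}

# 2) Target-tracking policy on the per-instance queue backlog, so instances
#    are added when jobs are waiting and removed when the queue drains.
#    (Payload for put_scaling_policy.)
scaling_policy = {
    "PolicyName": "backlog-scale-to-zero",
    "ServiceNamespace": "sagemaker",
    "ResourceId": RESOURCE_ID,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 5.0,  # aim for ~5 queued jobs per instance
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateBacklogSizePerInstance",
            "Namespace": "AWS/SageMaker",
            "Dimensions": [{"Name": "EndpointName", "Value": ENDPOINT_NAME}],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 300,  # wait 5 min of low backlog before scaling in
        "ScaleOutCooldown": 60,
    },
}

# With boto3 these would be applied roughly as:
#   client = boto3.client("application-autoscaling")
#   client.register_scalable_target(**scalable_target)
#   client.put_scaling_policy(**scaling_policy)
```

Note that scaling back out from zero is driven by the backlog metric, which is exactly why the first question below (cold-start latency after scale-up) matters for this design.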
The questions
- On average, how long does SageMaker Async Inference take to cold-start after scaling up from zero, i.e., to provision a GPU-backed EC2 instance and have it ready to execute my ML inference tasks?
- What is the current availability of GPU instances on AWS? Is there any shortage?