This issue appears to be related to a sudden incompatibility between the specified instance type and the SageMaker distribution algorithm. Even though your code and configuration haven't changed, it's possible that there have been updates on the AWS side that have affected the compatibility of certain instance types with specific algorithms.
To resolve this issue, you can try the following steps:
- Check if the instance type 'ml.c6i.12xlarge' is still supported for training jobs in your region. It's possible that AWS has made changes to the available instance types for SageMaker.
- Verify if there have been any recent updates to the SageMaker distribution algorithm that might have changed its instance type compatibility.
- Try using a different compatible instance type. For example, you could try using an ml.c5 instance instead of ml.c6i.
- Ensure that your SageMaker SDK is up to date. Sometimes, updating to the latest version can resolve compatibility issues.
- Check your IAM permissions to make sure you still have the necessary permissions to create training jobs with the specified instance type.
- Review the SageMaker quotas for your account to ensure you haven't reached any limits that might prevent you from using certain instance types.
If the issue persists after trying these steps, it would be advisable to contact AWS support for further assistance. They can provide more specific information about any recent changes that might have affected your pipeline and help you find a suitable solution.
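The quota check from the steps above could be sketched roughly as follows. This is a minimal illustration, not an official procedure: it assumes boto3 is installed with working credentials, that the region is `us-east-1` (a placeholder), and that the Service Quotas entry is named `<instance type> for training job usage`, which is how these quotas appear to be labelled but should be confirmed in your own account. The helper names `find_training_quota` and `fetch_sagemaker_quotas` are made up for this example.

```python
def find_training_quota(quotas, instance_type):
    """Return the training-job quota value for instance_type, or None if absent.

    `quotas` is any iterable of dicts shaped like Service Quotas entries,
    i.e. with "QuotaName" and "Value" keys.
    """
    # Assumed naming pattern; verify against the quota names in your account.
    wanted = f"{instance_type} for training job usage".lower()
    for quota in quotas:
        if quota.get("QuotaName", "").lower() == wanted:
            return quota.get("Value")
    return None


def fetch_sagemaker_quotas(region_name):
    """Yield every SageMaker quota entry in the given region (calls AWS)."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("service-quotas", region_name=region_name)
    paginator = client.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode="sagemaker"):
        yield from page["Quotas"]


if __name__ == "__main__":
    # Placeholder region; replace with the region your pipeline runs in.
    value = find_training_quota(fetch_sagemaker_quotas("us-east-1"), "ml.c6i.12xlarge")
    if value is None:
        print("No training-job quota entry found for ml.c6i.12xlarge")
    else:
        print(f"Training-job quota for ml.c6i.12xlarge: {value}")
```

A missing entry or a value of 0 would point at availability or quota rather than at your code, although it does not by itself prove that the instance type is unsupported.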

Turns out it was the SageMaker Python SDK version.
Updating the sagemaker Python dependency from "^2.220.0" to "^2.232.3" (the latest at the time) got it working again.
Not quite sure why the working code stopped working even though the image hadn't changed...
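For anyone hitting the same thing, here is a small sketch for checking which SDK version an environment actually has installed. It uses only the standard library. Treating 2.232.3 as the threshold is an assumption taken from the version that worked here; the thread does not establish which release in between fixed the problem. The function names are illustrative.

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn a version string such as '2.232.3' into a tuple like (2, 232, 3)."""
    # Keeps only the first three numeric components; suffixes are ignored.
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def installed_sdk_version(package="sagemaker"):
    """Return the installed version string of `package`, or None if missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def meets_minimum(installed, minimum):
    """True when `installed` is at least `minimum` (simple numeric comparison)."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    minimum = "2.232.3"  # assumed threshold: the version that worked in this thread
    current = installed_sdk_version()
    if current is None:
        print("sagemaker is not installed in this environment")
    elif meets_minimum(current, minimum):
        print(f"sagemaker {current} is at or above {minimum}")
    else:
        print(f"sagemaker {current} is older than {minimum}; consider upgrading")
```

Running this inside the environment that launches the pipeline shows what was actually installed, which can differ from the newest release a caret-style constraint permits.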