2 Answers
We had the same problem, and the failure at the model upload stage turned out to be a red herring: a subprocess was being killed during the training stage because it ran out of memory, not during the upload itself. Note that our main training process still exited cleanly, so the job only surfaced an error at the upload stage.
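For anyone debugging a similar setup, here is a minimal sketch of how a launcher might surface this (the parent script and `train_worker.py` are hypothetical, not from the original answer): a child killed by the Linux OOM killer dies from SIGKILL, which `subprocess` reports as a negative return code, so checking it keeps the parent from exiting cleanly while the worker was killed.

```python
import signal
import subprocess

# Hypothetical launcher: run the real training worker as a child process.
result = subprocess.run(["python", "train_worker.py"])

# A child killed by the Linux OOM killer dies from SIGKILL; subprocess
# reports that as a negative return code (-9 on Linux). Without this
# check, the parent can exit 0 and the job only fails later, at the
# model upload stage.
if result.returncode == -signal.SIGKILL:
    raise RuntimeError("training worker was killed (likely out of memory)")
elif result.returncode != 0:
    raise RuntimeError(f"training worker exited with code {result.returncode}")
```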
answered 3 months ago
Per this documentation page, https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo-output.html, the algorithm must write the final model to /opt/ml/model; SageMaker then uploads the contents of that directory to S3 as a single object in compressed tar format.
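For reference, a minimal sketch of what that looks like inside a training script (the file name and payload below are placeholders; `SM_MODEL_DIR` is the environment variable SageMaker sets to `/opt/ml/model` inside the training container):

```python
import json
import os

# SageMaker sets SM_MODEL_DIR to /opt/ml/model inside the training
# container. Everything written under this directory is compressed to
# model.tar.gz and uploaded to S3 when the job completes.
model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
os.makedirs(model_dir, exist_ok=True)

# Placeholder artifact: substitute whatever your framework produces,
# e.g. torch.save(model.state_dict(), ...) for PyTorch.
trained_params = {"weights": [0.1, 0.2, 0.3]}
with open(os.path.join(model_dir, "model.json"), "w") as f:
    json.dump(trained_params, f)
```

If nothing is written to /opt/ml/model (for example, because a worker died mid-training as in the other answer), the upload step is where the job will report failure.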
answered 10 months ago
The out-of-memory subprocess turned out to be the actual issue for me as well. Increasing the instance memory and trying different SageMaker instance types solved the problem. Thanks for commenting!