2 Answers
We had the same problem. The failure at the model upload stage turned out to be a red herring: a subprocess was being killed during the training stage because it ran out of memory, not during the model upload stage. Note that our main training process still exited cleanly.
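Since the parent process can exit cleanly even when a child is OOM-killed, it helps to check the child's return code explicitly: a process killed by the kernel OOM killer receives SIGKILL, which shows up as a negative return code in Python's `subprocess`. A minimal sketch (the `train_worker.py` command is a hypothetical placeholder):

```python
import signal
import subprocess


def run_checked(cmd):
    """Run a training subprocess and surface silent kills.

    A child killed by the kernel OOM killer gets SIGKILL, so
    subprocess reports returncode -9; without a check like this
    the parent can still exit 0 and the failure only surfaces
    later, e.g. at the model upload stage.
    """
    proc = subprocess.run(cmd)
    if proc.returncode == -signal.SIGKILL:
        raise RuntimeError(
            f"{cmd!r} was killed by SIGKILL (likely OOM)"
        )
    if proc.returncode != 0:
        raise RuntimeError(
            f"{cmd!r} failed with returncode {proc.returncode}"
        )
    return proc.returncode


# Hypothetical usage inside a training entry point:
# run_checked(["python", "train_worker.py"])
```

On the instance itself, kernel logs (e.g. `dmesg` output mentioning "Out of memory: Killed process") are another place this shows up, if you have access to them.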
answered 4 months ago
As per this document, https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo-output.html, the algorithm should write the final model to /opt/ml/model so that SageMaker can upload it to S3 as a single object in compressed tar format.
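In practice this means your training script writes its artifacts under /opt/ml/model before exiting; SageMaker then tars and uploads that directory for you. A minimal sketch, assuming a script-mode style container where SageMaker exposes the directory via the `SM_MODEL_DIR` environment variable (the `model.json` filename and the dict-of-parameters format here are illustrative assumptions, not anything SageMaker requires):

```python
import json
import os

# SageMaker sets SM_MODEL_DIR to /opt/ml/model inside the training
# container; everything written there is packaged as model.tar.gz and
# uploaded to S3 when the job completes. The fallback is for local runs.
MODEL_DIR = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")


def save_model(params, model_dir=MODEL_DIR):
    """Persist trained parameters where SageMaker expects them."""
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, "model.json")  # illustrative filename
    with open(path, "w") as f:
        json.dump(params, f)
    return path
```

If nothing is written to this directory, or the process dies before reaching this step, the upload stage has nothing valid to package, which is consistent with the failure mode described in the other answer.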
answered a year ago
This turned out to be the actual issue for me as well. Increasing the instance memory and playing around with different SageMaker instances solved the problem for me. Thanks for commenting!!