This is consistent with the CreateTrainingJob API documentation, and to my knowledge it's a hard (non-adjustable) limit. If you have a strong requirement, though, it may be worth raising a support case to double-check whether an increase is possible.
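For context, the limit in question applies to the list of metric definitions (name + regex pairs) passed to the training job. A minimal sketch of what such definitions look like and how the regex extracts a value from a log line - the specific names and log format here are made up for illustration:

```python
import re

# Hypothetical metric definitions, in the Name/Regex shape that
# CreateTrainingJob (and the sagemaker Estimator's metric_definitions
# parameter) expects; the limit discussed above caps the length of this list.
metric_definitions = [
    {"Name": "train:loss", "Regex": r"loss: ([0-9\.]+)"},
    {"Name": "eval:accuracy", "Regex": r"accuracy: ([0-9\.]+)"},
]

# Example log line the training script might emit (invented format).
line = "epoch 3 - loss: 0.1234 - accuracy: 0.9876"

# SageMaker scans the job's logs with each regex; group 1 is the value.
m = re.search(metric_definitions[0]["Regex"], line)
print(m.group(1))  # 0.1234
```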
If needed, you could consider logging additional metrics to CloudWatch directly from your training script via the CloudWatch APIs / boto3. I expect there'd be some limitations in where those metrics are visible (e.g. whether they show on the training job details page in the SageMaker console, or in the Experiments & Trials view in SageMaker Studio) - but if you were able to get them logged under the same /aws/sagemaker/TrainingJobs/{TrainingJobName} namespace as the auto-collected metrics, they might be reflected. Your script can determine the current training job name from the TRAINING_JOB_NAME environment variable if you want to try this.
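A rough sketch of what that could look like from inside the training script. I'm assuming here that the path splits into a `/aws/sagemaker/TrainingJobs` namespace plus a `TrainingJobName` dimension (mirroring how SageMaker publishes its auto-collected metrics); the metric name is invented, and the actual boto3 call is left commented out since it needs AWS credentials:

```python
import os

def build_metric_payload(job_name, metric_name, value):
    """Build the kwargs for a CloudWatch PutMetricData call.

    Assumption: namespace + TrainingJobName dimension mirrors the
    auto-collected SageMaker training metrics.
    """
    return {
        "Namespace": "/aws/sagemaker/TrainingJobs",
        "MetricData": [
            {
                "MetricName": metric_name,
                "Dimensions": [
                    {"Name": "TrainingJobName", "Value": job_name},
                ],
                "Value": value,
                "Unit": "None",
            }
        ],
    }

# Inside a training job, SageMaker sets TRAINING_JOB_NAME automatically;
# fall back to a placeholder so this sketch also runs locally.
job_name = os.environ.get("TRAINING_JOB_NAME", "local-test-job")
payload = build_metric_payload(job_name, "custom:eval_f1", 0.87)

# To actually publish (requires boto3 and AWS credentials):
# import boto3
# boto3.client("cloudwatch").put_metric_data(**payload)
print(payload["Namespace"])
```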
Be aware that, while fast, metric data API calls still take some time: in an ideal world you'd make them asynchronously to avoid slowing down your training job.
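One way that asynchronous pattern could be sketched, using only the standard library: the training loop drops metric values onto a queue, and a background thread batches and publishes them. The `publish` callable is a placeholder for whatever actually sends the data (e.g. a wrapper around the boto3 CloudWatch call):

```python
import queue
import threading

class AsyncMetricLogger:
    """Buffer metric values and flush them on a background thread,
    so the training loop never blocks on a (potentially slow) API call."""

    def __init__(self, publish, flush_interval=10.0):
        self._publish = publish          # e.g. wraps a CloudWatch client call
        self._queue = queue.Queue()
        self._interval = flush_interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def log(self, name, value):
        self._queue.put((name, value))   # non-blocking for the caller

    def _run(self):
        while not self._stop.is_set():
            self._stop.wait(self._interval)
            self._flush()

    def _flush(self):
        batch = []
        while True:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        if batch:
            self._publish(batch)

    def close(self):
        """Stop the background thread and flush anything still queued."""
        self._stop.set()
        self._thread.join()
        self._flush()
```

Usage would look like `logger = AsyncMetricLogger(my_publish_fn)`, then `logger.log("loss", 0.42)` inside the loop and `logger.close()` at the end of training.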
This sounds like a significant limitation for experiment tracking... will have to find another workaround.
I have actually, separately, tried to manually log metrics within a SageMaker training job, but encountered issues. The code is:

Of course, there are a couple of other factors, e.g. (1) I was running distributed training, and (2) using the Hugging Face Trainer API.
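On the distributed-training point: one common gotcha is that every worker process runs the same script, so a naive metric call gets published N times. A minimal guard, assuming a torchrun-style launcher that sets a `RANK` environment variable (the `log_metric` function here is a hypothetical stand-in for the actual publish call):

```python
import os

def is_main_process() -> bool:
    """True only on the coordinating worker.

    Assumption: the launcher (e.g. torchrun) sets RANK for each
    process; a single-process run falls back to rank 0.
    """
    return int(os.environ.get("RANK", "0")) == 0

def log_metric(name, value):
    # Hypothetical publisher - in practice this would wrap the
    # boto3 CloudWatch call rather than printing.
    print(f"{name}={value}")

# Guard every publish so N workers don't emit N duplicate datapoints.
if is_main_process():
    log_metric("eval:loss", 0.42)
```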