Hi - this is the OP:
Thanks for your response. Yes, I should have stated this in the original question: I am logging these metrics to the console every iteration, and I can see them under "View logs" in the console. The issue is that I'm not able to:
- view the parsed metrics for the period (I can only see the mean/max/min/...)
- get visualisations of these metrics
Were you able to find a way to visualize experiment metrics? I just started using SM Experiments and can't find a way to visualize metrics!
Hello, thank you for contacting us.
I understand you are not able to visualize any graphs for your metrics, even though you see them in "Trial Components > Metrics (tab)".
SageMaker parses the CloudWatch logs for your training job and emits metrics from the parsed logs as defined in metric_definitions. The contents of the CloudWatch logs for your training job depend on your train.py script. For example, if you wish to have metrics per step (or per 100 steps), your script needs to print the metrics per step (or per 100 steps) so that they appear in the CloudWatch logs. Please see the documentation linked below for more information.
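As a rough illustration of the point above, here is a minimal sketch of a training loop that prints one metric line per step, together with regexes that would match those lines. The metric names, the log format and the helper functions are hypothetical; the local parse_metrics function only approximates the regex matching that SageMaker applies to the logs and is not SageMaker code.

```python
import re

# Hypothetical metric definitions in the Name/Regex shape used for
# SageMaker training jobs. The names and patterns are assumptions.
metric_definitions = [
    {"Name": "train:loss", "Regex": r"train_loss=([0-9\.]+)"},
    {"Name": "val:accuracy", "Regex": r"val_acc=([0-9\.]+)"},
]


def format_step_log(step, train_loss, val_acc):
    """Build the log line that the training script prints for one step."""
    return f"step={step} train_loss={train_loss:.4f} val_acc={val_acc:.4f}"


def parse_metrics(line, definitions):
    """Local approximation of regex-based metric extraction from a log line."""
    found = {}
    for definition in definitions:
        match = re.search(definition["Regex"], line)
        if match:
            found[definition["Name"]] = float(match.group(1))
    return found


# Stand-in for a real training loop: one printed line per step means
# one data point per step can be extracted from the logs.
for step, (loss, acc) in enumerate([(0.9, 0.5), (0.6, 0.7)], start=1):
    line = format_step_log(step, loss, acc)
    print(line)
    print(parse_metrics(line, metric_definitions))
```

Checking the regexes locally against the exact lines your script prints is a quick way to rule out a pattern mismatch before looking at the console.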
Two examples from our official AWS GitHub repository, [1] and [2], each provide an entry script that emits custom metrics for a hyperparameter tuning job. Please compare them with your script.
[1] https://github.com/aws/amazon-sagemaker-examples/tree/master/hyperparameter_tuning/tensorflow2_mnist
[2] https://github.com/aws/amazon-sagemaker-examples/tree/master/hyperparameter_tuning/tensorflow_mnist
If you still have difficulties, I recommend opening a support case and providing more detail about your account and your script/config. For security reasons, we cannot discuss account-specific issues in public posts.
Thank you.
When you describe your training job, what do you see in FinalMetricDataList? https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.describe_training_job
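One way to check this is sketched below. It assumes a boto3 SageMaker client with valid credentials; the helper name final_metrics and the job name in the usage comment are hypothetical. An empty result suggests that no log line matched any metric regex.

```python
def final_metrics(client, job_name):
    """Return {metric name: final value} for a training job.

    `client` is expected to be a boto3 SageMaker client. The
    FinalMetricDataList key may be missing when no metrics were
    parsed, so it is read with a default.
    """
    response = client.describe_training_job(TrainingJobName=job_name)
    return {
        entry["MetricName"]: entry["Value"]
        for entry in response.get("FinalMetricDataList", [])
    }


# Usage (requires AWS credentials; the job name is a placeholder):
#   import boto3
#   print(final_metrics(boto3.client("sagemaker"), "my-training-job"))
```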