Hi,
I am trying to capture the training loss metrics of a Keras model running in a training job. Here are the relevant snippets.
Logs:
All progress updates arrive on a single CloudWatch log line, separated by runs of #010 (an escaped backspace) followed by #015 (an escaped carriage return). With those escapes removed, the updates read:
194/211 [==========================>...] - ETA: 0s - loss: 0.1073 - accuracy: 0.9682
196/211 [==========================>...] - ETA: 0s - loss: 0.1072 - accuracy: 0.9683
198/211 [===========================>..] - ETA: 0s - loss: 0.1072 - accuracy:
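To confirm that the escapes are the only thing merging the updates, the raw line can be split back into individual progress updates. This is a minimal sketch. It assumes the escapes appear literally as the text #010 and #015, as they do in the log above. The sample string is abbreviated from that log, and the helper name split_progress_updates is hypothetical.

```python
import re

# Octal escapes as they appear in the CloudWatch log:
# "#010" is a backspace (\b) and "#015" is a carriage return (\r).
BACKSPACE = "#010"
CARRIAGE_RETURN = "#015"


def split_progress_updates(raw_line):
    """Split one escaped Keras progress-bar log line into its individual updates."""
    updates = []
    for chunk in raw_line.split(CARRIAGE_RETURN):
        # Drop the backspaces Keras emits to redraw the bar in place.
        text = chunk.replace(BACKSPACE, "").strip()
        # Keep only chunks that look like a "step/total [bar] ..." update.
        if re.match(r"\d+/\d+ \[", text):
            updates.append(text)
    return updates


raw = (
    "10#010#010#015194/211 [==========================>...] - ETA: 0s - loss: 0.1073 - accuracy: 0.9682"
    "#010#010#015196/211 [==========================>...] - ETA: 0s - loss: 0.1072 - accuracy: 0.9683"
    "#010#010#015198/211 [===========================>..] - ETA: 0s - loss: 0.1072 - accuracy:"
)

for update in split_progress_updates(raw):
    print(update)
```

The leading "10" fragment is discarded because it does not look like a progress update.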
Training job parameters used to capture the metric:
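training_params = {
    # specify the training image
    "AlgorithmSpecification": {
        "TrainingImage": byoc_image_uri,
        "TrainingInputMode": "File",
        # the regex is applied to the training job's log output
        "MetricDefinitions": [
            {
                "Name": "train:loss",
                "Regex": "- loss:(.*?) -",
            }
        ],
    },
    # ... remaining training job parameters omitted ...
}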
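The regex can be checked locally with Python's re module against the progress updates copied from the log. This sketch uses only the regex from the metric definition above and two of the logged updates, one per line. On a single update, the regex captures the loss with a leading space, which float() tolerates. That suggests the regex itself is probably fine when each step sits on its own line.

```python
import re

# The regex from the MetricDefinitions entry.
LOSS_REGEX = r"- loss:(.*?) -"

# Progress-bar updates copied from the log, one update per line.
lines = [
    "194/211 [==========================>...] - ETA: 0s - loss: 0.1073 - accuracy: 0.9682",
    "196/211 [==========================>...] - ETA: 0s - loss: 0.1072 - accuracy: 0.9683",
]

for line in lines:
    match = re.search(LOSS_REGEX, line)
    # group(1) includes the leading space (" 0.1073"); float() strips it.
    print(repr(match.group(1)), float(match.group(1)))
```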
All I can see in CloudWatch is a single value for the loss. However, the job ran for 4 epochs, so the loss for every step should also have been recorded. I am using a Keras model.
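With the default verbose=1, Keras redraws one progress bar per epoch using backspaces and carriage returns. This is why the per-step updates end up on one log line. One way to get a separate log line for each step is a small callback that prints the loss after every batch, combined with verbose=2 in model.fit, which prints one summary line per epoch instead of the progress bar. This is a sketch that assumes TensorFlow 2.x Keras. The class name StepLossLogger, its format_line method, and the "step_loss=" marker are hypothetical choices, not part of any Keras or SageMaker API.

```python
import tensorflow as tf


class StepLossLogger(tf.keras.callbacks.Callback):
    """Print the training loss after every batch on its own log line.

    Hypothetical helper: "step_loss=" is an arbitrary marker chosen so a
    metric regex can match it unambiguously.
    """

    def __init__(self):
        super().__init__()
        self._current_epoch = 0

    def format_line(self, epoch, batch, loss):
        # Fixed-point formatting keeps the value matchable by a simple numeric regex.
        return f"epoch={epoch} step={batch} step_loss={loss:.6f}"

    def on_epoch_begin(self, epoch, logs=None):
        self._current_epoch = epoch

    def on_train_batch_end(self, batch, logs=None):
        logs = logs or {}
        if "loss" in logs:
            # float() handles Python floats, NumPy scalars, and scalar tensors.
            line = self.format_line(self._current_epoch, batch, float(logs["loss"]))
            # flush=True so each step is written out promptly as its own line.
            print(line, flush=True)
```

Usage would look like model.fit(x_train, y_train, epochs=4, verbose=2, callbacks=[StepLossLogger()]), paired with a metric definition such as {"Name": "train:loss", "Regex": "step_loss=([0-9\\.]+)"}. That regex is also hypothetical and matches the marker above. As far as I know, the batch-level loss in TF2 Keras is the running mean for the current epoch, which is the same number the progress bar shows.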
Do you see the loss printed in your CloudWatch log for each step? If so, I would recheck the regex to see why the metric is not being captured.
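A quick way to recheck the regex is to run it against the raw line as CloudWatch stores it, where every step's update shares one line. I have not confirmed how SageMaker's metric parser treats several matches on one log line. If it keeps only one value per line (an assumption), that would produce exactly the single-value symptom described above. This sketch compares re.search (first match only) with re.findall (every match) on a reconstruction of that raw line.

```python
import re

LOSS_REGEX = r"- loss:(.*?) -"

# Three progress updates on ONE log line, separated by escaped backspaces
# (#010) and carriage returns (#015), reconstructed from the log in the question.
raw_line = (
    "#015194/211 [==========================>...] - ETA: 0s - loss: 0.1073 - accuracy: 0.9682#010#010"
    "#015196/211 [==========================>...] - ETA: 0s - loss: 0.1072 - accuracy: 0.9683#010#010"
    "#015198/211 [===========================>..] - ETA: 0s - loss: 0.1072 - accuracy:"
)

first = re.search(LOSS_REGEX, raw_line).group(1)  # only the first step's loss
every = re.findall(LOSS_REGEX, raw_line)          # one entry per step on the line
print("first match only:", first)
print("all matches:", every)
```

If only the first value shows up in CloudWatch metrics, getting each step onto its own log line is likely more productive than changing the regex.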