Estimator Metrics Extraction Problem

Hello! I created an estimator with a metrics definition for the validation binary accuracy of the trained model as seen below:

from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(entry_point=code_entry,
                       source_dir=code_dir,
                       role=sagemaker_role,
                       instance_type='local',
                       instance_count=1,
                       model_dir='s3://some-bucket/model/',
                       hyperparameters=hyperparameters,
                       output_path='s3://some-bucket/results/',
                       framework_version='2.8',
                       py_version='py39',
                       metric_definitions=[
                           {
                               "Name": "binary_accuracy",
                               # Raw string so \d and \. reach the regex engine unchanged
                               "Regex": r"val_binary_accuracy=(\d+\.\d+)?",
                           }
                       ],
                       script_mode=True)

According to SageMaker's Estimator documentation, setting metric_definitions makes SageMaker extract metrics from the training logs using a regex:

metric_definitions (list[dict[str, str] or list[dict[str, PipelineVariable]]) – A list of dictionaries that defines the metric(s) used to evaluate the training jobs. Each dictionary contains two keys: ‘Name’ for the name of the metric, and ‘Regex’ for the regular expression used to extract the metric from the logs.

The trained model metrics are then available in the properties of the training job, in the FinalMetricDataList object.
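For reference, my understanding is that FinalMetricDataList in a DescribeTrainingJob response is a list of entries carrying MetricName, Value, and Timestamp fields. Below is a minimal sketch of looking up one metric by name in such a response. The helper find_final_metric and the sample response are hypothetical and only illustrate the shape I expect, they are not taken from my actual job:

```python
from typing import Optional


def find_final_metric(response: dict, name: str) -> Optional[float]:
    """Return the value of the named metric, or None if it is absent.

    `response` is assumed to have the shape of a DescribeTrainingJob
    response, e.g. from boto3.client("sagemaker").describe_training_job(...).
    """
    for entry in response.get("FinalMetricDataList", []):
        if entry.get("MetricName") == name:
            return float(entry["Value"])
    return None


# Hypothetical response fragment, for illustration only.
sample_response = {
    "FinalMetricDataList": [
        {"MetricName": "binary_accuracy", "Value": 0.5},
    ]
}

accuracy = find_final_metric(sample_response, "binary_accuracy")
# A response without the list yields None, which would correspond to
# the metric being "undefined" from the pipeline's point of view.
missing = find_final_metric({}, "binary_accuracy")
```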

In the training logs I can see the log below, which matches the regex in the estimator's metrics definition:

2wn9x0rkaf-algo-1-43gci | INFO:root:val_binary_accuracy=0.5
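As a quick local sanity check, the regex can be run against that log line with Python's re module. This is only a sketch: it assumes SageMaker applies the pattern as a search over each log line, which I have not verified.

```python
import re

# Pattern and log line copied from the estimator definition and the training logs.
pattern = r"val_binary_accuracy=(\d+\.\d+)?"
log_line = "2wn9x0rkaf-algo-1-43gci | INFO:root:val_binary_accuracy=0.5"

match = re.search(pattern, log_line)
# The first capture group holds the metric value as text.
captured = match.group(1) if match else None
value = float(captured) if captured is not None else None
print(value)
```

Locally the pattern does capture the value, so the regex itself does not seem to be the problem.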

But when I try to access the metric with the code below:

step_training.properties.FinalMetricDataList['binary_accuracy'].Value

... I get the following error in my conditional step that checks if the metric is higher than a threshold:

 Pipeline step 'CheckAccuracyEvaluation' FAILED. Failure message is: {'Get': "Steps.TrainingStep.FinalMetricDataList['binary_accuracy'].Value"} is undefined.

I really don't understand why the binary_accuracy metric is undefined. The training step always concludes successfully.

How can I get the metrics from a trained model using an estimator in script mode?

Thank you in advance!
