Estimator Metrics Extraction Problem

Hello! I created an estimator with a metrics definition for the validation binary accuracy of the trained model as seen below:

from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(entry_point=code_entry,
                       source_dir=code_dir,
                       role=sagemaker_role,
                       instance_type='local',
                       instance_count=1,
                       model_dir='s3://some-bucket/model/',
                       hyperparameters=hyperparameters,
                       output_path='s3://some-bucket/results/',
                       framework_version='2.8',
                       py_version='py39',
                       metric_definitions=[
                           {
                               "Name": "binary_accuracy",
                               # Raw string so the backslashes reach the regex engine unchanged
                               "Regex": r"val_binary_accuracy=(\d+\.\d+)?",
                           }
                       ],
                       script_mode=True)

According to the SageMaker Estimator documentation, when metric definitions are set, SageMaker extracts the metrics from the training logs using the given regex:

metric_definitions (list[dict[str, str] or list[dict[str, PipelineVariable]]) – A list of dictionaries that defines the metric(s) used to evaluate the training jobs. Each dictionary contains two keys: ‘Name’ for the name of the metric, and ‘Regex’ for the regular expression used to extract the metric from the logs.

The trained model's metrics should then be available in the training job's properties, in the FinalMetricDataList object.

In the training logs I can see the log below, which matches the regex in the estimator's metrics definition:

2wn9x0rkaf-algo-1-43gci | INFO:root:val_binary_accuracy=0.5
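To double-check that claim outside of SageMaker, here is a minimal sketch that applies the regex from the estimator to the log line above using Python's `re` module. The helper name `extract_metric` is hypothetical, and this only confirms that the pattern matches locally; it assumes SageMaker's log matching behaves like `re.search`, which I have not verified.

```python
import re

# Regex copied from the estimator's metric_definitions (written as a raw string).
METRIC_REGEX = r"val_binary_accuracy=(\d+\.\d+)?"

# Log line copied from the training output.
LOG_LINE = "2wn9x0rkaf-algo-1-43gci | INFO:root:val_binary_accuracy=0.5"


def extract_metric(regex, line):
    """Return the first captured group as a float, or None if nothing was captured."""
    match = re.search(regex, line)
    # The capture group is optional ("?"), so a match can still have an empty group.
    if match is None or match.group(1) is None:
        return None
    return float(match.group(1))


value = extract_metric(METRIC_REGEX, LOG_LINE)
print(value)
```

Because the trailing `?` makes the capture group optional, the pattern also matches a line that has no number after the `=` sign, in which case there is no value to extract.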

But when I try to access the metric with the code below:

step_training.properties.FinalMetricDataList['binary_accuracy'].Value

... I get the following error in my conditional step that checks if the metric is higher than a threshold:

 Pipeline step 'CheckAccuracyEvaluation' FAILED. Failure message is: {'Get': "Steps.TrainingStep.FinalMetricDataList['binary_accuracy'].Value"} is undefined.
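One way to see which metrics were actually recorded for a job is to call DescribeTrainingJob and inspect `FinalMetricDataList` directly. The sketch below assumes the job ran as a regular SageMaker training job; a job started with `instance_type='local'` runs in a local container and may not be visible to this API at all. The helper names are hypothetical, and `job_name` is a placeholder for the real training job name.

```python
from typing import Any, Dict, Optional


def fetch_training_job_description(job_name: str) -> Dict[str, Any]:
    """Call DescribeTrainingJob for a job that ran on SageMaker-managed instances."""
    import boto3  # imported lazily so the lookup helper below works without AWS access

    client = boto3.client("sagemaker")
    return client.describe_training_job(TrainingJobName=job_name)


def find_final_metric(description: Dict[str, Any], metric_name: str) -> Optional[float]:
    """Return the value of metric_name from FinalMetricDataList, or None if absent."""
    for entry in description.get("FinalMetricDataList", []):
        if entry.get("MetricName") == metric_name:
            return entry.get("Value")
    return None


# Example usage (requires AWS credentials and an existing training job):
# description = fetch_training_job_description(job_name)
# print(find_final_metric(description, "binary_accuracy"))
```

If the lookup returns None, the metric was never attached to the job, which would be consistent with the "is undefined" failure in the condition step.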

I really don't understand why the binary_accuracy metric is undefined. The training step always completes successfully.

How can I get the metrics from a trained model when using an estimator in script mode?

Thank you in advance!
