Visualizing custom Model Monitor metrics in SageMaker Studio


We are developing a custom model monitoring container to interact with SageMaker's Model Monitor API, as we require additional custom metrics. One of our requirements is for these metrics to populate and visualize appropriately in the Studio visualization tabs, e.g. Data Quality, Model Quality, and Explainability. None of the built-in containers are an option for us, and we are trying to interoperate with the monitoring APIs as seamlessly as possible.

The docs specify the container contracts for output, specifically statistics.json, constraints.json, and constraint_violations.json. However, the docs are not clear on which JSON files to emit specifically for custom Model Quality metrics. There does not appear to be a provided schema.

Things I have tried:

  • Emitting CloudWatch metrics to a different namespace, sagemaker/Endpoint/model-quality-metrics (a rough sketch of this is shown after this list)
  • Attempting to retrofit the statistics.json file to also include Model Quality metrics.
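
For reference, the CloudWatch attempt above was roughly the following. The namespace, metric name, and dimensions are our own choices, and we have no indication that Studio actually reads this namespace:

# Sketch of publishing a custom model quality metric to a non-default
# CloudWatch namespace. Namespace, metric, and dimension values are
# placeholders from our experiment, not anything SageMaker Studio is known
# to consume.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_data(
    Namespace="sagemaker/Endpoint/model-quality-metrics",  # custom namespace
    MetricData=[
        {
            "MetricName": "custom_f2_score",  # hypothetical custom metric
            "Dimensions": [
                {"Name": "Endpoint", "Value": "my-endpoint"},            # placeholder
                {"Name": "MonitoringSchedule", "Value": "my-schedule"},  # placeholder
            ],
            "Value": 0.87,
            "Unit": "None",
        }
    ],
)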

Is there a separate JSON file I should write to /opt/ml/processing/resultdata for our custom model quality metrics to populate the Studio visualization tab?

Specifically, where do the base containers look for this JSON report: https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-metrics.html

2 Answers

Hi Michael,

If I understand your question correctly, you are trying to build a BYOC Model Monitor container in order to bring in custom metrics to monitor your model. Firstly, when you take that route, the metrics that are monitored by default, let's say for Model Quality, metrics such as mae or r2 for instance, do not come out of the box; you have to incorporate the logic to calculate the respective metrics listed here. In short, you cannot "add on" a new custom metric to the existing OOTB metrics unless the SageMaker team publicly exposes the container that implements this logic.

Secondly, there is no "strict" format (JSON files) for writing your custom metrics per se; it is at the discretion of the customer to implement the JSON file as required, as long as the BYOC container can read it back in to calculate violations and so on. However, we encourage customers to follow the KLL-sketch fashion, as shown in the example here.
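
As a rough illustration of that point (not an official contract), your container could emit its metrics in any shape it likes, provided your own violation logic knows how to read the file back. The path, file name, and metric names below are placeholders:

# Illustrative only: a free-form custom metrics file written from inside a
# BYOC processing container. The path and field names are not a SageMaker
# contract; the only requirement is that your own violation logic can read it.
import json
import os

output_dir = "/opt/ml/processing/output"  # wherever your ProcessingOutput is mounted
os.makedirs(output_dir, exist_ok=True)

custom_metrics = {
    "version": 0.0,
    "custom_model_quality_metrics": {  # hypothetical section name
        "weighted_f2": {"value": 0.87},
        "cost_weighted_error": {"value": 0.12},
    },
}

with open(os.path.join(output_dir, "custom_metrics.json"), "w") as f:
    json.dump(custom_metrics, f, indent=2)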

Linking a few samples of BYOC implementations of model monitor for your reference below -

  1. NLP data drift BYOC model monitor - https://github.com/aws-samples/detecting-data-drift-in-nlp-using-amazon-sagemaker-custom-model-monitor

  2. CV BYOC model monitor - https://aws.amazon.com/blogs/machine-learning/detecting-and-analyzing-incorrect-model-predictions-with-amazon-sagemaker-model-monitor-and-debugger/

AWS
answered 2 years ago
  • Thank you for your answer and the examples provided; it is appreciated.

    You are right on the context of the question. We do know that we have to implement the logic ourselves, and that the preferred sketch to be used is compact KLL quantiles - this is totally fine.

    My question is more asking where and under what name the metrics listed here: https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-metrics.html are stored as output in the monitoring container, so that they will be uploaded and visualized within Studio. This is so that we can mimic the behavior with our own metrics, and our own model quality logic, but still have the results visualized within Studio.

    Is there a certain file to output (such as statistics.json in other cases) in the outputs/results directory of the container that would be recognized and visualized, much like it is with the default implementation?

  • I have run a default ModelQualityMonitor job. I see that this job also produces the same set of three files (constraints.json, constraint_violations.json, statistics.json). How are these differentiated between monitoring job types?

    My original question had assumed that these would be different, with Studio looking for separate files for each monitoring job type (model quality vs data quality, etc.) to visualize.

    But if I understand your response and the docs correctly now: if I start an appropriate custom monitoring job for each type of monitoring (data quality and model quality) and provide those three files that correspond to the container contract schemas and the KLL sketch, then all should work out?


Apologies for the delayed response, Michael. To answer your questions:

1/ On where and under what name the metrics need to be stored: the constraints and violations files need to be written under /opt/ml/processing (see https://github.com/aws-samples/detecting-data-drift-in-nlp-using-amazon-sagemaker-custom-model-monitor/blob/main/docker/evaluation.py#L156), and the filename is constraint_violations.json.
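
A minimal sketch of that write, following the pattern in the linked evaluation.py. The output directory and the example violation entry are assumptions for illustration; verify the exact fields against a report produced by one of your own jobs:

# Sketch of writing the violations report from a BYOC monitoring container.
# Output directory and the example violation entry are assumptions; the
# top-level "violations" list mirrors what the built-in containers emit.
import json
import os

output_dir = "/opt/ml/processing/output"
os.makedirs(output_dir, exist_ok=True)

violations = {
    "violations": [
        {
            "feature_name": "prediction",                               # example entry
            "constraint_check_type": "custom_metric_threshold_check",   # hypothetical check name
            "description": "weighted_f2 0.61 dropped below the baseline threshold 0.80",
        }
    ]
}

with open(os.path.join(output_dir, "constraint_violations.json"), "w") as f:
    json.dump(violations, f, indent=2)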

2/ "I have run a default ModelQualityMonitor job. I see that this job also produces the same set of three files (constraints.json, constraint_violations.json, statistics.json). How are these differentiated between monitoring job types?"

Answer: Even though the files generated by each job are the same regardless of the type of model monitoring you choose (data quality or model quality), if you look closely, the statistics.json from data quality monitoring will contain entries such as:

"mean" : 0.13082980736646624, "sum" : 54.94851909391582, "std_dev" : 0.2511377559440087, "min" : 0.006144209299236536, "max" : 0.989563524723053, "distribution" : {

whereas the statistics.json for Model Quality will contain (for a binary classification problem):

"binary_classification_metrics" : { "confusion_matrix" : { "0" : { "0" : 173, "1" : 0 }, "1" : { "0" : 12, "1" : 16 }

To sum it up: yes, if you provide these three files, i.e. statistics.json, constraints.json, and constraint_violations.json, you should be all set.
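
If you want your custom container to mimic that shape, a sketch of writing a model-quality-style statistics.json could look like the following. Please verify the exact schema against a statistics.json produced by a default ModelQualityMonitor run; the values, the "dataset" field, and the extra custom metric below are illustrative assumptions:

# Sketch of a model-quality-style statistics.json, mirroring the
# binary_classification_metrics structure shown above. Verify the exact
# schema against a default ModelQualityMonitor report; values and the extra
# custom metric here are illustrative.
import json
import os

output_dir = "/opt/ml/processing/output"
os.makedirs(output_dir, exist_ok=True)

statistics = {
    "version": 0.0,
    "dataset": {"item_count": 201},  # assumed field; matches the confusion matrix total
    "binary_classification_metrics": {
        "confusion_matrix": {
            "0": {"0": 173, "1": 0},
            "1": {"0": 12, "1": 16},
        },
        "accuracy": {"value": 0.9402985074626866, "standard_deviation": 0.01},
        "weighted_f2": {"value": 0.87, "standard_deviation": 0.02},  # custom metric
    },
}

with open(os.path.join(output_dir, "statistics.json"), "w") as f:
    json.dump(statistics, f, indent=2)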

AWS
answered 2 years ago
