In short, this is not possible unless you have manually associated some additional data with the model package, or can trace back to somewhere the information is recorded.
I'd suggest checking out SageMaker Lineage Tracking to help with tracking connections like these, but the tools there generally work at the dataset level rather than the feature level. Since SageMaker serves a very broad variety of ML domains (e.g. tabular, image, voice, video, text, and many more), the concept of "a feature" is tricky to scope without being overly restrictive: Is it a reference to SageMaker Feature Store? What about customers using alternative feature stores or plain CSV data?
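As a rough illustration of dataset-level tracing, here is a minimal sketch that walks lineage upstream from a starting entity using the boto3 `query_lineage` call. The helper name `find_upstream_datasets`, the assumption that `start_arn` is a lineage entity ARN already associated with your model package, and the `"DataSet"` type filter are all my own choices for illustration, so check what your lineage graph actually contains. Pagination is ignored for brevity.

```python
def find_upstream_datasets(sm_client, start_arn, dataset_type="DataSet", max_depth=10):
    """Return ARNs of upstream lineage vertices whose Type equals dataset_type.

    sm_client is expected to be a boto3 SageMaker client (or anything exposing
    a compatible query_lineage method). start_arn is assumed to be the ARN of a
    lineage entity linked to the model package.
    """
    response = sm_client.query_lineage(
        StartArns=[start_arn],
        Direction="Ascendants",  # walk towards the inputs of the model
        IncludeEdges=False,
        MaxDepth=max_depth,
    )
    # Keep only vertices that look like datasets (the type name is an assumption).
    return [
        vertex["Arn"]
        for vertex in response.get("Vertices", [])
        if vertex.get("Type") == dataset_type
    ]


# Hypothetical usage (needs AWS credentials and an existing lineage graph):
#   import boto3
#   arns = find_upstream_datasets(boto3.client("sagemaker"), "<lineage-entity-arn>")
```

If nothing comes back, that usually just means no lineage was recorded when the model was trained or registered, which is the "no automatic way" case described in this answer.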
If you're working in a domain that supports it (tabular data especially), I'd recommend SageMaker data quality profiling as a nice way to track this. A data quality baseline report contains schema and feature distribution information, and you can attach this report to your model package in Model Registry. It won't fully describe the source of your data, of course, but it will document the data's properties and also enable you to run data drift monitoring on your deployed models.
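To make "attach this report to your model package" a bit more concrete, here is a minimal sketch of the `ModelMetrics` structure that `create_model_package` accepts for data quality baselines. The helper name `build_data_quality_metrics` and the S3 URIs are hypothetical, and I'm assuming the baseline job wrote JSON statistics and constraints files.

```python
def build_data_quality_metrics(statistics_s3_uri, constraints_s3_uri):
    """Build a ModelMetrics dict pointing at data quality baseline files in S3."""
    return {
        "ModelDataQuality": {
            "Statistics": {
                "ContentType": "application/json",
                "S3Uri": statistics_s3_uri,
            },
            "Constraints": {
                "ContentType": "application/json",
                "S3Uri": constraints_s3_uri,
            },
        }
    }


# Placeholder URIs, not real locations.
metrics = build_data_quality_metrics(
    "s3://example-bucket/baseline/statistics.json",
    "s3://example-bucket/baseline/constraints.json",
)

# Hypothetical usage when registering a new model version:
#   import boto3
#   boto3.client("sagemaker").create_model_package(
#       ModelPackageGroupName="<your-group>",
#       InferenceSpecification={...},  # your containers and content types
#       ModelMetrics=metrics,
#   )
```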
You should see that supported request/response content types are also available as fields in Model Registry, and you can even attach a sample payload URL as used by Inference Recommender. If you need to store additional information that doesn't have a clear place in Model Registry, you could of course resort to Tags.
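For reference, a minimal sketch of how those fields might be put together for `create_model_package`. The helper names `build_io_description` and `to_tags` are hypothetical, the returned dict is only a fragment (a real `InferenceSpecification` also needs its container definitions), and it's worth checking the docs for which Model Registry resources accept tags in your setup.

```python
def build_io_description(content_types, response_types, sample_payload_url=None):
    """Return a partial create_model_package kwargs dict describing model I/O."""
    kwargs = {
        "InferenceSpecification": {
            # Container definitions are omitted here and must be merged in.
            "SupportedContentTypes": list(content_types),
            "SupportedResponseMIMETypes": list(response_types),
        }
    }
    if sample_payload_url:
        # S3 location of a sample request payload archive.
        kwargs["SamplePayloadUrl"] = sample_payload_url
    return kwargs


def to_tags(info):
    """Convert a plain dict into the Key/Value tag list format used by AWS APIs."""
    return [{"Key": str(key), "Value": str(value)} for key, value in info.items()]


# Placeholder values for illustration only.
io_kwargs = build_io_description(
    ["text/csv"],
    ["application/json"],
    sample_payload_url="s3://example-bucket/payloads/sample.tar.gz",
)
tags = to_tags({"training-data": "s3://example-bucket/train/"})
```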
So there are multiple options to associate data information with your model packages - but if you have an existing model package without this information, there may not be an automatic way to trace the data "source" in your particular context.