How do you find out which input features a SageMaker model requires for batch inference?

Say, for example, I have a trained SageMaker model artefact, or a model in the Model Registry. Now I need to prepare the input dataset for batch inference. How do I know which input features the model expects, so that I can prepare the data accordingly? Is there a way to find this out from the model artefact?

Asked 2 years ago · 535 views
1 answer

In short, this is not possible unless you have manually associated some additional data with the model, or can trace it back to somewhere that information is recorded.
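
One quick way to check whether anyone did bundle such information with the artefact is to look inside `model.tar.gz`. This is a minimal sketch that assumes you have already downloaded the archive locally (for example with `aws s3 cp`); `inspect_model_archive` is a hypothetical helper, and the metadata file names it looks for are hypothetical examples of what a training script *might* have written, not anything SageMaker creates.

```python
import tarfile
from typing import List, Tuple

# Hypothetical file names a training script might have bundled next to the model.
# SageMaker neither requires nor creates any of these.
CANDIDATE_METADATA = ("schema.json", "feature_names.json", "features.txt")


def inspect_model_archive(path: str) -> Tuple[List[str], List[str]]:
    """Return (all regular files in the archive, files that look like feature metadata)."""
    with tarfile.open(path, "r:gz") as archive:
        names = [member.name for member in archive.getmembers() if member.isfile()]
    # Compare only the base name so nested paths such as code/schema.json also match.
    hits = [name for name in names if name.rsplit("/", 1)[-1] in CANDIDATE_METADATA]
    return names, hits
```

Calling `inspect_model_archive("model.tar.gz")` returns the archive's file list plus any matching names. An empty second list is the common case: the archive typically holds only the serialized model, so the feature information has to come from somewhere else.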

I'd suggest checking out SageMaker Lineage Tracking to help track connections like these, but its tools generally work at the dataset level rather than the feature level. Since SageMaker serves a very broad variety of ML domains (from tabular to image, voice, video, text, and many more), the concept of "a feature" is tricky to scope without being overly restrictive: is it a reference to SageMaker Feature Store? What about customers using alternative feature stores, or plain CSV data?
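
If the model was registered through a workflow that records lineage (for example, SageMaker Pipelines), you can walk upstream from the model package to see which datasets and jobs fed it. This is a minimal sketch, assuming a boto3 SageMaker client and assuming the model package is represented by a lineage artifact whose source URI is the model package ARN; `upstream_entities` is a hypothetical helper name. It goes only one hop upstream, and it will give you dataset locations rather than feature names.

```python
from typing import Any, Dict, List


def upstream_entities(sm_client: Any, model_package_arn: str) -> List[Dict[str, str]]:
    """List lineage entities with an association pointing into the model package's artifact.

    Assumption: sm_client is boto3.client("sagemaker") (or any object with the same
    list_artifacts / list_associations methods), and the model package has a lineage
    artifact whose SourceUri equals its ARN. Only the first page of artifacts is read.
    """
    artifacts = sm_client.list_artifacts(SourceUri=model_package_arn)["ArtifactSummaries"]
    results: List[Dict[str, str]] = []
    for artifact in artifacts:
        kwargs: Dict[str, str] = {"DestinationArn": artifact["ArtifactArn"]}
        # Page through every association whose destination is this artifact.
        while True:
            page = sm_client.list_associations(**kwargs)
            for assoc in page.get("AssociationSummaries", []):
                results.append({
                    "arn": assoc["SourceArn"],
                    "type": assoc.get("SourceType", ""),
                    "name": assoc.get("SourceName", ""),
                })
            token = page.get("NextToken")
            if not token:
                break
            kwargs["NextToken"] = token
    return results
```

For example, `upstream_entities(boto3.client("sagemaker"), "<model-package-arn>")`. Any dataset ARNs or S3 URIs it returns are where you would then go looking for the column layout.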

If you're working in a domain that supports it (especially tabular data), I'd recommend SageMaker data quality profiling as a good way to track this. A data quality baseline report contains the schema as well as feature distribution information, and you can attach this report to your model package in Model Registry. It won't fully describe the source of your data, of course, but it will document the data's properties and also let you run data drift monitoring on your deployed models.
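
As a sketch of how such a baseline answers the original question: the baselining job writes a constraints file that lists each feature's `name` and `inferred_type`. The code below assumes that `features` / `name` / `inferred_type` layout and a CSV input file with a header row; `expected_features` and `check_header` are hypothetical helpers that compare a candidate file's header against the baseline.

```python
import csv
import io
import json
from typing import Dict, List, Union


def expected_features(constraints_json: str) -> Dict[str, str]:
    """Map feature name -> inferred_type, assuming the baseline constraints layout."""
    doc = json.loads(constraints_json)
    return {f["name"]: f.get("inferred_type", "Unknown") for f in doc.get("features", [])}


def check_header(csv_text: str, constraints_json: str) -> Dict[str, Union[List[str], bool]]:
    """Compare the first CSV row (the header) with the baseline's feature list."""
    expected = list(expected_features(constraints_json))  # keeps baseline column order
    header = next(csv.reader(io.StringIO(csv_text)))
    return {
        "missing": [c for c in expected if c not in header],
        "unexpected": [c for c in header if c not in expected],
        # Most CSV-based models read columns by position, so order matters too.
        "order_matches": header == expected,
    }
```

If the baseline was computed on data that still included the target column, that column will show up as "missing" here and can be ignored. The names are only as meaningful as the header of the dataset the baseline was computed on.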

Supported request/response content types are also available as fields in Model Registry, and you can even attach a sample payload URL, as used by Inference Recommender. If you need to store additional information that has no obvious place in Model Registry, you could of course fall back to Tags.
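
These Model Registry fields can be read back programmatically. This sketch assumes boto3 is installed and configured for the account that holds the model package; `summarize_model_package` and `describe_with_tags` are hypothetical helper names, while `describe_model_package` and `list_tags` are standard SageMaker client calls. The pure helper works on the response dictionary, so it can be tried without AWS access.

```python
from typing import Any, Dict, Optional


def summarize_model_package(desc: Dict[str, Any]) -> Dict[str, Any]:
    """Pull input-related fields out of a DescribeModelPackage response dict."""
    spec = desc.get("InferenceSpecification", {})
    data_quality = desc.get("ModelMetrics", {}).get("ModelDataQuality", {})
    return {
        "content_types": spec.get("SupportedContentTypes", []),
        "response_types": spec.get("SupportedResponseMIMETypes", []),
        "sample_payload_url": desc.get("SamplePayloadUrl"),
        # S3 locations of an attached data quality baseline, if one was registered.
        "baseline_constraints_uri": data_quality.get("Constraints", {}).get("S3Uri"),
        "baseline_statistics_uri": data_quality.get("Statistics", {}).get("S3Uri"),
    }


def describe_with_tags(model_package_arn: str, region: Optional[str] = None) -> Dict[str, Any]:
    """Live call: describe the package and add its tags (first page only)."""
    import boto3  # imported lazily so summarize_model_package works without boto3

    sm = boto3.client("sagemaker", region_name=region)
    desc = sm.describe_model_package(ModelPackageName=model_package_arn)
    summary = summarize_model_package(desc)
    tags = sm.list_tags(ResourceArn=model_package_arn).get("Tags", [])
    summary["tags"] = {t["Key"]: t["Value"] for t in tags}
    return summary
```

Tag values are limited to 256 characters, so a tag works better as a pointer (for example, an S3 URI of a schema file) than as a place to store the feature list itself.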

So there are multiple options for associating data information with your model packages, but if you have an existing model package without this information, there may be no automatic way to trace the data "source" in your particular context.

AWS
Expert
Alex_T
Answered 1 year ago