How do you find out which input features a SageMaker model requires for batch inference?


Say, for example, I have a trained SageMaker model artefact, or a model in a model registry. I now need to prepare the input dataset for batch inference with that model. How do I know which input features the model expects, so that I can prepare the data accordingly? Is there a way to find this out from the model artefact?

asked a year ago · 424 views
1 Answer

In short, this is not possible from the model artefact alone: you either need to have manually associated some additional data with the model, or be able to trace back to somewhere the information is recorded.

I'd suggest checking out SageMaker Lineage Tracking to help with tracking connections like these, but the tools there generally work at the dataset level rather than the feature level. Since SageMaker serves a very broad variety of ML domains (e.g. tabular, image, voice, video, text, and many more), the concept of "a feature" is tricky to scope without being overly restrictive: Is it a reference to SageMaker Feature Store? What about customers using alternative feature stores or plain CSV data?

If you're working in a domain that supports it (especially tabular), I'd recommend SageMaker data quality profiling as a nice way to track this. A data quality baseline report contains the schema as well as feature distribution information, and you can attach this report to your model package in Model Registry. It won't fully describe the source of your data, of course, but it will document the data's properties and also enable you to run data drift monitoring on your deployed models.
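As a rough illustration of what such a baseline gives you, here is a minimal sketch that lists the feature names and inferred types from a constraints file. It assumes the file follows the Model Monitor constraints.json layout (a top-level "features" list whose entries carry "name" and "inferred_type"); the helper name `expected_features` and the sample feature names are made up for the example.

```python
import json


def expected_features(constraints: dict) -> list[tuple[str, str]]:
    """Return (name, inferred_type) pairs in the order the baseline lists them."""
    return [
        (feature["name"], feature.get("inferred_type", "Unknown"))
        for feature in constraints.get("features", [])
    ]


# Illustrative stand-in for a downloaded constraints.json file.
# The feature names and values here are invented, not from a real model.
SAMPLE_CONSTRAINTS = json.loads("""
{
  "version": 0.0,
  "features": [
    {"name": "age", "inferred_type": "Integral", "completeness": 1.0},
    {"name": "income", "inferred_type": "Fractional", "completeness": 0.98},
    {"name": "segment", "inferred_type": "String", "completeness": 1.0}
  ]
}
""")

for name, inferred_type in expected_features(SAMPLE_CONSTRAINTS):
    print(f"{name}: {inferred_type}")
```

In practice you would download the real file from the S3 location recorded against the model package and pass its parsed contents to the same function. Column order in the baseline is worth checking against your batch input, since headerless CSV inputs rely on position.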

You should see that supported request/response content types are also available as fields in Model Registry, and you can even attach a sample payload URL as used by Inference Recommender. If you need to store additional information that doesn't have a clear place in Model Registry, you could of course resort to Tags.
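To check what an existing model package already records, something like the following sketch may help. It assumes boto3 is installed and credentials are configured; `summarize_input_contract` and `describe_input_contract` are hypothetical helper names, and the fields read here (`InferenceSpecification`, `SamplePayloadUrl`, `ModelMetrics.ModelDataQuality`) are only populated if someone set them when the package was registered.

```python
def summarize_input_contract(pkg: dict) -> dict:
    """Pull the input-related fields out of a describe_model_package response."""
    spec = pkg.get("InferenceSpecification", {})
    data_quality = pkg.get("ModelMetrics", {}).get("ModelDataQuality", {})
    return {
        "content_types": spec.get("SupportedContentTypes", []),
        "response_types": spec.get("SupportedResponseMIMETypes", []),
        "sample_payload_url": pkg.get("SamplePayloadUrl"),
        "baseline_statistics_uri": data_quality.get("Statistics", {}).get("S3Uri"),
        "baseline_constraints_uri": data_quality.get("Constraints", {}).get("S3Uri"),
    }


def describe_input_contract(model_package_arn: str) -> dict:
    """Fetch a model package and its tags, then summarize what is recorded."""
    import boto3  # imported lazily so the helper above works without AWS access

    sm = boto3.client("sagemaker")
    pkg = sm.describe_model_package(ModelPackageName=model_package_arn)
    summary = summarize_input_contract(pkg)
    # Tags are a fallback location for anything without a dedicated field.
    summary["tags"] = sm.list_tags(ResourceArn=model_package_arn).get("Tags", [])
    return summary
```

Calling `describe_input_contract` with your model package ARN returns a small dictionary. Empty lists or `None` values mean the information was never attached, in which case the content type alone will not tell you the individual feature columns.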

So there are multiple options to associate data information with your model packages - but if you have an existing model package without this information, there may not be an automatic way to trace the data "source" in your particular context.

answered a year ago
