Automated streaming integration and multiple requests for SageMaker endpoint

Question

A data scientist wants to host a TensorFlow model in SageMaker and process low-volume streaming event data (~2-3 events per second) to get an inference for each event. The data scientist is considering plugging the SageMaker inference model in as a Kinesis Data Analytics application, but Kinesis Data Analytics currently supports only SQL or Flink.

One option is to set up an ECS or Lambda service that consumes data from Kinesis or SNS and invokes the SageMaker inference endpoint per message, but we would like to know whether a more automated and optimal solution is available for this kind of workflow.

It is currently not possible to pass multiple requests to a SageMaker endpoint, yet TensorFlow models tend to perform much better on batches of data than on multiple single invocations, so some windowing would be beneficial. Ideally, the client wants to react to an inference within 10-15 seconds of the event being processed, so an S3-based batch approach is probably too slow.

Is there anything you can recommend for handling this sort of workload?

Accepted Answer

To build an integration between SageMaker endpoints and a Kinesis Data Analytics application, see this blog post: https://aws.amazon.com/blogs/architecture/realtime-in-stream-inference-kinesis-sagemaker-flink/. It helps you set up a serverless service that invokes the SageMaker inference endpoint.

For batching, the SageMaker TensorFlow Serving documentation mentions the following:

- [This link][1] mentions that you can include multiple instances in your predict request (or multiple examples in classify/regress requests) to get multiple prediction results in one request to your Endpoint.
- [This link][2] mentions that you can configure the SageMaker TensorFlow Serving Container to batch multiple records together before performing an inference.
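As a rough illustration of the second option, the container's batching behaviour is controlled through environment variables described in the linked README. The sketch below only builds such a set of variables; the helper name `tfs_batching_env` and the example values are hypothetical and would need tuning for a real workload.

```python
def tfs_batching_env(max_batch_size=32, batch_timeout_micros=100_000):
    """Return container environment variables that switch on server-side batching.

    The variable names come from the SageMaker TensorFlow Serving Container
    README; the default values used here are illustrative assumptions only.
    """
    return {
        # Turn on record batching inside the serving container.
        "SAGEMAKER_TFS_ENABLE_BATCHING": "true",
        # Upper bound on records per batch (TensorFlow Serving "max_batch_size").
        "SAGEMAKER_TFS_MAX_BATCH_SIZE": str(max_batch_size),
        # How long to wait for a full batch (TensorFlow Serving "batch_timeout_micros").
        "SAGEMAKER_TFS_BATCH_TIMEOUT_MICROS": str(batch_timeout_micros),
    }


env = tfs_batching_env()
```

A dictionary like this would typically be supplied as the model's environment when the endpoint is created, for example through the `env` argument of the model class in the SageMaker Python SDK.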

You would still have to handle the logic in ECS/Lambda to control how many records you consume from your stream in one batch, but with the options above you can at least run inference on the whole batch in a single call to the SageMaker endpoint.
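A minimal sketch of that consumer logic, assuming a Lambda function triggered by a Kinesis stream and events that are already JSON-encoded model inputs. The endpoint name `my-tf-endpoint`, the helper `build_predict_payload`, and the optional `runtime_client` parameter are hypothetical; this is not the only way to structure it.

```python
import base64
import json

ENDPOINT_NAME = "my-tf-endpoint"  # hypothetical endpoint name


def build_predict_payload(kinesis_records):
    """Decode Kinesis records and pack them into one TensorFlow Serving predict request."""
    instances = []
    for record in kinesis_records:
        # Kinesis delivers each record's data to Lambda as a base64 string.
        raw = base64.b64decode(record["kinesis"]["data"])
        # Assumption: each event is a JSON document usable as one model instance.
        instances.append(json.loads(raw))
    return json.dumps({"instances": instances})


def handler(event, context, runtime_client=None):
    """Send all records from one Lambda invocation to the endpoint in a single call."""
    if runtime_client is None:
        import boto3  # imported lazily so the helper above works without AWS access

        runtime_client = boto3.client("sagemaker-runtime")

    response = runtime_client.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=build_predict_payload(event["Records"]),
    )
    # TensorFlow Serving returns one prediction per instance, in request order.
    return json.loads(response["Body"].read())["predictions"]
```

With a Kinesis trigger, the number of records handed to each invocation is governed by the event source mapping's batch size and batching window settings, which is one place to implement the windowing while staying inside the 10-15 second reaction budget.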

[1]: https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/deploying_tensorflow_serving.html#making-predictions-against-a-sagemaker-endpoint
[2]: https://github.com/aws/sagemaker-tensorflow-serving-container#enabling-batching
