
SageMaker batch transform output type


I am using sagemaker_inference in my custom Python scripts to generate predictions with batch transform. The batch data is in CSV format, stored in S3. The input function and the inference itself work perfectly, but I am struggling with the output format. My inference script and output function work with pandas DataFrames. I am modelling my code on this GitHub sample - https://github.com/aws/amazon-sagemaker-examples/blob/main/advanced_functionality/multi_model_bring_your_own/container/model_handler.py. I encode the output with return encoder.encode(df.values, content_types.CSV), content_types.CSV, but it fails with this error:

2024-08-12T06:29:39,394 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model: model, Invalid return type: <class 'tuple'>.

I have tried converting the output to a CSV string and then encoding it as UTF-8, but that failed too. I am not sure what I am missing. Any input on how to debug this further would be appreciated.
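A minimal sketch of the output step described above (the output_fn name is illustrative, and the pandas-based serialization is only a stand-in for the toolkit's encoder.encode(df.values, content_types.CSV)):

```python
import io

import pandas as pd

def output_fn(df, accept="text/csv"):
    # Serialize the DataFrame values to CSV text, roughly what
    # encoder.encode(df.values, content_types.CSV) would produce.
    buf = io.StringIO()
    pd.DataFrame(df.values).to_csv(buf, header=False, index=False)
    payload = buf.getvalue()
    # Returning a (payload, content_type) tuple is the pattern that fails
    # with "Invalid return type: <class 'tuple'>" in the MMS worker log.
    return payload, accept
```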

asked a year ago · 270 views
1 Answer

Tentatively, going by the SageMaker Inference Toolkit doc, I think you should just return the encoder.encode(...) result, not a tuple of that plus content_types.CSV.

In terms of debugging, my best suggestion would be to use SageMaker Local Mode to test the transform job against a small subset of data locally - this avoids waiting for instance start-up on each iteration. It should be as straightforward as setting instance_type="local" from your notebook as shown in this example, as long as you're running from an environment where Docker is available and you can pull your ECR image. If you're running from SageMaker Studio, you can enable Local Mode using the instructions here and the lifecycle config script here.
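For example (a sketch only: the model/transformer setup is elided, and the environment variable, instance type, and S3 path are placeholders - "local" is the only change Local Mode actually needs):

```python
import os

# "local" runs the transform in a local Docker container via SageMaker
# Local Mode; any real instance type (ml.m5.xlarge is just an example)
# runs it on managed infrastructure instead.
use_local = os.environ.get("USE_SM_LOCAL", "1") == "1"
instance_type = "local" if use_local else "ml.m5.xlarge"

# transformer = model.transformer(instance_count=1, instance_type=instance_type)
# transformer.transform("s3://bucket/prefix/batch.csv", content_type="text/csv")
```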

AWS EXPERT · answered a year ago
  • Thanks Alex, let me try the suggestion. I have an M2 Mac, so Local Mode hasn't been easy: I tried a cross-compiled Docker image, but it started failing with an Illegal instruction error, so for now I have to rely on SageMaker. Also, even when Local Mode was working, I ran into an issue where my run failed in SageMaker during an import; it took a while to root-cause it to a Keras import, from keras.preprocessing.sequence import pad_sequences, which had to be replaced with from keras.utils import pad_sequences.

  • Returning just the encoder.encode(...) result fails too, with - 2024-08-14T07:25:53,114 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model: model, Invalid return type: <class 'str'>

  • If you're struggling with M2 then SageMaker Local Mode from a SageMaker notebook instance might be easier... Or if you prefer a VSCode-style interface then Code Editor on SMStudio is pretty easy to set up for a quick-create test "Domain" that you can delete after. I'm not quite clear how your code is set up because the sample linked in the original question doesn't actually use encoder.encode anywhere... Maybe you could update the question with some additional links or more skeleton code?

  • (It's true that SM Local Mode only emulates some aspects of the SageMaker container run-time, so occasional inconsistencies are possible around exactly how the container is invoked or how data is fed through it - but the Keras issue you mentioned is really surprising, because the container image your code runs in should be the same in both cases. Maybe your local Docker cache was out of sync with ECR?)

  • I will be able to try a SageMaker notebook instance next week and will update this thread once I have had the chance to run it; I will add the additional code around that time. But just to confirm: is it best to model my inference code on the SageMaker Inference Toolkit doc that you referred to in your post earlier?
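One hedged reading of the two "Invalid return type" errors above (tuple, then str): in the MMS-based model_handler.py sample linked in the question, handle() returns a Python list with one entry per request in the micro-batch, so even a bare string can be rejected. A minimal sketch of that shape (the predictions and CSV serialization are purely illustrative):

```python
def handle(data, context):
    # MMS-style handler, following the shape of the linked
    # model_handler.py sample. MMS expects handle() to return a *list*
    # with one response per incoming request - a bare string or a
    # (payload, content_type) tuple both trigger the
    # "Invalid return type" worker error seen above.
    rows = [[0.1, 0.9], [0.8, 0.2]]  # stand-in for model predictions
    payload = "\n".join(",".join(str(v) for v in row) for row in rows)
    return [payload]  # one response per request in the batch
```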

