How to serialize WAV file input for SageMaker CLI create-transform-job?

I'm an AWS noob, so please bear with me. My objective is to run inference on a Whisper model. I'd like the run to be triggered either automatically whenever a new file is uploaded to a bucket, or manually from the command line.

I used this tutorial, which uses Python SDK syntax, as a starting point. I saved the model to S3 and ran the following commands to prep for inference:

aws sagemaker create-model ....
aws sagemaker create-endpoint-config ...
aws sagemaker create-endpoint ...
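
For reference, the serialization piece of the tutorial's Python SDK flow looks roughly like this (a sketch from memory; the model data path, role ARN, and framework versions below are placeholders, not my actual values):

from sagemaker.huggingface import HuggingFaceModel
from sagemaker.serializers import DataSerializer

# Placeholder values throughout -- not my real paths, role, or versions.
model = HuggingFaceModel(
    model_data="s3://sagemaker-us-east-1-myawsid/model/model.tar.gz",
    role="arn:aws:iam::123456789012:role/MySageMakerRole",
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
)

# The SDK handles WAV input by attaching a DataSerializer that streams
# the raw bytes with the matching content type.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.2xlarge",
    serializer=DataSerializer(content_type="audio/wav"),
)

As far as I can tell, the CLI has no equivalent of that serializer step, which is where I get stuck below.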

After saving some WAV files in an input directory and creating an output directory, I run the following:

aws sagemaker create-transform-job --transform-job-name test-predict \
        --transform-resources "InstanceType"="ml.m5.2xlarge","InstanceCount"=1 \
        --model-name whisper-model \
        --transform-input DataSource={S3DataSource={S3DataType=S3Prefix,S3Uri=s3://sagemaker-us-east-1-myawsid/data}},ContentType="audio/wav" \
        --transform-output "S3OutputPath=s3://sagemaker-us-east-1-myawsid/output/"

If we refer back to the tutorial, we would expect this to fail because the WAV files need to be serialized, and indeed I see this error in the logs:

Content type audio/wav is not supported by this framework.
	2024-02-14T12:58:14.234-05:00	2024-02-14T17:58:13.638:[sagemaker logs]: sagemaker-us-east-1-/data/003.wav:
	2024-02-14T12:58:14.234-05:00	2024-02-14T17:58:13.638:[sagemaker logs]: sagemaker-us-east-1-/data/003.wav: Please implement input_fn to to deserialize the request data or an output_fn to
	2024-02-14T12:58:14.234-05:00	2024-02-14T17:58:13.638:[sagemaker logs]: sagemaker-us-east-1-/data/003.wav: serialize the response. For more information, see the SageMaker Python SDK README.
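
My understanding of what this error is asking for is a custom inference script packaged alongside the model, something along these lines (a sketch only; model_fn/input_fn/predict_fn are the standard SageMaker inference toolkit hooks, but the Whisper-specific details are my guess):

# code/inference.py -- hypothetical custom handler for raw audio input.
from transformers import pipeline

def model_fn(model_dir):
    # Load the Whisper checkpoint from the unpacked model artifacts.
    return pipeline("automatic-speech-recognition", model=model_dir)

def input_fn(request_body, request_content_type):
    # Accept the raw WAV bytes that the transform job streams in.
    if request_content_type == "audio/wav":
        return request_body
    raise ValueError(f"Unsupported content type: {request_content_type}")

def predict_fn(audio_bytes, asr):
    # transformers' ASR pipeline can decode raw audio bytes directly.
    return asr(audio_bytes)

If a script like that is the intended fix, I'd still want to launch the job itself from the CLI as above.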

What is the best way to serialize these files and then feed them in for inference? I'm not tied to the create-transform-job syntax, but I'd really prefer not to use the Python SDK for the MLOps part of the code. If I must use the Python SDK, please suggest a syntax that separates the prediction steps from the MLOps code (perhaps a SageMaker Pipeline is what I need).
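
(For comparison, I believe the Python SDK equivalent of my CLI call above would be roughly the following, though I haven't run it:)

from sagemaker.transformer import Transformer

# Same resources as my CLI call above; untested sketch.
transformer = Transformer(
    model_name="whisper-model",
    instance_count=1,
    instance_type="ml.m5.2xlarge",
    output_path="s3://sagemaker-us-east-1-myawsid/output/",
)

transformer.transform(
    data="s3://sagemaker-us-east-1-myawsid/data",
    content_type="audio/wav",
)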

Also, I saw an alternative AWS-sourced way of using the Whisper model (instead of HuggingFace), but it seemed more cumbersome to drive from the CLI than the HuggingFace approach I used. Taking a step back from this approach altogether, maybe it would be more efficient to use AWS Lambda functions? The use case seems appropriate, since I want to run the transcription whenever a new file is uploaded to a bucket.
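
The Lambda version I'm imagining would look roughly like this (a sketch; "whisper-endpoint" is a placeholder name, and I haven't checked invoke_endpoint's payload size limits for larger WAV files):

import boto3

s3 = boto3.client("s3")
runtime = boto3.client("sagemaker-runtime")

def handler(event, context):
    # S3 put events carry the bucket and key of the uploaded file.
    record = event["Records"][0]["s3"]
    obj = s3.get_object(Bucket=record["bucket"]["name"],
                        Key=record["object"]["key"])

    # Send the raw WAV bytes to the endpoint with the matching content type.
    response = runtime.invoke_endpoint(
        EndpointName="whisper-endpoint",  # placeholder endpoint name
        ContentType="audio/wav",
        Body=obj["Body"].read(),
    )
    return response["Body"].read().decode("utf-8")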

Thanks
