Inference endpoint not responding when invoked by Lambda

Hi fellow AWS users,

I am working on an inference pipeline on AWS. Simply put, I have trained a PyTorch model and deployed it to a SageMaker inference endpoint from a notebook.

On the other hand, I have a Lambda function that is triggered whenever a new audio file is uploaded to my S3 bucket; it passes the name of that file to the endpoint. The endpoint downloads the audio, performs some (very quick) pre-processing and returns predictions. The Lambda then sends these predictions by email.
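
For reference, here is a simplified sketch of what my Lambda handler does (the endpoint name here is a placeholder, and the email step is elided):

    import json
    import boto3

    # Placeholder endpoint name.
    ENDPOINT_NAME = "my-audio-endpoint"

    runtime = boto3.client("sagemaker-runtime")

    def handler(event, context):
        # Pull the bucket and key from the S3 event that triggered the Lambda.
        record = event["Records"][0]["s3"]
        bucket = record["bucket"]["name"]
        key = record["object"]["key"]

        # Pass the audio location to the endpoint, which downloads the file,
        # pre-processes it and returns predictions.
        response = runtime.invoke_endpoint(
            EndpointName=ENDPOINT_NAME,
            ContentType="application/json",
            Body=json.dumps({"bucket": bucket, "key": key}),
        )
        predictions = json.loads(response["Body"].read())

        # ... send the predictions by email (elided) ...
        return predictions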

Audio files are uploaded to the S3 bucket at irregular intervals, roughly 10 a day.

This morning, I manually uploaded a test audio file to the bucket to check that the pipeline was working. It turns out the endpoint is correctly invoked by my Lambda, but according to the endpoint logs nothing happens (and I don't get any email). I tried a couple of times, without any more success; the Lambda just ends up timing out after 300 ms (the timeout I set). However, invoking the endpoint from my SageMaker notebook worked perfectly on the first try and seemed to unblock the endpoint: after that, it responded to the Lambda invocations. Whether that was because the endpoint was no longer "cold", or just a coincidence, I couldn't tell.

My questions are:

  • Are there any differences in endpoint invocations between the two scenarios (from the Lambda or from the SageMaker notebook)?
  • How can I see how long after an invocation the endpoint becomes "cold" again? Please correct me if I am wrong in using the term "cold" here; I know it applies to Lambdas as well. From what I understand, the endpoint basically runs my inference script inside a container pulled from ECR.
  • Given my use case (number of inferences per day, lightweight pre-processing, ...), what would be the best option for my endpoint (async, batch, ...)?
  • My Lambda seems to attempt the invocation twice in total (invoke 1 - timeout 1 - invoke 2 - timeout 2). Can that be set differently? (See the sketch just after this list.)
  • Should I increase the timeout of my Lambda and let it retry until the container is "warm"? Or is there such a setting that can be modified on the endpoint side?
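
Regarding the retry question, I found put_function_event_invoke_config in the boto3 docs; would something like this be the right knob? (The function name is a placeholder, and I am assuming S3 triggers count as asynchronous invocations.)

    import boto3

    lambda_client = boto3.client("lambda")

    # Cap automatic retries for asynchronous (e.g. S3-triggered) invocations.
    # MaximumRetryAttempts accepts 0, 1 or 2; the function name is a placeholder.
    lambda_client.put_function_event_invoke_config(
        FunctionName="process-audio",
        MaximumRetryAttempts=0,
    )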

Thank you so much in advance for your support.

Cheers

Antoine

1 Answer

I had a similar issue; in my case, my Lambda function was not transforming the input data into the right format for my inference endpoint to digest. Assuming your Lambda function takes a bucket/key input for the location of the audio file, try mocking up a test directly in Lambda to see what errors it throws.

PS: Yes, there is a difference between how SageMaker Studio and Lambda call the inference endpoint. From Lambda, the invoke_endpoint API is used, while in SageMaker Studio (using the MXNet/Gluon framework) the predictor's predict method is called.
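
To illustrate the two call paths, roughly (the endpoint name and payload are placeholders; both paths ultimately hit the same HTTPS endpoint):

    import json
    import boto3
    from sagemaker.predictor import Predictor
    from sagemaker.serializers import JSONSerializer

    payload = {"bucket": "my-bucket", "key": "audio.wav"}  # placeholder payload

    # From Lambda: the low-level runtime API.
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName="my-audio-endpoint",
        ContentType="application/json",
        Body=json.dumps(payload),
    )

    # From a SageMaker notebook: the SDK's Predictor wrapper.
    predictor = Predictor(endpoint_name="my-audio-endpoint", serializer=JSONSerializer())
    result = predictor.predict(payload)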

PS2: Initially, to rule out IAM issues, you may want to give the Lambda function the AmazonS3FullAccess and AmazonSageMakerFullAccess policies.

  • Hello Aleksei,

    Thanks for your reply!

    In my case, my Lambda function does transform the input data into the right format. It just didn't manage to invoke the endpoint at first, but invoking it from SageMaker unblocked everything.

    My pipeline works fine now, but I don't know where the issue came from, and it would be good to know. Maybe it happens on the first invocation? What about when the endpoint hasn't been invoked for a long time and the Lambda tries to hit it again? Is it due to a long cold-start time, longer via Lambda than via SageMaker?
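
    If it is a client-side timeout, would pinning down the boto3 client's read timeout and retries help? A sketch of what I mean (the values are illustrative):

        import boto3
        from botocore.config import Config

        # Give the endpoint more time to answer a slow first request and
        # disable the client's own retries; values are illustrative.
        config = Config(read_timeout=70, retries={"max_attempts": 0})
        runtime = boto3.client("sagemaker-runtime", config=config)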

    My invocation Lambda only has the following policy attached (see below an extract of my Serverless yml file):

        - Effect: "Allow"
          Action:
            - "sagemaker:InvokeEndpoint"
          Resource: "arn:aws:sagemaker:${file(./config.${self:provider.stage}.yml):ENDPOINT_ARN}"
    

    Is that not enough?

    Cheers

    Antoine
