SageMaker training job request fails from Lambda with EFS


I have a Lambda function with an S3 trigger that performs two steps:

  1. Convert the file uploaded to the trigger S3 bucket from CSV to Parquet and write a copy to another bucket.
  2. Start a SageMaker training job with parameters taken from the contents of the CSV file. This Lambda function also has an EFS file system attached to it.
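Step 2 can be sketched as building the keyword arguments for SageMaker's CreateTrainingJob API from the CSV text. This is a minimal sketch under stated assumptions: the helper name is hypothetical, it assumes the first data row of the CSV holds the hyperparameters (column names as keys), and the instance type, volume size, and runtime limit are placeholder values. SageMaker takes hyperparameters as a string-to-string map, which is what `csv.DictReader` already produces.

```python
import csv
import io


def build_training_job_request(csv_text, job_name, role_arn, image_uri, output_s3_path):
    """Hypothetical helper: build kwargs for sagemaker.create_training_job.

    Assumes the first data row of the CSV holds the hyperparameters.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    first_row = next(reader, None)
    if first_row is None:
        raise ValueError("CSV contains no data rows")
    # Hyperparameter values must be strings; DictReader already yields strings.
    hyperparameters = {
        key.strip(): (value or "").strip()
        for key, value in first_row.items()
        if key is not None  # skip overflow columns that have no header
    }
    return {
        "TrainingJobName": job_name,  # must be unique per job
        "AlgorithmSpecification": {"TrainingImage": image_uri, "TrainingInputMode": "File"},
        "RoleArn": role_arn,
        "HyperParameters": hyperparameters,
        "OutputDataConfig": {"S3OutputPath": output_s3_path},
        # Placeholder resources; adjust for your workload.
        "ResourceConfig": {"InstanceType": "ml.m5.large", "InstanceCount": 1, "VolumeSizeInGB": 10},
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }
```

Inside the function you would then call `boto3.client("sagemaker").create_training_job(**request)`. Keeping the request construction separate makes it easy to log the exact request before the API call is made.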

Part 1 executes without any issue, but part 2 times out without any explicit errors. I do not understand where to look. The Lambda function has all the permissions it needs to start SageMaker training jobs.
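Because part 2 hangs instead of failing, network reachability is one thing worth ruling out. A Lambda function that mounts EFS must be connected to a VPC, and a function in a VPC has no internet access unless the VPC provides it (for example through a NAT gateway or an interface VPC endpoint for the SageMaker API), so an API call can wait until the function times out. The sketch below uses only the standard library to try a plain TCP connection with a short timeout and prints the result, so the outcome shows up in the logs. The hostname assumes the public SageMaker API endpoint in `us-east-1`; substitute your own region.

```python
import socket


def can_reach(host, port=443, timeout=3.0):
    """Return (True, None) if a TCP connection succeeds within `timeout` seconds,
    otherwise (False, description of the error)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:  # covers timeouts, refused connections, and DNS failures
        return False, f"{type(exc).__name__}: {exc}"


def lambda_handler(event, context):
    # Assumed endpoint: public SageMaker API in us-east-1 (adjust the region).
    host = "api.sagemaker.us-east-1.amazonaws.com"
    ok, error = can_reach(host)
    print(f"reachable={ok} host={host} error={error}")
    return {"reachable": ok}
```

If this prints `reachable=False` with a timeout error, the problem is most likely the function's network path rather than its IAM permissions.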

1 Answer

AWS Lambda functions can run for at most 15 minutes (900 seconds); the default timeout is 3 seconds. Troubleshooting steps you can take include the following:

  • Check whether any training jobs were created, in the SageMaker console.
  • Check the function's CloudWatch logs and metrics. In the Lambda console, open your function, choose the Monitor tab, and then choose View logs in CloudWatch. In the CloudWatch console, choose the corresponding log stream to see all the messages from your function. You can add print statements inside the function to log intermediate steps, and you can also retrieve the function's logs with the AWS CLI.
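The two checks above can be scripted roughly as follows. This is a sketch with hypothetical helper names: `log_step` logs each step with the invocation's remaining time from `context.get_remaining_time_in_millis()` (provided by the Lambda Python runtime), so the last log line before a timeout shows which step was running; `recent_training_jobs` calls the SageMaker ListTrainingJobs API as an alternative to checking the console. The client is passed in, so you would create it with `boto3.client("sagemaker")`.

```python
import logging

# The Lambda Python runtime configures the root logger; raise its level to INFO.
logger = logging.getLogger()
logger.setLevel(logging.INFO)


def log_step(context, step):
    """Log a step name together with the time this invocation has left."""
    remaining_ms = context.get_remaining_time_in_millis()
    logger.info("step=%s remaining_ms=%d", step, remaining_ms)
    return remaining_ms


def recent_training_jobs(sagemaker_client, limit=10):
    """Return (name, status) pairs for the most recently created training jobs."""
    response = sagemaker_client.list_training_jobs(
        SortBy="CreationTime", SortOrder="Descending", MaxResults=limit
    )
    return [
        (job["TrainingJobName"], job["TrainingJobStatus"])
        for job in response["TrainingJobSummaries"]
    ]


def lambda_handler(event, context):
    log_step(context, "start")
    # ... part 1: convert the CSV to Parquet and write it to the other bucket ...
    log_step(context, "parquet written")
    # ... part 2: call create_training_job ...
    log_step(context, "training job requested")
```

If the last logged step is "parquet written", the function stalled inside the SageMaker call rather than in the conversion.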

Here is the documentation for accessing Amazon CloudWatch logs for AWS Lambda.

Hope this information is helpful for you.

answered a year ago
