How to debug an invocation timeout in SageMaker?

I am testing inference in SageMaker using one of the containers listed here: https://github.com/aws/deep-learning-containers/blob/master/available_images.md. The model is packaged as shown below, and in inference.py I override functions such as model_fn and predict_fn. I tested this with batch transform and it worked for a few small input files, but for other, larger files I keep getting "Model server did not respond to /invocations request within 3600 seconds". What is causing this? 3600 is the maximum we can set for the "invocation timeout in seconds" parameter, and the default input size limit for batch transform is 6 MB. The input files I'm using are much smaller than that, but I still get the error.

Directory structure

model.tar.gz/
|- model.pth
|- code/
  |- inference.py
  |- requirements.txt  

File: inference.py

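import os

import torch

def model_fn(model_dir):
    # Your_Model is a placeholder for the actual model class.
    model = Your_Model()
    with open(os.path.join(model_dir, 'model.pth'), 'rb') as f:
        model.load_state_dict(torch.load(f))
    return model

def predict_fn(input_data, model):
    # Custom prediction logic (omitted in the original post).
    ...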

Based on the docs here, https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-batch-code.html#your-algorithms-batch-code-how-containers-should-respond-to-inferences, do we need to install Flask and expose an /invocations endpoint that responds with 200 OK when we are using a custom container?

Asked 2 years ago, 2551 views
1 Answer

One of the best ways to debug a custom inference script is to start with SageMaker "local mode". Once you are sure that your script works correctly, move on to hosting it on a SageMaker endpoint. Here are some examples to get started.

For example, for a TensorFlow Serving model with a custom inference script, I would use local mode for testing as shown below:

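from sagemaker.tensorflow.model import TensorFlowModel
from sagemaker.local import LocalSession

# model_data (location of model.tar.gz) and sagemaker_role (IAM role ARN)
# are assumed to be defined earlier.
tensorflow_serving_model = TensorFlowModel(
    model_data=model_data,
    role=sagemaker_role,
    framework_version="2.6",
    # sagemaker_session=sagemaker_session,  # regular hosted session
    sagemaker_session=LocalSession(),  # run the serving container locally
)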
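
When testing locally, it can also help to see which handler in inference.py is consuming the time. The following is a minimal sketch, not part of the SageMaker SDK: the `timed` decorator and the logger name are hypothetical helpers, and the body of `predict_fn` is a stand-in for the real prediction logic.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
# Hypothetical logger name, chosen for this example only.
logger = logging.getLogger("inference_timing")


def timed(fn):
    """Log the wall-clock duration of every call to fn."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # Runs even if fn raises, so failed calls are timed as well.
            elapsed = time.perf_counter() - start
            logger.info("%s took %.3f s", fn.__name__, elapsed)
    return wrapper


@timed
def predict_fn(input_data, model):
    # Stand-in for the real prediction logic.
    return model(input_data)
```

If the same decorator is applied to model_fn as well, the log output should indicate whether the delay comes from loading the model or from a single slow prediction. That is usually the first thing to establish when a request never returns.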
AWS
Answered 2 years ago
