SageMaker Endpoint Debugging


I'm deploying a custom inference.py on a neural network trained on Sagemaker and stored in S3. One issue is writing the inference.py and debugging as I am creating the endpoint. Each time, I have to wait for the endpoint to start and make a prediction. So I end up waiting roughly 10 minutes between each error that I am able to receive and fix. Because I am using the JSON serializer and deserializer, I am unable to use the local instance mode.

Is there an alternative way to debug endpoints such that I can have one "endpoint" up and running and any time I make changes to my inference.py, the endpoint references the most recent script?

Samuel
asked 4 months ago · 128 views
1 Answer

Ideally, JSONSerializer and JSONDeserializer should not prevent you from debugging and testing SageMaker inference endpoints locally. As an initial check, make sure your boto3 and sagemaker libraries are up to date. Because Local Mode in SageMaker is fairly experimental, many bugs are addressed and fixed with each new version (and possibly breaking changes are introduced, so keep an eye out). There are examples in this aws-samples public GitHub repository that you can refer to, covering a number of combinations of frameworks and serializers: https://github.com/aws-samples/amazon-sagemaker-local-mode

This specific inference endpoint setup script uses CSVSerializer/CSVDeserializer for NLP input: https://github.com/aws-samples/amazon-sagemaker-local-mode/blob/main/pytorch_nlp_script_mode_local_model_inference/pytorch_nlp_script_mode_local_model_inference.py

If you are interested in examples of JSONSerializer/JSONDeserializer usage, then this example should be more up your alley: https://github.com/aws-samples/amazon-sagemaker-local-mode/blob/main/huggingface_hebert_sentiment_analysis_local_serving/huggingface_hebert_sentiment_analysis_local_serving.py

If you notice an issue with a specific combination, do submit an issue to the public GitHub repository for the SageMaker Python SDK (https://github.com/aws/sagemaker-python-sdk).
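Separately, because the handlers in inference.py are plain Python functions, you can unit-test them on your machine without deploying anything at all, which shortens the fix-and-retry loop dramatically. A minimal sketch, assuming the standard SageMaker serving hooks (input_fn, predict_fn, output_fn); the fake_model below is a hypothetical stand-in for whatever your model_fn would load:

```python
import json

def input_fn(request_body, content_type="application/json"):
    # Deserialize the JSON payload that JSONSerializer sends from the client.
    if content_type == "application/json":
        return json.loads(request_body)
    raise ValueError(f"Unsupported content type: {content_type}")

def predict_fn(data, model):
    # 'model' is whatever model_fn returned; here we simply call it.
    return model(data["inputs"])

def output_fn(prediction, accept="application/json"):
    # Serialize the result so JSONDeserializer can parse it on the client side.
    return json.dumps({"outputs": prediction})

# Local unit test: no endpoint, no container, no waiting.
fake_model = lambda xs: [x * 2 for x in xs]  # stand-in for the real model
payload = json.dumps({"inputs": [1, 2, 3]})
result = output_fn(predict_fn(input_fn(payload), fake_model))
```

Errors in your handler logic surface in seconds this way, leaving only container- and framework-level issues for Local Mode or a real endpoint to catch.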

If the repository examples are not helpful, there is another way to speed up development: instead of using a requirements.txt, you can extend a pre-built container and preinstall your required packages. You can then push the image to ECR and specify it in your model (the example here is PyTorch):

from sagemaker.pytorch import PyTorchModel

model_instance = PyTorchModel(
    image_uri="<YourImageECRURI>",  # your extended container image in ECR
    model_data=model_tar_path,
    role=role,
    source_dir="code",
    entry_point="inference.py",
    framework_version="1.8",
    py_version="py3",
)

This will stop the container from reinstalling your packages on every deploy. https://docs.aws.amazon.com/sagemaker/latest/dg/prebuilt-containers-extend.html
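A hedged sketch of what extending a pre-built container might look like; the image tag below is only an example (pick the registry account, region, and framework tag that match your setup from the AWS Deep Learning Containers list), and the pinned package is a placeholder for whatever your requirements.txt currently installs:

```dockerfile
# Example only: base image tag must match your region/framework/version.
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:1.8.1-cpu-py36-ubuntu18.04

# Bake your dependencies into the image once, instead of installing them
# from requirements.txt on every endpoint start.
RUN pip install --no-cache-dir transformers==4.6.1
```

After building and pushing this image to your own ECR repository, pass its URI as the image_uri shown above.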

Once you are sure that your model works as you designed, you can use Amazon SageMaker Inference Recommender to figure out the optimal deployment parameters for your model. It will pick the instance type that runs your model with the best performance at the lowest cost: https://docs.aws.amazon.com/sagemaker/latest/dg/inference-recommender.html

AWS
answered 3 months ago
EXPERT
reviewed 23 days ago
