SageMaker Endpoint Debugging


I'm deploying a custom inference.py for a neural network trained on SageMaker and stored in S3. One issue is writing inference.py and debugging it while I create the endpoint. Each time, I have to wait for the endpoint to start and then make a prediction, so I end up waiting roughly 10 minutes between each error I can see and fix. Because I am using the JSON serializer and deserializer, I am unable to use local instance mode.

Is there an alternative way to debug endpoints such that I can have one "endpoint" up and running and any time I make changes to my inference.py, the endpoint references the most recent script?

Samuel
Asked 4 months ago · 149 views
1 Answer

Ideally, JSONSerializer and JSONDeserializer should not prevent you from debugging and testing SageMaker inference endpoints locally. As an initial check, make sure that your boto3 and sagemaker libraries are up to date. Because Local Mode in SageMaker is fairly experimental, many bugs are fixed with each new version (and breaking changes are possible, so keep an eye out). This aws-samples public GitHub repository contains examples covering a number of combinations of frameworks and serializers: https://github.com/aws-samples/amazon-sagemaker-local-mode
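For illustration, here is a minimal sketch of a Local Mode iteration loop with JSON serialization. It assumes the SageMaker Python SDK v2, Docker running on your machine, a model archive available locally, and an execution role already in `role`; the paths and payload are placeholders:

from sagemaker.pytorch import PyTorchModel
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

# model_data can point to a local archive (file://) so nothing is pulled from S3
local_model = PyTorchModel(
    model_data="file://./model.tar.gz",
    role=role,
    source_dir="code",            # directory containing inference.py
    entry_point="inference.py",
    framework_version="1.8",
    py_version="py3",
)

# instance_type="local" runs the serving container on your machine
# instead of provisioning a managed endpoint
predictor = local_model.deploy(
    initial_instance_count=1,
    instance_type="local",
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

print(predictor.predict({"inputs": [1.0, 2.0, 3.0]}))  # placeholder payload
predictor.delete_endpoint()

Tearing the local container down and re-deploying after editing inference.py takes seconds rather than the roughly 10 minutes a managed endpoint needs to start.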

This inference endpoint setup script uses CSVSerializer/CSVDeserializer for NLP input: https://github.com/aws-samples/amazon-sagemaker-local-mode/blob/main/pytorch_nlp_script_mode_local_model_inference/pytorch_nlp_script_mode_local_model_inference.py

If you are interested in examples that use JSONSerializer/JSONDeserializer, this one should be more up your alley: https://github.com/aws-samples/amazon-sagemaker-local-mode/blob/main/huggingface_hebert_sentiment_analysis_local_serving/huggingface_hebert_sentiment_analysis_local_serving.py

If you notice a problem with a specific combination, do submit an issue to the public GitHub repository for the SageMaker Python SDK (https://github.com/aws/sagemaker-python-sdk).

If the repository examples are not helpful, there is another way to speed up development: instead of installing packages from a requirements.txt on every deployment, you can extend a pre-built container and preinstall your required packages. You then push the image to ECR and specify it in your model (the example here uses PyTorch):

from sagemaker.pytorch import PyTorchModel

model_instance = PyTorchModel(
    image_uri="<your-extended-image-ECR-URI>",  # the extended image you pushed to ECR
    model_data=model_tar_path,                  # S3 path to your model.tar.gz
    role=role,                                  # SageMaker execution role
    source_dir="code",                          # directory containing inference.py
    entry_point="inference.py",
    framework_version="1.8",
    py_version="py3",
)

This stops the container from reinstalling your packages on every deploy. https://docs.aws.amazon.com/sagemaker/latest/dg/prebuilt-containers-extend.html
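Deployment itself does not change when you use a custom image; a short sketch (the instance type below is only an example, and the JSON serializers match the setup in the question):

from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

predictor = model_instance.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",   # example instance type; choose what fits your model
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)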

Once you are sure that your model works as designed, you can use Amazon SageMaker Inference Recommender to figure out the optimal deployment parameters for your model. It recommends the instance type that gives the best performance at the lowest cost: https://docs.aws.amazon.com/sagemaker/latest/dg/inference-recommender.html
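For reference, a Default Inference Recommender job can be started through boto3 roughly like this. This is only a sketch: it assumes the model has been registered as a model package version in the SageMaker Model Registry, and the job name and ARN are placeholders:

import boto3

sm = boto3.client("sagemaker")

# Start a default job that benchmarks the model on candidate instance types
sm.create_inference_recommendations_job(
    JobName="my-inference-recommender-job",   # placeholder job name
    JobType="Default",
    RoleArn=role,                             # SageMaker execution role ARN
    InputConfig={
        # Placeholder ARN; the model must be registered in the Model Registry
        "ModelPackageVersionArn": "arn:aws:sagemaker:<region>:<account>:model-package/<group>/1",
    },
)

# Once the job completes, the response lists the recommended instance types
job = sm.describe_inference_recommendations_job(JobName="my-inference-recommender-job")
print(job["Status"])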

AWS
Answered 3 months ago
Expert
Reviewed 1 month ago
