SageMaker inference endpoint with HuggingFaceModel ignores custom script


Hello, I'm trying to deploy a HuggingFaceModel using a SageMaker inference endpoint. I've been following some guides, e.g. this one and this. My model of choice is Llama-2 fine-tuned on my own data. I've packed it into a model.tar.gz with the following structure:

├── config.json
├── generation_config.json
├── tokenizer.json
├── pytorch_model-00001-of-00003.bin
├── ... (other model files)
└── code/
  └── requirements.txt

My script defines the functions model_fn and output_fn, with custom model loading and output parsing logic.
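
For illustration, a script of this kind might look like the sketch below. It assumes the hook convention of the Hugging Face inference toolkit for SageMaker, `model_fn(model_dir)` and `output_fn(prediction, accept)`. The loading logic and the `completions` response shape are hypothetical stand-ins, not the actual script from this question.

```python
import json


def model_fn(model_dir):
    """Load a text-generation pipeline from the unpacked model.tar.gz directory."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import pipeline

    # model_dir is the directory where SageMaker extracts the archive.
    return pipeline("text-generation", model=model_dir)


def output_fn(prediction, accept="application/json"):
    """Reshape the pipeline output into a custom response body (hypothetical format)."""
    # Assumes the prediction is a list of dicts, each with a "generated_text" key.
    texts = [item["generated_text"] for item in prediction]
    return json.dumps({"completions": texts})
```

If a hook like `output_fn` is picked up, the response body changes shape (here to `{"completions": [...]}`), which makes it easy to tell from a single request whether the script is in use.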

I've uploaded this model.tar.gz to the S3 bucket in model_s3_path.

During the sagemaker endpoint creation, I define my HuggingFaceModel as follows:

from sagemaker.huggingface import get_huggingface_llm_image_uri
from sagemaker.huggingface import HuggingFaceModel

llm_image = get_huggingface_llm_image_uri(
    # arguments omitted in the post
)

huggingface_model = HuggingFaceModel(
    # other arguments omitted in the post
    env={
        'HF_MODEL_ID': 'meta-llama/Llama-2-7b-hf',
        'SM_NUM_GPUS': '1',
        'MAX_INPUT_LENGTH': '2048',
        'MAX_TOTAL_TOKENS': '4096',
        'MAX_BATCH_TOTAL_TOKENS': '8192',
        'HUGGING_FACE_HUB_TOKEN': "<my-hf-token>",
    },
)

And then I deploy the model:


However, during inference the resulting endpoint doesn't seem to use any of the functionality from my script and instead falls back to the default methods. For instance, it still returns the response as [{"generated_texts": model_response}], although my post-processing function (output_fn) should have changed the return type.

What I have tried so far:

  1. Setting entry_point="" and source_dir="./code" when creating the HF model: the endpoint failed to deploy at all.
  2. Using the env variable "SAGEMAKER_PROGRAM": "": the model's responses did not change, and the custom script was still ignored.
  3. Trying various image_uri values: the endpoint's behaviour did not change.
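
One thing worth ruling out is whether the entry script is actually inside the uploaded archive, since the listing above shows only requirements.txt under code/. The sketch below inspects a local copy of model.tar.gz. The default script name `inference.py` is an assumption based on the Hugging Face inference toolkit convention, because the script name is not given in the post, and both helper functions are hypothetical. Separately, and as an assumption to verify against the container documentation, the LLM image returned by get_huggingface_llm_image_uri may not run custom inference scripts at all.

```python
import tarfile


def list_code_files(archive_path):
    """Return the sorted names of regular files under code/ in a model archive."""
    with tarfile.open(archive_path, "r:*") as archive:
        names = [member.name for member in archive.getmembers() if member.isfile()]
    # Strip a leading "./", which appears when the archive was built from "." as root.
    names = [name[2:] if name.startswith("./") else name for name in names]
    return sorted(name for name in names if name.startswith("code/"))


def has_entry_script(archive_path, script_name="inference.py"):
    """Check whether code/<script_name> exists in the archive.

    The default script_name is an assumed convention, not taken from the post.
    """
    return f"code/{script_name}" in list_code_files(archive_path)
```

Calling `list_code_files("model.tar.gz")` on a local copy shows exactly what the container would find under code/ after extraction.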
asked 9 months ago, 409 views
1 Answer
Accepted Answer

Hello Vlad,

Thank you for using AWS SageMaker.

I understand that you are trying to build a custom endpoint that serves requests with a model trained outside SageMaker. The blogs you used as reference are third-party, so I can't check internally whether their code needs a fix. To investigate the issue further, we would need more details about the endpoint configuration and certain backend details, along with the CloudWatch logs, to understand what is missing and how to fix it. This forum is not a secure medium for sharing those details, and without them it is difficult to narrow down the issue, so please create a case with AWS Support so that the available engineers can help you achieve the desired result.

To open a support case with AWS use the link:

answered 9 months ago
