ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{ "code": 400, .....


Hi, problem context: I am trying to deploy a fine-tuned LLM (Falcon-7B) using SageMaker. Below is the code from my SageMaker notebook:

from sagemaker.huggingface.model import HuggingFaceModel
import sagemaker
import boto3
import json

# Define the SageMaker execution role
try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='Sagemaker-ExecutionRole')['Role']['Arn']

print(f"sagemaker role arn: {role}")

trust_remote_code = True  # note: defined here but not passed to the model or env below
# Hub model configuration <https://huggingface.co/models>
hub = {
    'HF_MODEL_ID': 'tdicommons/falcon_28_06_23',               # model_id from hf.co/models
    # 'HF_TASK': 'text-generation',                            # NLP task to use for predictions
    'SM_NUM_GPUS': json.dumps(1),
    'HF_API_TOKEN': "hf_yclNrVDnzcDjAgYZffDJFukzKSr*********",
    'model_type': 'RefinedWebModel'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    env=hub,                           # configuration for loading model from the Hub
    role=role,                         # IAM role with permissions to create an endpoint
    transformers_version="4.10.2",     # Transformers version used
    pytorch_version="1.9.0",           # PyTorch version used
    py_version='py38',                 # Python version used
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)

# example request: you always need to define "inputs"
# prompt = f"""
# : hey how are you?
# :
# """.strip()

# request
# predictor.predict(prompt)

# hyperparameters for llm
# https://huggingface.co/blog/sagemaker-huggingface-llm#4-run-inference-and-chat-with-our-model
# payload = {
#     "inputs": prompt,
# }
# response = predictor.predict(payload)

# send request to endpoint
predictor.predict({
    "inputs": "hey how are you?",
})
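
For reference, predictor.predict() is a wrapper around the SageMaker runtime InvokeEndpoint call named in the error at the top; a minimal sketch of the equivalent low-level request is below (the endpoint name is a placeholder, the real one comes from predictor.endpoint_name):

import boto3
import json

# Low-level equivalent of predictor.predict(); this is the InvokeEndpoint call
# referenced in the ModelError above. "my-falcon-endpoint" is a placeholder name.
runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="my-falcon-endpoint",   # e.g. predictor.endpoint_name
    ContentType="application/json",
    Body=json.dumps({"inputs": "hey how are you?"}),
)
print(response["Body"].read().decode("utf-8"))
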
============================================================================
When I look at CloudWatch, I see the following:

2023-08-26 13:45:06,652 [INFO ] W-tdicommons__falcon_28_06_-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - KeyError: 'RefinedWebModel'
W-tdicommons__falcon_28_06_-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     raise PredictionException(str(e),
2023-08-26 13:45:06,653 [INFO ] W-tdicommons__falcon_28_06_-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - mms.service.PredictionException: 'RefinedWebModel' : 400
2023-08-26 13:49:07,830 [INFO ] pool-1-thread-3 ACCESS_LOG - /169.254.178.2:57684 "GET /ping HTTP/1.1" 200

My Questions:
1- Am I passing something incorrectly in the hub variable, i.e. hub = { 'HF_MODEL_ID': ...[my fine-tuned model], 'HF_TASK': 'text-generation', ... }? (For comparison, see the minimal hub sketch after these questions.)
2- Why can I run inference with this model in SageMaker without any issue, but when I deploy the same model to an endpoint I get this error?
3- Do you think my way of passing input to the model is wrong?
predictor.predict({
    "inputs": "hey how are you",
})
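
For comparison (re question 1), a minimal hub configuration as shown in the Hugging Face SageMaker examples only sets the model id and the task; everything beyond that in my hub dict above is my own addition, which is exactly what I am unsure about:

# Minimal hub config for comparison (based on the Hugging Face SageMaker examples);
# my extra keys (SM_NUM_GPUS, HF_API_TOKEN, model_type) are not part of this baseline.
hub = {
    'HF_MODEL_ID': 'tdicommons/falcon_28_06_23',   # model_id from hf.co/models
    'HF_TASK': 'text-generation',                  # NLP task to use for predictions
}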
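
Re question 3, the payload shape I based my request on (from the Hugging Face SageMaker LLM blog post linked in the code comments) is a JSON object with an "inputs" key and an optional "parameters" dict; the parameter values below are only illustrative:

# Payload shape from the Hugging Face SageMaker LLM blog post linked above;
# the generation parameters here are illustrative, not tuned.
payload = {
    "inputs": "hey how are you?",
    "parameters": {
        "max_new_tokens": 64,
        "do_sample": True,
        "temperature": 0.7,
    },
}
response = predictor.predict(payload)
print(response)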