Hi,
Problem context: I am trying to deploy a fine-tuned LLM (Falcon-7B) using SageMaker.
Below is the code from my SageMaker notebook:
```python
from sagemaker.huggingface.model import HuggingFaceModel
import sagemaker
import boto3
import json
# Define your SageMaker role
try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='Sagemaker-ExecutionRole')['Role']['Arn']
print(f"sagemaker role arn: {role}")
trust_remote_code = True
# Hub model configuration <https://huggingface.co/models>
hub = {
    'HF_MODEL_ID': 'tdicommons/falcon_28_06_23',  # model_id from hf.co/models
    # 'HF_TASK': 'text-generation',               # NLP task to use for predictions
    'SM_NUM_GPUS': json.dumps(1),
    'HF_API_TOKEN': "hf_yclNrVDnzcDjAgYZffDJFukzKSr*********",
    'model_type': 'RefinedWebModel'               # model type of the fine-tuned checkpoint
}
# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    env=hub,                        # configuration for loading model from Hub
    role=role,                      # IAM role with permissions to create an endpoint
    transformers_version="4.10.2",  # Transformers version used
    pytorch_version="1.9.0",        # PyTorch version used
    py_version='py38',              # Python version used
)
# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)
# example request: you always need to define "inputs"
# prompt = f"""
# : hey how are you?
# :
# """.strip()
# hyperparameters for llm: https://huggingface.co/blog/sagemaker-huggingface-llm#4-run-inference-and-chat-with-our-model
# payload = {
#     "inputs": prompt,
# }
# response = predictor.predict(payload)

# send request to endpoint
predictor.predict({
    "inputs": "hey how are you?",
})
```
============================================================================
When I check CloudWatch, I find this in the logs:
```
2023-08-26 13:45:06,652 [INFO ] W-tdicommons__falcon_28_06_-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - KeyError: 'RefinedWebModel'
W-tdicommons__falcon_28_06_-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - raise PredictionException(str(e),
2023-08-26 13:45:06,653 [INFO ] W-tdicommons__falcon_28_06_-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - mms.service.PredictionException: 'RefinedWebModel' : 400
2023-08-26 13:49:07,830 [INFO ] pool-1-thread-3 ACCESS_LOG - /169.254.178.2:57684 "GET /ping HTTP/1.1" 200
```
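For context on what this KeyError usually means: the serving worker resolves config.model_type to a model class through a registry dict, and a type that the installed transformers version does not know about raises KeyError, which then surfaces as the PredictionException above. A minimal sketch of that failure mode (the registry contents below are made up for illustration, not the real transformers table):

```python
# Hypothetical model-type registry; the real one in transformers is much larger,
# and older releases simply do not contain newer model types.
CONFIG_MAPPING = {
    "bert": "BertForSequenceClassification",
    "gpt2": "GPT2LMHeadModel",
}

def resolve_model_class(model_type: str) -> str:
    # A plain dict lookup: an unregistered model type raises KeyError,
    # which is what shows up in the worker's CloudWatch logs.
    return CONFIG_MAPPING[model_type]

try:
    resolve_model_class("RefinedWebModel")
except KeyError as e:
    print(f"KeyError: {e}")  # -> KeyError: 'RefinedWebModel'
```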
My questions:
1- Am I passing something incorrectly in the hub variable, i.e. hub = {'HF_MODEL_ID': ...[here is my fine-tuned model], 'HF_TASK': 'text-generation', ...}?
2- Why is it that I can run inference on SageMaker without issue, but when I use the same model to create an endpoint I get this error?
3- Do you think my way of passing input to the model is wrong?
```python
predictor.predict({
    "inputs": "hey how are you",
})
```
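For reference on question 3: the request body for a text-generation endpoint is a JSON object with a required "inputs" key, and generation options, if any, go under an optional "parameters" key. A minimal sketch of building and serializing such a payload (the parameter values here are just illustrative, not recommendations):

```python
import json

# Sketch of a text-generation request body; "inputs" is required,
# "parameters" is optional (values below are examples only).
payload = {
    "inputs": "hey how are you?",
    "parameters": {
        "max_new_tokens": 64,
        "temperature": 0.7,
    },
}

# predictor.predict(payload) serializes the dict to JSON under the hood;
# serializing manually shows the wire format the container receives.
body = json.dumps(payload)
print(body)
```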