I attempted to create an endpoint using the huggingface features of the sagemaker package for Python
from sagemaker.huggingface import HuggingFaceModel
import sagemaker
role = sagemaker.get_execution_role()
# Hub Model configuration. https://huggingface.co/models
hub = {
'HF_MODEL_ID':'MY_ACC/MY_MODEL_NAME',
'HF_TASK':'text-generation'
}
# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
transformers_version='4.6.1',
pytorch_version='1.7.1',
py_version='py36',
env=hub,
role=role,
)
# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
initial_instance_count=1, # number of instances
instance_type='ml.m5.xlarge' # ec2 instance type
)
predictor.predict({
'inputs': "Can you please let us know more details about your "
})
MY_ACC/MY_MODEL_NAME
is a private model, so the endpoint creation kept outputting the following errors:
This is an experimental beta features, which allows downloading model from the Hugging Face Hub on start up. It loads the model defined in the env var `HF_MODEL_ID`
Traceback (most recent call last):
File "/usr/local/bin/dockerd-entrypoint.py", line 23, in <module>
serving.main()
File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/serving.py", line 34, in main
_start_mms()
File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 49, in wrapped_f
return Retrying(*dargs, **dkw).call(f, *args, **kw)
File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 206, in call
return attempt.get(self._wrap_exception)
File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 247, in get
six.reraise(self.value[0], self.value[1], self.value[2])
File "/opt/conda/lib/python3.6/site-packages/six.py", line 719, in reraise
raise value
File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 200, in call
attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/serving.py", line 30, in _start_mms
mms_model_server.start_model_server(handler_service=HANDLER_SERVICE)
File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/mms_model_server.py", line 75, in start_model_server
use_auth_token=HF_API_TOKEN,
File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/transformers_utils.py", line 154, in _load_model_from_hub
model_info = _api.model_info(repo_id=model_id, revision=revision, token=use_auth_token)
File "/opt/conda/lib/python3.6/site-packages/huggingface_hub/hf_api.py", line 155, in model_info
r.raise_for_status()
File "/opt/conda/lib/python3.6/site-packages/requests/models.py", line 943, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/api/models/MY_ACC/MY_MODEL_NAME
It seems like there is a never-ending loop of checking for this resource that it cannot find. I have killed the Python process that started the endpoint creation, but it has carried on regardless.
How do I fix this? I just want to delete the endpoint, but that option is greyed-out as it is still in the creation phase.
Thanks