Error creating a Hugging Face model with SageMaker: the SageMaker endpoint stays stuck in "Creating" status.


I am trying to create an endpoint using the Hugging Face support in the sagemaker Python package.

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# Hub model configuration: https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'MY_ACC/MY_MODEL_NAME',
    'HF_TASK': 'text-generation'
}

# Create the Hugging Face model class
huggingface_model = HuggingFaceModel(
    transformers_version='4.6.1',
    pytorch_version='1.7.1',
    py_version='py36',
    env=hub,
    role=role,
)

# Deploy the model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,      # number of instances
    instance_type='ml.m5.xlarge'   # EC2 instance type
)

predictor.predict({
    'inputs': "Can you please let us know more details about your "
})

MY_ACC/MY_MODEL_NAME is a private model, so endpoint creation keeps emitting the following error:

This is an experimental beta features, which allows downloading model from the Hugging Face Hub on start up. It loads the model defined in the env var HF_MODEL_ID

Traceback (most recent call last):
  File "/usr/local/bin/dockerd-entrypoint.py", line 23, in <module>
    serving.main()
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/serving.py", line 34, in main
    _start_mms()
  File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 206, in call
    return attempt.get(self._wrap_exception)
  File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/opt/conda/lib/python3.6/site-packages/six.py", line 719, in reraise
    raise value
  File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/serving.py", line 30, in _start_mms
    mms_model_server.start_model_server(handler_service=HANDLER_SERVICE)
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/mms_model_server.py", line 75, in start_model_server
    use_auth_token=HF_API_TOKEN,
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/transformers_utils.py", line 154, in _load_model_from_hub
    model_info = _api.model_info(repo_id=model_id, revision=revision, token=use_auth_token)
  File "/opt/conda/lib/python3.6/site-packages/huggingface_hub/hf_api.py", line 155, in model_info
    r.raise_for_status()
  File "/opt/conda/lib/python3.6/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)

requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/api/models/MY_ACC/MY_MODEL_NAME
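For context on the 404: the traceback shows the container looking up HF_MODEL_ID on the Hub API and reading an access token from the HF_API_TOKEN environment variable, and a private repo looks like "Not Found" to an anonymous request. A minimal sketch of one possible fix, assuming the container version in use honours HF_API_TOKEN, with 'hf_xxx' as a placeholder token value:

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# Same deployment as above, plus an access token so the container can pull the private repo.
# 'hf_xxx' is a placeholder; substitute a read token from your Hugging Face account settings.
hub = {
    'HF_MODEL_ID': 'MY_ACC/MY_MODEL_NAME',
    'HF_TASK': 'text-generation',
    'HF_API_TOKEN': 'hf_xxx',
}

huggingface_model = HuggingFaceModel(
    transformers_version='4.6.1',
    pytorch_version='1.7.1',
    py_version='py36',
    env=hub,
    role=role,
)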

It looks like there is a never-ending loop retrying this check. I have killed the Python process that started the endpoint creation, but it keeps running regardless.

How do I fix this? I just want to delete the endpoint, but that option is greyed out because it is still in the Creating stage.

Thanks

Expert
Asked 8 months ago · 69 views

1 Answer

This is nothing to worry about: when creation keeps running like this, the process will eventually time out on its own. Hopefully this helps someone else. You can also try deleting the endpoint with the AWS CLI.
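For reference, the CLI call would be aws sagemaker delete-endpoint --endpoint-name <name>. Below is a minimal boto3 sketch of the same cleanup; the endpoint name is a placeholder, and deletion may still be refused while the endpoint is mid-creation, matching the greyed-out console button:

import boto3

sm = boto3.client('sagemaker')

endpoint_name = 'my-huggingface-endpoint'   # placeholder: use the name shown in the SageMaker console

# Check what state the endpoint is actually in (e.g. 'Creating', 'Failed', 'InService').
print(sm.describe_endpoint(EndpointName=endpoint_name)['EndpointStatus'])

# Delete the endpoint and its endpoint configuration once deletion is allowed.
# The Python SDK usually names the endpoint config after the endpoint; adjust if yours differs.
sm.delete_endpoint(EndpointName=endpoint_name)
sm.delete_endpoint_config(EndpointConfigName=endpoint_name)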

Expert
Answered 8 months ago
