Deploy Inference Endpoint for HF Model Pygmalion


Can someone help me load my model to create an endpoint?

I've provided an explanation of the steps I followed, the error logs, and the code used to create everything. Thank you in advance.

I'm trying to run inference with the ''PygmalionAI/pygmalion-6b'' model from Hugging Face, but the endpoint gives an error when trying to load the worker; it seems unable to. I'm fairly sure I selected a big enough instance, so it shouldn't be because of an OOM exception (I think...)
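As a quick sanity check on the OOM assumption (my own back-of-the-envelope numbers, not from any log): pygmalion-6b has roughly 6 billion parameters, and an ml.g4dn.2xlarge has a single NVIDIA T4 with 16 GB of GPU memory, so whether the weights even fit depends on the load precision:

```python
# Rough estimate of GPU memory needed just to hold the model weights
# (ignores activations, KV cache, and framework overhead).
def model_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Approximate weight memory in GiB."""
    return n_params * bytes_per_param / 1024**3

N_PARAMS = 6e9       # pygmalion-6b
T4_MEMORY_GB = 16    # single T4 on ml.g4dn.2xlarge

fp32 = model_memory_gb(N_PARAMS, 4)  # ~22.4 GiB -> does not fit
fp16 = model_memory_gb(N_PARAMS, 2)  # ~11.2 GiB -> fits, barely
print(f"fp32: {fp32:.1f} GiB (fits: {fp32 < T4_MEMORY_GB})")
print(f"fp16: {fp16:.1f} GiB (fits: {fp16 < T4_MEMORY_GB})")
```

So if the container loads the checkpoint in fp32, an OOM during "Loading checkpoint shards" would actually be plausible.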

ERROR LOGS:

(log group: 776125764525:/aws/sagemaker/Endpoints/huggingface-pytorch-inference-2023-06-26-22-10-51-934)

2023-06-26T22:20:24,738 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.BatchAggregator - Load model failed: model, error: Worker died.
2023-06-26T22:21:16,306 [WARN ] W-9000-model-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]

STEPS FOLLOWED:

I created the inference endpoint through a notebook and the SageMaker Python SDK. I took the model from the following Hugging Face URL: https://huggingface.co/PygmalionAI/pygmalion-6b, downloaded it, removed the ''runs'' folder (which to my knowledge contains training logs), compressed the model, uploaded it to S3, and followed guides on how to use the Python SDK. If I try to load the model using the HF ID directly (PygmalionAI/pygmalion-6b), it doesn't work either. (I'm not providing more Python code because of the input message length restriction.)
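The packaging steps described above (remove the ''runs'' folder, compress, upload to S3) look roughly like this; the local directory name is illustrative, and the bucket/key match the `model_data` used in the deploy code further down:

```shell
# Sketch of the packaging steps described above; paths are illustrative.
MODEL_DIR=pygmalion-6b   # local download of PygmalionAI/pygmalion-6b
ARCHIVE=model.tar.gz

if [ -d "$MODEL_DIR" ]; then
    # Drop the training-log folder before packaging.
    rm -rf "$MODEL_DIR/runs"
    # SageMaker expects the files at the archive root, so tar from inside the dir.
    tar -czf "$ARCHIVE" -C "$MODEL_DIR" .
    # Upload to the key referenced by model_data in the HuggingFaceModel call:
    # aws s3 cp "$ARCHIVE" s3://pygmalion-6b-s3/model.tar.gz
fi
```

Note that the archive contents must sit at the root of the tarball (not inside a subdirectory), which is why `tar` is run with `-C`.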

Here's my Python code:

pip install sagemaker --upgrade
pip install transformers --upgrade

import sagemaker
import boto3

sess = sagemaker.Session()
sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
    sagemaker_session_bucket = sess.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker session region: {sess.boto_region_name}")

from sagemaker.huggingface import get_huggingface_llm_image_uri

llm_image = get_huggingface_llm_image_uri(
    "huggingface",
    version="0.8.2",
)

print(f"llm image uri: {llm_image}")

import json
from sagemaker.huggingface import HuggingFaceModel

instance_type = "ml.g4dn.2xlarge"
number_of_gpu = 1
health_check_timeout = 750

config = {
    'SM_NUM_GPUS': json.dumps(number_of_gpu),
    'MAX_INPUT_LENGTH': json.dumps(1024),
    'MAX_TOTAL_TOKENS': json.dumps(2048),
}

llm_model = HuggingFaceModel(
    model_data='s3://pygmalion-6b-s3/model.tar.gz',
    role=role,
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version='py39',
    env=config,
)

llm = llm_model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    container_startup_health_check_timeout=health_check_timeout,  # 750 s (~12.5 minutes) to be able to load the model
)

data = {
    "inputs": "hello, how are you?"
}
response = llm.predict(data)

The Python script is based on: https://huggingface.co/blog/sagemaker-huggingface-llm, but I also tried this version: https://huggingface.co/docs/sagemaker/inference#create-a-model-artifact-for-deployment and was likewise unable to load the worker.. TT

2 Answers

Hi, have a look at https://github.com/oobabooga/text-generation-webui/issues/440 where folks describe a problem similar to yours with pygmalion-6b-dev.

Some of them propose solutions that may apply to your case: https://github.com/oobabooga/text-generation-webui/issues/440#issuecomment-1475506032

Best,

Didier

AWS
Expert
Answered 1 year ago

Unfortunately, I'm not deploying with text-generation-webui; when I load the model with it locally, it seems to work fine... I might try to dig deeper to find out whether it really is an OOM, but I don't think so. Could it be something to do with the environment the notebook is created with?
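One way to dig deeper is to pull the endpoint's CloudWatch logs and scan them for out-of-memory markers. This is a sketch: the log group name comes from the error output above, the marker list is my own guess at common OOM signatures, and running it requires AWS credentials:

```python
# Scan SageMaker endpoint logs for out-of-memory hints.
OOM_MARKERS = (
    "CUDA out of memory",
    "OutOfMemoryError",
    "std::bad_alloc",
    "Killed",  # the OS OOM killer often leaves only this trace
)

def find_oom_lines(messages):
    """Return the log lines that contain a known out-of-memory marker."""
    return [m for m in messages if any(marker in m for marker in OOM_MARKERS)]

def fetch_endpoint_logs(log_group, limit=500):
    """Fetch recent log messages for the endpoint's CloudWatch log group."""
    import boto3  # requires AWS credentials at call time
    logs = boto3.client("logs")
    events = logs.filter_log_events(logGroupName=log_group, limit=limit)
    return [e["message"] for e in events["events"]]

# Example (log group name taken from the error output above):
# group = "/aws/sagemaker/Endpoints/huggingface-pytorch-inference-2023-06-26-22-10-51-934"
# print(find_oom_lines(fetch_endpoint_logs(group)))
```

If `find_oom_lines` comes back empty over the full startup window, an OOM becomes less likely and the container/environment mismatch theory gains weight.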

Answered 1 year ago
