Deploy Inference Endpoint for HF Model Pygmalion


Can someone help me load my model to create an endpoint?

I've provided an explanation of the steps followed, the error logs, and the code used to create everything... thank you in advance.

I'm trying very hard to run inference on the ''PygmalionAI/pygmalion-6b'' model from Hugging Face, but the worker fails when trying to load the model on endpoint startup. I'm pretty sure I selected an instance big enough, so it shouldn't be because of an OOM exception (I think...).
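As a rough back-of-the-envelope sanity check on the OOM question (my own estimate, not from any guide; it counts weights only and ignores activations and the KV cache), a 6B-parameter model at 2 bytes/parameter in fp16 vs 4 bytes/parameter in fp32, against the single 16 GB T4 GPU of a g4dn.2xlarge:

```python
# Rough GPU-memory estimate for a 6B-parameter model (weights only).
# Assumptions: 2 bytes/param in fp16, 4 bytes/param in fp32;
# a g4dn.2xlarge has one NVIDIA T4 with 16 GB of GPU memory.
params = 6e9
fp16_gb = params * 2 / 1024**3   # ~11.2 GB
fp32_gb = params * 4 / 1024**3   # ~22.4 GB
t4_gb = 16

print(f"fp16: {fp16_gb:.1f} GB (fits on T4: {fp16_gb < t4_gb})")
print(f"fp32: {fp32_gb:.1f} GB (fits on T4: {fp32_gb < t4_gb})")
```

So the weights alone only fit on a T4 if the model is loaded in fp16; a default fp32 load would not fit.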

ERROR LOGS:

2023-06-26T22:20:24,738 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.BatchAggregator - Load model failed: model, error: Worker died.
2023-06-26T22:21:16,306 [WARN ] W-9000-model-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]

STEPS FOLLOWED:

I created the inference endpoint through a notebook and the SageMaker Python SDK. I took the model from the following Hugging Face URL: https://huggingface.co/PygmalionAI/pygmalion-6b, downloaded it, removed the ''runs'' folder (which to my knowledge contains training logs), compressed the model, uploaded it to S3, and followed guides on how to use the Python SDK. If I try to load the model using the HF ID directly (PygmalionAI/pygmalion-6b), it doesn't work either.
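For reference, the download/repackage step could be sketched like this (the local path is hypothetical; the important detail is that SageMaker expects the model files at the root of model.tar.gz, not nested inside a subfolder):

```python
import tarfile
from pathlib import Path

def package_model(model_dir, out_path="model.tar.gz"):
    """Package a downloaded model directory for SageMaker.

    SageMaker expects the model files at the *root* of model.tar.gz,
    hence arcname=item.name rather than the full relative path.
    """
    model_dir = Path(model_dir)
    with tarfile.open(out_path, "w:gz") as tar:
        for item in model_dir.iterdir():
            if item.name == "runs":  # skip training logs
                continue
            tar.add(item, arcname=item.name)

# Hypothetical local path to the downloaded PygmalionAI/pygmalion-6b files:
# package_model("pygmalion-6b")
```

The resulting archive can then be uploaded to S3, e.g. with `aws s3 cp model.tar.gz s3://pygmalion-6b-s3/`.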

Here's my Python code:

pip install sagemaker --upgrade
pip install transformers --upgrade

import sagemaker
import boto3

sess = sagemaker.Session()
sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
    sagemaker_session_bucket = sess.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker session region: {sess.boto_region_name}")

from sagemaker.huggingface import get_huggingface_llm_image_uri

llm_image = get_huggingface_llm_image_uri(
    "huggingface",
    version="0.8.2",
)

print(f"llm image uri: {llm_image}")

import json
from sagemaker.huggingface import HuggingFaceModel

instance_type = "ml.g4dn.2xlarge"
number_of_gpu = 1
health_check_timeout = 750

config = {
    'SM_NUM_GPUS': json.dumps(number_of_gpu),
    'MAX_INPUT_LENGTH': json.dumps(1024),
    'MAX_TOTAL_TOKENS': json.dumps(2048),
}

llm_model = HuggingFaceModel(
    model_data='s3://pygmalion-6b-s3/model.tar.gz',
    role=role,
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version='py39',
    env=config,
)

llm = llm_model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    container_startup_health_check_timeout=health_check_timeout,  # 750 s (~12.5 minutes) to load the model
)

data = {
    "inputs": "hello, how are you?"
}
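Once the endpoint is healthy, invoking it would look something like this (a sketch: `build_payload` is a hypothetical helper of mine, and the parameter names follow the text-generation-inference request format used by the Hugging Face LLM container):

```python
import json

# Hypothetical helper: wraps a prompt in the request format the
# Hugging Face LLM (text-generation-inference) container expects.
def build_payload(prompt, max_new_tokens=128, temperature=0.7):
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
        },
    }

payload = build_payload("hello, how are you?")
# response = llm.predict(payload)  # requires the deployed endpoint above
print(json.dumps(payload, indent=2))
```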

The Python script is based on https://huggingface.co/blog/sagemaker-huggingface-llm, but I also tried this version: https://huggingface.co/docs/sagemaker/inference#create-a-model-artifact-for-deployment and was likewise unable to load the worker.

2 Answers

Hi, have a look at https://github.com/oobabooga/text-generation-webui/issues/440 where folks describe a problem similar to yours with pygmalion-6b-dev.

Some of them propose solutions that may apply to your case: https://github.com/oobabooga/text-generation-webui/issues/440#issuecomment-1475506032

Best,

Didier

AWS
EXPERT
answered 10 months ago

Unfortunately, I'm not using text-generation-webui; when I load the model with it, it seems to work fine... I might try to dig deeper to find out if it really is an OOM, but I don't think so. Could it be something to do with the environment the notebook is created with?
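One way to dig into the OOM question is to scan the endpoint's CloudWatch logs for memory-related messages. A sketch (`looks_like_oom` is a hypothetical helper; the boto3 fetch is commented out because it needs AWS credentials, and the log-group name is the one from the error logs above):

```python
# Sketch: scan CloudWatch log messages from the endpoint for OOM hints.
OOM_KEYWORDS = ("CUDA out of memory", "OutOfMemoryError", "Killed")

def looks_like_oom(message):
    # Case-insensitive match against common out-of-memory signatures.
    return any(k.lower() in message.lower() for k in OOM_KEYWORDS)

# import boto3
# logs = boto3.client("logs")
# events = logs.filter_log_events(
#     logGroupName="/aws/sagemaker/Endpoints/huggingface-pytorch-inference-2023-06-26-22-10-51-934",
# )["events"]
# suspects = [e["message"] for e in events if looks_like_oom(e["message"])]

print(looks_like_oom("Worker died."))                      # False
print(looks_like_oom("torch.cuda.OutOfMemoryError: ..."))  # True
```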

answered 10 months ago
