Serverless Inference - Limit number of workers


We've deployed a HuggingFace model to SageMaker as a serverless endpoint. We set memory to 6 GB and max concurrency to 1. With these settings, we keep getting errors when we call invoke_endpoint. Not all the time, but about 60% of the time...
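(For reference, the failing call is a plain runtime invocation along these lines; the endpoint name and payload here are placeholders, not our real ones:)

import json

import boto3

runtime = boto3.client("sagemaker-runtime")

# Placeholder endpoint name and payload, for illustration only
response = runtime.invoke_endpoint(
    EndpointName="my-serverless-endpoint",
    ContentType="application/json",
    Body=json.dumps({"inputs": "some text to classify"}),
)
print(response["Body"].read().decode("utf-8"))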

When we check the logs and metrics, we see that memory usage has gone up to almost 100%. We also see that, since the machine has 6 vCPUs, it starts 6 workers. We believe this could be the cause of the problem. How can we set the number of workers?

Thanks!

Asked 2 years ago · 1,274 views
2 Answers

From the “sagemaker.pytorch.model.PyTorchModel” documentation:

model_server_workers (int) – Optional. The number of worker processes used by the inference server. If None, server will use one worker per vCPU.

The example below shows how to set the “MODEL_SERVER_WORKERS” environment variable to control the number of workers.

from sagemaker.estimator import Estimator
from sagemaker.serializers import CSVSerializer

# Limit the inference server to two worker processes
env = {"MODEL_SERVER_WORKERS": "2"}

# `image` and `role` are your container image URI and IAM execution role
local_regressor = Estimator(
    image,
    role,
    instance_count=1,
    instance_type="local")

train_location = 'file://' + local_train
validation_location = 'file://' + local_validation
local_regressor.fit({'train': train_location, 'validation': validation_location}, logs=True)

# Pass the environment variables through to the endpoint at deploy time
predictor = local_regressor.deploy(1, 'local', serializer=CSVSerializer(), env=env)

Hope it helps.

AWS
Answered 2 years ago
  • Thanks. Adding an "answer" to provide more information below...


Eitan, thanks for replying.

I'm not sure if this worked or not, as the CloudWatch logs are no longer showing the number of workers! The performance seems to be the same, however: it's failing more often than it's responding, and still reaching almost 100% memory.

Instead of your code, I used the following, as I'm deploying a Hugging Face model:

import os

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    name=model_name,
    model_data=os.path.join("s3://" + tar_bucket_name, tarfile_name),
    env={
        'HF_TASK': 'text-classification',
        'MODEL_SERVER_WORKERS': '1',    # ask the server for a single worker process
        'MODEL_SERVER_TIMEOUT': '300'   # server timeout, in seconds
    },
    role=sagemaker.get_execution_role(),
    entry_point='inference.py',
    transformers_version='4.12.3',
    pytorch_version='1.9.1',
    py_version='py38'
)
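For completeness, the deploy step (not shown above) looks roughly like this; a sketch matching the 6 GB / max-concurrency-1 settings from the question:

from sagemaker.serverless import ServerlessInferenceConfig

# Serverless settings from the question: 6 GB of memory, max concurrency of 1
serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=6144,
    max_concurrency=1,
)

predictor = huggingface_model.deploy(serverless_inference_config=serverless_config)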

Two follow-up questions then, if you don't mind:

  1. How can I see if the serverless function actually created only one worker per instance?
  2. Where can I find all the different environment variables accepted by SageMaker?

Many thanks!

Rogerio

Answered 2 years ago
  • Hi! I created the model (using CDK) with the environment variable SAGEMAKER_MODEL_SERVER_WORKERS. Maybe that makes the difference? A rough sketch of that setup is below.
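A minimal sketch of what that CDK (Python) setup might look like; the stack class, construct ID, role ARN, and URIs are placeholders, not from this thread:

from aws_cdk import Stack
from aws_cdk import aws_sagemaker as sagemaker
from constructs import Construct

class ModelStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # All ARNs and URIs below are placeholders
        sagemaker.CfnModel(
            self, "HfModel",
            execution_role_arn="arn:aws:iam::111122223333:role/SageMakerExecutionRole",
            primary_container=sagemaker.CfnModel.ContainerDefinitionProperty(
                image="<inference-image-uri>",
                model_data_url="s3://<bucket>/model.tar.gz",
                environment={
                    # The variable mentioned in the comment above
                    "SAGEMAKER_MODEL_SERVER_WORKERS": "1",
                },
            ),
        )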
