Serverless Inference - Limit number of workers

We've deployed a Hugging Face model to SageMaker as a serverless endpoint. We set memory to 6 GB and max concurrency to 1. With these settings, we keep getting errors when we call invoke_endpoint. Not all the time, but about 60% of the time...
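
For reference, this is roughly how we call it (a minimal sketch; the endpoint name and payload are placeholders, not our real values):

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# Placeholder endpoint name and payload for illustration.
response = runtime.invoke_endpoint(
    EndpointName="my-serverless-endpoint",
    ContentType="application/json",
    Body=json.dumps({"inputs": "some text to classify"}),
)
result = json.loads(response["Body"].read())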

When we check the logs and metrics, we see that the memory has gone up to almost 100%. We also see that, since the machine has 6 CPUs, it starts 6 workers. We believe this could be the cause of the problem. How can we set the number of workers?

Thanks!

asked 2 years ago · 1259 views
2 Answers

From the “sagemaker.pytorch.model.PyTorchModel” documentation:

model_server_workers (int) – Optional. The number of worker processes used by the inference server. If None, server will use one worker per vCPU.
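
For example, the parameter can be passed directly when constructing the model. A minimal sketch, assuming the PyTorch framework container; the model data location, role, and versions below are placeholders:

from sagemaker.pytorch import PyTorchModel

# Placeholder values; substitute your own model artifact and role.
pytorch_model = PyTorchModel(
    model_data="s3://my-bucket/model.tar.gz",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    entry_point="inference.py",
    framework_version="1.9.1",
    py_version="py38",
    model_server_workers=1,  # one worker process instead of one per vCPU
)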

Alternatively, the example below shows how to set the “MODEL_SERVER_WORKERS” environment variable to control the number of workers.

from sagemaker.estimator import Estimator
from sagemaker.serializers import CSVSerializer

# Limit the inference server to two worker processes.
env = {
    "MODEL_SERVER_WORKERS": "2"
}

# `image` is the container image URI and `role` the execution role ARN.
local_regressor = Estimator(
    image,
    role,
    instance_count=1,
    instance_type="local")

# `local_train` and `local_validation` point to local data files.
train_location = 'file://' + local_train
validation_location = 'file://' + local_validation
local_regressor.fit({'train': train_location, 'validation': validation_location}, logs=True)

# The env dict is passed through to the model that deploy() creates.
predictor = local_regressor.deploy(1, 'local', serializer=CSVSerializer(), env=env)

Hope it helps.

AWS
answered 2 years ago
  • Thanks. Adding an "answer" to provide more information below...

Eitan, thanks for replying.

I'm not sure whether this worked or not, as the CloudWatch logs are no longer showing the number of workers! The performance seems to be the same, however: it's failing more often than it's responding, and memory still reaches almost 100%.

Instead of your code, I used the following, as I'm deploying a Hugging Face model:

import os

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

# `model_name`, `tar_bucket_name` and `tarfile_name` are defined elsewhere.
huggingface_model = HuggingFaceModel(
    name=model_name,
    model_data=os.path.join("s3://" + tar_bucket_name, tarfile_name),
    env={
        'HF_TASK': 'text-classification',
        'MODEL_SERVER_WORKERS': '1',    # limit the server to a single worker
        'MODEL_SERVER_TIMEOUT': '300'   # allow up to 300 seconds per request
    },
    role=sagemaker.get_execution_role(),
    entry_point='inference.py',
    transformers_version='4.12.3',
    pytorch_version='1.9.1',
    py_version='py38'
)
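
For completeness, this is how I then deploy it as a serverless endpoint. Memory size and concurrency match the settings from my question; the endpoint name is a placeholder:

from sagemaker.serverless import ServerlessInferenceConfig

serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=6144,  # 6 GB, as described in the question
    max_concurrency=1,
)

predictor = huggingface_model.deploy(
    serverless_inference_config=serverless_config,
    endpoint_name="my-serverless-endpoint",  # placeholder
)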

Two follow-up questions then, if you don't mind:

  1. How can I see whether the serverless endpoint actually created only one worker per instance? (See the sketch after this list for how I've been checking the CloudWatch logs.)
  2. Where can I find all the different environment variables accepted by SageMaker?
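
Regarding question 1, this is the sketch I mentioned above: how I've been searching the endpoint's CloudWatch logs. I'm assuming the standard /aws/sagemaker/Endpoints/<endpoint-name> log group, and the "Worker" filter string is only my guess at what the model server logs:

import boto3

logs = boto3.client("logs")

# Placeholder endpoint name; the filter pattern is a guess at the
# model server's worker start-up log lines.
response = logs.filter_log_events(
    logGroupName="/aws/sagemaker/Endpoints/my-serverless-endpoint",
    filterPattern="Worker",
)
for event in response["events"]:
    print(event["message"])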

Many thanks!

Rogerio

answered 2 years ago
  • Hi! I created the model (using CDK) with the environment variable SAGEMAKER_MODEL_SERVER_WORKERS. Maybe that makes the difference?
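
    A minimal CDK (Python) sketch of what I mean; the construct ID, image, model data location, and role are placeholders, and this assumes the code runs inside a Stack:

    from aws_cdk import aws_sagemaker as sagemaker

    # Inside a Stack's __init__; all values below are placeholders.
    model = sagemaker.CfnModel(
        self, "Model",
        execution_role_arn="arn:aws:iam::123456789012:role/SageMakerRole",
        primary_container=sagemaker.CfnModel.ContainerDefinitionProperty(
            image="<inference-image-uri>",
            model_data_url="s3://my-bucket/model.tar.gz",
            environment={"SAGEMAKER_MODEL_SERVER_WORKERS": "1"},
        ),
    )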
