Serverless Inference - Limit number of workers


We've deployed a Hugging Face model to SageMaker as a serverless endpoint. We set memory to 6 GB and max concurrency to 1. With these settings, we keep getting errors when we call invoke_endpoint. Not all the time, but about 60% of the time...
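
For reference, the deployment looked roughly like this (a sketch; huggingface_model and the exact values are stand-ins for our setup):

from sagemaker.serverless import ServerlessInferenceConfig

# Sketch of the serverless deployment described above.
# huggingface_model is a stand-in for the model object we created.
serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=6144,  # 6 GB
    max_concurrency=1,
)
predictor = huggingface_model.deploy(
    serverless_inference_config=serverless_config,
)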

When we check the logs and metrics, we see that memory usage has gone up to almost 100%. We also see that, since the machine has 6 vCPUs, it starts 6 workers. We believe this could be the cause of the problem. How can we set the number of workers?

Thanks!

asked 2 years ago · 1,275 views
2 Answers

From the “sagemaker.pytorch.model.PyTorchModel” documentation:

model_server_workers (int) – Optional. The number of worker processes used by the inference server. If None, server will use one worker per vCPU.
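
For example, the parameter can be passed to the PyTorchModel constructor directly (a minimal sketch; the model data, role, and entry point are placeholders):

from sagemaker.pytorch import PyTorchModel

# Sketch: limit the inference server to one worker via the
# constructor parameter instead of an environment variable.
pytorch_model = PyTorchModel(
    model_data="s3://my-bucket/model.tar.gz",  # placeholder
    role=role,                                 # placeholder IAM role
    entry_point="inference.py",
    framework_version="1.9.1",
    py_version="py38",
    model_server_workers=1,
)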

Alternatively, you can see in the example below how to set the “MODEL_SERVER_WORKERS” environment variable to control the number of workers.

from sagemaker.estimator import Estimator
from sagemaker.predictor import csv_serializer  # SDK v1; use CSVSerializer in v2

# Tell the inference server to start two workers instead of one per vCPU.
env = {
    "MODEL_SERVER_WORKERS": "2"
}

local_regressor = Estimator(
    image,
    role,
    instance_count=1,
    instance_type="local")

train_location = 'file://' + local_train
validation_location = 'file://' + local_validation
local_regressor.fit({'train': train_location, 'validation': validation_location}, logs=True)

# The environment variables are attached to the model created at deploy time.
predictor = local_regressor.deploy(1, 'local', serializer=csv_serializer, env=env)

Hope it helps.

AWS
answered 2 years ago
  • Thanks. Adding an "answer" to provide more information below...


Eitan, thanks for replying.

I'm not sure whether this worked, as the CloudWatch logs no longer show the number of workers! The performance seems to be the same, however: it's failing more often than it's responding, and memory is still reaching almost 100%.

Instead of your code, I used the following, as I'm deploying a Hugging Face model:

import os

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    name=model_name,
    model_data=os.path.join("s3://" + tar_bucket_name, tarfile_name),
    env={
        'HF_TASK': 'text-classification',
        'MODEL_SERVER_WORKERS': '1',    # limit the server to a single worker
        'MODEL_SERVER_TIMEOUT': '300'
    },
    role=sagemaker.get_execution_role(),
    entry_point='inference.py',
    transformers_version='4.12.3',
    pytorch_version='1.9.1',
    py_version='py38'
)

Two follow-up questions then, if you don't mind:

  1. How can I see whether the serverless endpoint actually created only one worker per instance? (See the sketch after this list.)
  2. Where can I find all the different environment variables accepted by SageMaker?
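
For question 1, one way I'm considering is reading the endpoint's CloudWatch logs (a sketch; it assumes serverless endpoints write to the same /aws/sagemaker/Endpoints/<name> log group as instance-backed ones, and endpoint_name is a placeholder):

import boto3

logs = boto3.client("logs")
# Assumption: serverless endpoints log to this group, like
# instance-backed endpoints do; worker start-up lines should
# match the "worker" filter pattern.
log_group = "/aws/sagemaker/Endpoints/" + endpoint_name
response = logs.filter_log_events(
    logGroupName=log_group,
    filterPattern="worker",
)
for event in response["events"]:
    print(event["message"])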

Many thanks!

Rogerio

answered 2 years ago
  • Hi! I created the model (using CDK) with the environment variable SAGEMAKER_MODEL_SERVER_WORKERS. Maybe that makes the difference?
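
    In CDK (Python), inside a Stack, that would look roughly like this (a sketch; the image URI, role ARN, and model data URL are placeholders):

    from aws_cdk import aws_sagemaker as sagemaker_cdk

    # Sketch: set the worker-count environment variable on the
    # model's container when defining the model in CDK.
    model = sagemaker_cdk.CfnModel(
        self, "WorkerLimitedModel",
        execution_role_arn=role_arn,        # placeholder
        primary_container=sagemaker_cdk.CfnModel.ContainerDefinitionProperty(
            image=image_uri,                # placeholder
            model_data_url=model_data_url,  # placeholder
            environment={"SAGEMAKER_MODEL_SERVER_WORKERS": "1"},
        ),
    )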
