I want to deploy my model as a Serverless inference


Hey, I trained a scikit-learn model using the SageMaker Python SDK and now I want to deploy it as a serverless inference endpoint. I am new to AWS and can't make sense of the documentation. The model is fit with an estimator as follows:

from sagemaker.sklearn.estimator import SKLearn
enable_local_mode_training = False


inputs = {"train": trainpath, "test": testpath}

estimator_parameters = {
    "entry_point": "script_rf.py",
    "framework_version": "1.0-1",
    "py_version": "py3",
    "instance_type": 'ml.c5.xlarge',
    "instance_count": 1,
    "role": role,
    "base_job_name": "randomforestclassifier-model"
}

estimator = SKLearn(**estimator_parameters)
estimator.fit(inputs)

This works fine, but when I try to deploy the model it doesn't work. I tried the code from https://aws.amazon.com/blogs/machine-learning/deploying-ml-models-using-sagemaker-serverless-inference-preview/ but I keep getting errors because some things are not defined, like the image_uri, which I am not using.

I used this

import boto3

m_boto3 = boto3.client('sagemaker')

# wait for the training job to finish, then look up the model artifact in S3
estimator.latest_training_job.wait(logs='None')
artifact = m_boto3.describe_training_job(
    TrainingJobName=estimator.latest_training_job.name)['ModelArtifacts']['S3ModelArtifacts']

print('Model artifact persisted at ' + artifact)

but then the endpoint is not serverless. Please help.

2 Answers

To deploy a serverless endpoint on Amazon SageMaker, you need three steps:

(1) Model creation

Where you create the model with client.create_model

(2) Endpoint configuration creation

Where you create the endpoint configuration with client.create_endpoint_config. Make sure it contains a ServerlessConfig with two parameters, MemorySizeInMB and MaxConcurrency, which should look like:

"ServerlessConfig": {
        "MemorySizeInMB": 4096,
        "MaxConcurrency": 1,
},

(3) Endpoint creation and invocation

Once you have created the model and an endpoint configuration with the serverless details, you can run client.create_endpoint to create the serverless endpoint.
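The three steps can be sketched with boto3 as below. All names, the role ARN, the image URI, and the S3 path are placeholders you must replace with your own values, and the actual create_* calls are shown in comments because they require live AWS credentials:

```python
# (1) Model creation payload: points SageMaker at the training artifact
# and an inference container image (all values below are placeholders).
model_args = {
    "ModelName": "my-sklearn-model",
    "ExecutionRoleArn": "arn:aws:iam::111122223333:role/MySageMakerRole",
    "PrimaryContainer": {
        "Image": "<sklearn-inference-image-uri>",
        "ModelDataUrl": "s3://my-bucket/path/to/model.tar.gz",
    },
}

# (2) Endpoint configuration payload: ServerlessConfig replaces the usual
# InstanceType/InitialInstanceCount of a real-time endpoint.
config_args = {
    "EndpointConfigName": "my-sklearn-serverless-config",
    "ProductionVariants": [
        {
            "VariantName": "AllTraffic",
            "ModelName": "my-sklearn-model",
            "ServerlessConfig": {
                "MemorySizeInMB": 4096,
                "MaxConcurrency": 1,
            },
        }
    ],
}

# (3) Endpoint creation payload: ties the endpoint to the configuration.
endpoint_args = {
    "EndpointName": "my-sklearn-serverless-endpoint",
    "EndpointConfigName": "my-sklearn-serverless-config",
}

# With AWS credentials and a region configured, the calls would be:
#   import boto3
#   sm = boto3.client("sagemaker")
#   sm.create_model(**model_args)
#   sm.create_endpoint_config(**config_args)
#   sm.create_endpoint(**endpoint_args)
```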

Reference: Deploying ML models using SageMaker Serverless Inference

Mia C
answered 3 months ago

Creating a serverless endpoint has actually become much easier, because you can now use the SageMaker Python SDK to do so. That means you no longer have to use boto3 directly, you don't have to create models or endpoint configurations yourself, and you don't have to specify the image_uri anymore.

Instead, once you have fit your estimator, you can just use these lines of code:

from sagemaker.serverless import ServerlessInferenceConfig

# configure the serverless endpoint's memory and concurrency
serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=4096,
    max_concurrency=10,
)

serverless_predictor = estimator.deploy(serverless_inference_config=serverless_config)
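Once deploy returns, the predictor can be invoked and cleaned up like any real-time predictor. A small sketch, assuming the serverless_predictor from above and a hypothetical four-feature model; the calls themselves are shown as comments because they need a live endpoint:

```python
# A sample payload for a hypothetical 4-feature model; replace with rows
# shaped like your own training data.
sample = [[5.1, 3.5, 1.4, 0.2]]

# Against a live serverless endpoint, the calls would be:
#   prediction = serverless_predictor.predict(sample)
#   serverless_predictor.delete_endpoint()  # remove the endpoint when done
```

Deleting the endpoint when you are done avoids leaving resources behind, although a serverless endpoint only bills per invocation.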

See also the documentation: https://sagemaker.readthedocs.io/en/stable/overview.html#sagemaker-serverless-inference

Heiko
answered 3 months ago
