1 Answer
Hi,
Unfortunately, multi-model endpoints do not support async inference yet. As an alternative, you could host several models within a single container and run them using a BYOS (bring-your-own-script) strategy.
Here is an example of using async inference with multiple models packaged in a single tar:
- Package multiple models in a single tar.gz file:
import tarfile

with tarfile.open("multi-model.tar.gz", "w:gz") as mm:
    mm.add("./multi-model/xgboost-model1")
    mm.add("./multi-model/xgboost-model2")
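Note that for a single container to serve both files you would normally also ship a custom inference script in the archive. A minimal sketch, assuming the SageMaker XGBoost container's script mode (the model_fn/predict_fn interface); the `{"model": ..., "features": ...}` payload shape is a made-up routing convention, not a SageMaker requirement:

```python
# inference.py - hypothetical entry point packaged with the model files
import json
import os


def model_fn(model_dir):
    """Load every xgboost-model* file found in the extracted tar."""
    import xgboost as xgb  # provided by the SageMaker XGBoost container

    models = {}
    for name in sorted(os.listdir(model_dir)):
        if name.startswith("xgboost-model"):
            booster = xgb.Booster()
            booster.load_model(os.path.join(model_dir, name))
            models[name] = booster
    return models


def parse_request(request_body):
    """Split the JSON payload into the routing key and the features."""
    payload = json.loads(request_body)
    return payload["model"], payload["features"]


def predict_fn(input_data, models):
    """Route the request to the model named in the payload."""
    import xgboost as xgb

    model_name, features = parse_request(input_data)
    return models[model_name].predict(xgb.DMatrix([features]))
```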
- Upload the model to S3 bucket:
import boto3
import sagemaker

boto_session = boto3.Session()
sm_session = sagemaker.session.Session()
sm_client = boto_session.client("sagemaker")
sm_runtime = boto_session.client("sagemaker-runtime")

def upload_model(input_location):
    prefix = "async-multi-model-example/models"
    return sm_session.upload_data(
        input_location,
        bucket=sm_session.default_bucket(),
        key_prefix=prefix,
    )

model_url = upload_model("multi-model.tar.gz")
- Create a SageMaker model from the pre-trained model tar.gz:
from sagemaker import image_uris

region = boto_session.region_name
container = image_uris.retrieve(region=region, framework="xgboost", version="1.2-1")

create_model_response = sm_client.create_model(
    ModelName="async-multi-model",
    ExecutionRoleArn=sm_role,  # an IAM role with SageMaker permissions
    PrimaryContainer={
        "Image": container,
        "ModelDataUrl": model_url,
    },
)
- Create an endpoint with config:
create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName="async-multi-model-config",
    ProductionVariants=[
        {
            "VariantName": "variant1",
            "ModelName": "async-multi-model",
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
        }
    ],
    AsyncInferenceConfig={
        "OutputConfig": {
            # s3_bucket / bucket_prefix: where async results are written
            "S3OutputPath": f"s3://{s3_bucket}/{bucket_prefix}/output",
        },
        "ClientConfig": {"MaxConcurrentInvocationsPerInstance": 4},
    },
)
- Create the endpoint:
create_endpoint_response = sm_client.create_endpoint(
    EndpointName="async-multi-model-endpoint",
    EndpointConfigName="async-multi-model-config",
)
- Invoke the endpoint asynchronously:
# Specify the location of the input; here, a single SVM sample
input_location = f"s3://{s3_bucket}/test_point_0.libsvm"
endpoint_name = "async-multi-model-endpoint"
response = sm_runtime.invoke_endpoint_async(
    EndpointName=endpoint_name,
    InputLocation=input_location,
)
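The call returns immediately with an OutputLocation rather than the prediction itself. One way to fetch the result is to poll S3 until the output object appears; a sketch, assuming the `response` dict returned by invoke_endpoint_async above:

```python
import time
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path.lstrip("/")


def wait_for_output(output_location, timeout=300, poll=10):
    """Poll S3 until the async inference result is written, then return it."""
    import boto3

    s3 = boto3.client("s3")
    bucket, key = parse_s3_uri(output_location)
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            body = s3.get_object(Bucket=bucket, Key=key)["Body"]
            return body.read().decode("utf-8")
        except s3.exceptions.NoSuchKey:
            time.sleep(poll)
    raise TimeoutError(f"no result at {output_location} after {timeout}s")


# result = wait_for_output(response["OutputLocation"])
```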
This is a customized alternative since the feature is not available yet, so you may be going a bit off the beaten path with it.
Hope it helps.
Answered 2 years ago
First, thanks for your kind answer. Following it, I want to run inference on img_A.jpg with xgboost-model1 and on img_B.jpg with xgboost-model2. Can I choose the model?
You may potentially use CustomAttributes to tell the container which model to run, but as I mentioned above there is more to try and discover, and it may create some tech debt that needs to be cleaned up once MME is supported.
Reference is below: https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpointAsync.html#API_runtime_InvokeEndpointAsync_RequestSyntax
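SageMaker forwards the CustomAttributes string to the container verbatim (as the X-Amzn-SageMaker-Custom-Attributes header of each request), so the routing format is entirely up to you. A sketch, where the `target_model=...` key is a made-up convention:

```python
def parse_custom_attributes(attrs):
    """Parse a 'k1=v1,k2=v2' CustomAttributes string into a dict."""
    pairs = (item.split("=", 1) for item in attrs.split(",") if "=" in item)
    return {k.strip(): v.strip() for k, v in pairs}


# Client side: name the model when invoking the endpoint.
# response = sm_runtime.invoke_endpoint_async(
#     EndpointName="async-multi-model-endpoint",
#     InputLocation=input_location,
#     CustomAttributes="target_model=xgboost-model1",
# )

# Container side: read the forwarded header and pick the matching model.
attrs = parse_custom_attributes("target_model=xgboost-model1")
model_to_run = attrs["target_model"]  # "xgboost-model1"
```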