[problem at MMS predict] At MMS(sagemaker), error code(500), type(InternalServerException)


I built a PyTorch model and deployed it on SageMaker with Multi Model Server (MMS). This is my deployment code:

%%time
instance_type = 'c5.large'
# accelerator_type = 'eia2.medium'
predictor = mme.deploy(
    initial_instance_count=1,
    instance_type=f"ml.{instance_type}"
)

mme.add_model(model_data_source=model_path, model_data_path="model.tar.gz")
list(mme.list_models())
#> [ 'model.tar.gz']

I then try to predict with this code:

import time

start_time = time.time()
predicted_value = predictor.predict(requests, target_model="LV1")
duration = time.time() - start_time
print("${:,.2f}, took {:,d} ms\n".format(predicted_value[0], int(duration * 1000)))

It returns this error message:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "{
  "code": 500,
  "type": "InternalServerException",
  "message": "Failed to start workers"
}

MMS with PyTorch is a little difficult. X)

Please help me.

2 Answers
Accepted Answer

Hi, I think the target model in your prediction call needs to be the name of the model you deployed. For example, when you add the model with mme.add_model(model_data_source=model_path, model_data_path="model.tar.gz"), model_data_path contains the name of the model.

From the sagemaker-examples notebook (https://github.com/aws/amazon-sagemaker-examples/blob/main/advanced_functionality/multi_model_xgboost_home_value/xgboost_multi_model_endpoint_home_value.ipynb): "model_data_path is the relative path to the S3 prefix we specified above (i.e. model_data_prefix) where our endpoint will source models for inference requests. Since this is a relative path, we can simply pass the name of what we wish to call the model artifact at inference time (i.e. Chicago_IL.tar.gz)."

In your case that name is "model.tar.gz". However, when predicting, you call the model with target_model="LV1"?
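A small guard can catch this kind of mismatch before the request is sent. The sketch below is only illustrative and is not part of the SageMaker SDK: check_target_model is a hypothetical helper that compares the requested name with the names returned by mme.list_models() and, when there is a near match, suggests it.

```python
import difflib


def check_target_model(target_model, available_models):
    """Return target_model if it is a registered artifact name, else raise.

    available_models is assumed to be the names from mme.list_models().
    This helper is hypothetical; it is not a SageMaker SDK function.
    """
    available = list(available_models)
    if target_model in available:
        return target_model
    # Suggest the closest registered name to catch small typos.
    close = difflib.get_close_matches(target_model, available, n=1)
    hint = f" Did you mean {close[0]!r}?" if close else ""
    raise ValueError(
        f"{target_model!r} is not a registered model; available: {available}.{hint}"
    )
```

It could be used as predictor.predict(requests, target_model=check_target_model("model.tar.gz", mme.list_models())), so that a name such as "LV1" fails locally with a readable message instead of reaching the endpoint.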

AWS
EXPERT
answered 2 years ago

Following your comment, I modified the code and ran it again. I tried two solutions.

#1 predictor.predict

predicted_value = predictor.predict(data=requests, target_model="modal.tar.gz")

This returns:

ValidationError: An error occurred (ValidationError) when calling the InvokeEndpoint operation: Failed to download model data(bucket: sagemaker-ap-northeast-2-344487737937, key: LouisVuiotton-cpu-2022-08-16-02-02-04-408-c6i-large/model/modal.tar.gz). Please ensure that there is an object located at the URL and that the role passed to CreateModel has permissions to download the model.

#2 With boto3, invoke_endpoint()

import boto3

client = boto3.client('sagemaker-runtime')
endpoint_name = predictor.endpoint_name
response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=requests,
    ContentType='application/x-image',
#     Accept='string',
#     CustomAttributes='string',
    TargetModel='model.tar.gz',
#     TargetVariant='string',
#     TargetContainerHostname='string',
#     InferenceId='string'
)

This returns:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "{
  "code": 500,
  "type": "InternalServerException",
  "message": "Failed to start workers"
}
". See https://ap-northeast-2.console.aws.amazon.com/cloudwatch/home?region=ap-northeast-2#logEventViewer:group=/aws/sagemaker/Endpoints/LV-multi-2022-08-16-02-11-15 in account 344487737937 for more information.

I assume the result of solution 2 with boto3 invoke_endpoint ("message": "Failed to start workers") has the same cause as the error in solution 1 (that the role passed to CreateModel needs permissions to download the model).

I already use the execution role "arn:aws:iam::344487737937:role/service-role/AmazonSageMaker-ExecutionRole-20220713T151818". How do I get the additional permissions, so that the role passed to CreateModel can download the model?
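The 500 error points to the endpoint's CloudWatch log group, which usually holds the real reason the workers failed to start. A minimal sketch for reading those logs from a notebook follows, assuming logs_client is boto3.client("logs") and that the caller is allowed to read CloudWatch Logs; log_messages_since is a hypothetical helper name, and the log group pattern is the one shown in the error message above.

```python
import time


def endpoint_log_group(endpoint_name):
    """Log group that holds the container logs of a SageMaker endpoint."""
    return f"/aws/sagemaker/Endpoints/{endpoint_name}"


def log_messages_since(logs_client, endpoint_name, minutes=15, limit=50):
    """Return up to `limit` log messages from the last `minutes` minutes.

    logs_client is assumed to be boto3.client("logs").
    """
    # CloudWatch Logs expects the start time in milliseconds since the epoch.
    start_ms = int((time.time() - minutes * 60) * 1000)
    response = logs_client.filter_log_events(
        logGroupName=endpoint_log_group(endpoint_name),
        startTime=start_ms,
        limit=limit,
    )
    return [event["message"] for event in response.get("events", [])]
```

For example, log_messages_since(boto3.client("logs"), predictor.endpoint_name) would list the recent container output, including any Python traceback raised while the model was loading.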

answered 2 years ago
  • These are my IAM policies at my company.

     IAMReadOnlyAccess
     CloudWatchLogsReadOnlyAccess
     AmazonSageMakerFullAccess
     AmazonS3FullAccess
     ServiceQuotasFullAccess
     AWSBillingReadOnlyAccess
    

    Should I get more IAM permissions?

  • I solved this! I was trying to download the image inside the endpoint, but the endpoint cannot connect to the outside network, so I go through Lambda instead.

    1. I make the request with the S3 URL.
    2. Lambda downloads the image from S3.
    3. Lambda sends the image to the endpoint.
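The three steps above can be sketched roughly as the Lambda function below. This is an illustration under assumptions, not the code from the thread: the event fields "s3_url" and "endpoint_name" are hypothetical, the content type and target model are copied from the earlier invoke_endpoint call, and the response is assumed to be UTF-8 text.

```python
from urllib.parse import urlparse


def split_s3_url(s3_url):
    """Split "s3://bucket/key" into (bucket, key)."""
    parsed = urlparse(s3_url)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"not a valid S3 URL: {s3_url!r}")
    return parsed.netloc, key


def forward_image(s3_url, endpoint_name, target_model, s3_client, runtime_client):
    """Download the image from S3 and send its bytes to the endpoint."""
    bucket, key = split_s3_url(s3_url)
    image_bytes = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/x-image",
        TargetModel=target_model,
        Body=image_bytes,
    )
    return response["Body"].read()


def lambda_handler(event, context):
    # boto3 is imported here so the helpers above can be tested without it.
    import boto3

    result = forward_image(
        s3_url=event["s3_url"],                # hypothetical event field
        endpoint_name=event["endpoint_name"],  # hypothetical event field
        target_model="model.tar.gz",
        s3_client=boto3.client("s3"),
        runtime_client=boto3.client("sagemaker-runtime"),
    )
    # Assumes the model returns UTF-8 text such as JSON.
    return {"statusCode": 200, "body": result.decode("utf-8")}
```

Passing the clients in as arguments keeps the download-and-forward logic separate from the Lambda entry point, so it can be tried locally with stub clients before deploying.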
