[Problem with MMS predict] MMS (SageMaker), error code 500, type InternalServerException


I made a PyTorch model with SageMaker MMS (multi-model endpoint). This is my MME code.

%%time
instance_type = 'c5.large'
# accelerator_type = 'eia2.medium'

# Deploy the MultiDataModel (mme) as a multi-model endpoint.
predictor = mme.deploy(
    initial_instance_count=1,
    instance_type=f"ml.{instance_type}"
)

# Register the artifact under the name that will be used at inference time.
mme.add_model(model_data_source=model_path, model_data_path="model.tar.gz")
list(mme.list_models())
#> ['model.tar.gz']

I try to predict with this code:

start_time = time.time()
predicted_value = predictor.predict(requests, target_model="LV1")
duration = time.time() - start_time
print("${:,.2f}, took {:,d} ms\n".format(predicted_value[0], int(duration * 1000)))

And it returns this error message:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "{
  "code": 500,
  "type": "InternalServerException",
  "message": "Failed to start workers"
}

MMS with PyTorch is a 'little' difficult. :)

Help me, please.

2 Answers
Accepted Answer

Hi, I think the target model in your prediction call needs to be the name of the model you deployed. For example, when you add the model with mme.add_model(model_data_source=model_path, model_data_path="model.tar.gz"), the model_data_path contains the name of the model. From the sagemaker-examples (https://github.com/aws/amazon-sagemaker-examples/blob/main/advanced_functionality/multi_model_xgboost_home_value/xgboost_multi_model_endpoint_home_value.ipynb): "model_data_path is the relative path to the S3 prefix we specified above (i.e. model_data_prefix) where our endpoint will source models for inference requests. Since this is a relative path, we can simply pass the name of what we wish to call the model artifact at inference time (i.e. Chicago_IL.tar.gz)." In your case that is "model.tar.gz". However, when predicting you call the model with target_model="LV1"?
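For example, a minimal sketch reusing the predictor and requests objects from the question; the key point is that target_model matches the artifact name registered with add_model:

# target_model must be the artifact name passed to add_model()
# ("model.tar.gz"), not an arbitrary label such as "LV1".
predicted_value = predictor.predict(data=requests, target_model="model.tar.gz")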

AWS
EXPERT
answered 2 years ago
  • According to your comment, I modified the code and execution like that.


According to your comment, I modified the code and execution. I tried 2 solutions.

#1 predictor.predict

predicted_value = predictor.predict(data=requests, target_model="modal.tar.gz")

This returns:

ValidationError: An error occurred (ValidationError) when calling the InvokeEndpoint operation: Failed to download model data(bucket: sagemaker-ap-northeast-2-344487737937, key: LouisVuiotton-cpu-2022-08-16-02-02-04-408-c6i-large/model/modal.tar.gz). Please ensure that there is an object located at the URL and that the role passed to CreateModel has permissions to download the model.

#2 With boto3, invoke_endpoint()

import boto3

client = boto3.client('sagemaker-runtime')
endpoint_name = predictor.endpoint_name
response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=requests,
    ContentType='application/x-image',
#     Accept='string',
#     CustomAttributes='string',
    TargetModel='model.tar.gz',
#     TargetVariant='string',
#     TargetContainerHostname='string',
#     InferenceId='string'
)

This returns:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "{
  "code": 500,
  "type": "InternalServerException",
  "message": "Failed to start workers"
}
". See https://ap-northeast-2.console.aws.amazon.com/cloudwatch/home?region=ap-northeast-2#logEventViewer:group=/aws/sagemaker/Endpoints/LV-multi-2022-08-16-02-11-15 in account 344487737937 for more information.

I assume solution 2's boto3 invoke_endpoint result ["message": "Failed to start workers"] comes from the same cause as solution 1's error [that the role passed to CreateModel needs permission to download the model].

I already use the execution role ["arn:aws:iam::344487737937:role/service-role/AmazonSageMaker-ExecutionRole-20220713T151818"]. How do I get the additional role (so that the role passed to CreateModel has permission to download the model)?
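One way to check what the role already has attached is to list its managed policies (a minimal sketch; the role name is taken from the ARN above, and the caller needs iam:ListAttachedRolePolicies, which IAMReadOnlyAccess covers):

import boto3

iam = boto3.client('iam')
role_name = 'AmazonSageMaker-ExecutionRole-20220713T151818'

# The execution role needs s3:GetObject on the bucket/prefix that holds
# model.tar.gz; list its managed policies to see what it already has.
attached = iam.list_attached_role_policies(RoleName=role_name)
for policy in attached['AttachedPolicies']:
    print(policy['PolicyName'])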

answered 2 years ago
  • These are the IAM policies I have at my company.

     IAMReadOnlyAccess
     CloudWatchLogsReadOnlyAccess
     AmazonSageMakerFullAccess
     AmazonS3FullAccess
     ServiceQuotasFullAccess
     AWSBillingReadOnlyAccess
    

    Should I get more IAM permissions?

  • I solved this! I was trying to download the image at the endpoint, but the endpoint cannot connect to the outside network, so Lambda does the download instead (see the sketch after this list).

    1. I make the request with an S3 URL
    2. Download the image from S3 to Lambda
    3. Transmit the image from Lambda to the endpoint
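
    A rough sketch of that Lambda flow (the event field names are illustrative assumptions; the content type and target model come from the question, and the endpoint name is inferred from the CloudWatch log group above):

    import json
    import boto3

    s3 = boto3.client('s3')
    runtime = boto3.client('sagemaker-runtime')

    def lambda_handler(event, context):
        # The request carries an S3 location; download the image inside Lambda,
        # because the endpoint itself has no outbound network access.
        image_bytes = s3.get_object(Bucket=event['bucket'], Key=event['key'])['Body'].read()

        # Forward the raw image bytes to the multi-model endpoint.
        response = runtime.invoke_endpoint(
            EndpointName='LV-multi-2022-08-16-02-11-15',  # inferred from the log group name
            Body=image_bytes,
            ContentType='application/x-image',
            TargetModel='model.tar.gz',
        )
        # Assumes the model returns a JSON body.
        return json.loads(response['Body'].read())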
