[Problem with MMS predict] SageMaker MMS returns error code 500, type InternalServerException


I built a PyTorch model on SageMaker with a multi-model endpoint (MMS). This is my MMS code.

%%time
instance_type = 'c5.large'
# accelerator_type = 'eia2.medium'
predictor = mme.deploy(
    initial_instance_count=1,
    instance_type=f"ml.{instance_type}"
)

mme.add_model(model_data_source=model_path, model_data_path="model.tar.gz")
list(mme.list_models())
#> [ 'model.tar.gz']
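
(For context: mme and model_path are defined earlier in the notebook and are not shown. A minimal sketch of how that setup might look, where the bucket, prefix, entry point, and framework version are assumptions, not taken from the original post:)

import sagemaker
from sagemaker.multidatamodel import MultiDataModel
from sagemaker.pytorch import PyTorchModel

role = sagemaker.get_execution_role()

# Assumed names: S3 locations below are placeholders.
model_data_prefix = 's3://my-bucket/mme-artifacts/'          # prefix the endpoint loads models from
model_path = 's3://my-bucket/training-output/model.tar.gz'   # artifact produced by training

pytorch_model = PyTorchModel(
    model_data=model_path,
    role=role,
    entry_point='inference.py',      # custom inference handler
    framework_version='1.11.0',
    py_version='py38',
)

mme = MultiDataModel(
    name='LV-multi',
    model_data_prefix=model_data_prefix,
    model=pytorch_model,             # serving container is taken from this model
)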

I try to predict with this code.
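
(For context, requests is assumed here to be the raw bytes of the input image, since the endpoint is later invoked with ContentType='application/x-image'; a minimal sketch of how it might be built, with a placeholder file name:)

# Assumption: the payload is raw image bytes read from a local file.
with open('sample.jpg', 'rb') as f:
    requests = f.read()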

start_time = time.time()
predicted_value = predictor.predict(requests, target_model="LV1")
duration = time.time() - start_time
print("${:,.2f}, took {:,d} ms\n".format(predicted_value[0], int(duration * 1000)))

And it returns this error message:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "{
  "code": 500,
  "type": "InternalServerException",
  "message": "Failed to start workers"
}

MMS with PyTorch is a 'little' difficult. X)

Help me, please.

Asked 2 years ago, 417 views
2 Answers
Accepted Answer

Hi, I think the target model in your prediction needs to be the name of the model you have deployed. For example, when you add the model with mme.add_model(model_data_source=model_path, model_data_path="model.tar.gz"), the model_data_path contains the name of the model. From the sagemaker-examples notebook (https://github.com/aws/amazon-sagemaker-examples/blob/main/advanced_functionality/multi_model_xgboost_home_value/xgboost_multi_model_endpoint_home_value.ipynb): "model_data_path is the relative path to the S3 prefix we specified above (i.e. model_data_prefix) where our endpoint will source models for inference requests. Since this is a relative path, we can simply pass the name of what we wish to call the model artifact at inference time (i.e. Chicago_IL.tar.gz)." In your case that name is "model.tar.gz". However, when predicting you call the model with target_model="LV1"?
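
In other words, target_model must match the artifact name you passed as model_data_path. A minimal sketch of the corrected call, assuming the same requests payload as in the question:

# target_model must be the artifact name given to add_model(model_data_path=...),
# i.e. "model.tar.gz", not an arbitrary label such as "LV1".
predicted_value = predictor.predict(data=requests, target_model='model.tar.gz')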

AWS
EXPERT
answered 2 years ago
  • According to your comment, I modified the code and executed it; see my answer below.


According to your comment, I modified the code and ran it again. I tried two solutions.

#1 predictor.predict

predicted_value = predictor.predict(data=requests, target_model="modal.tar.gz")

It returns:

ValidationError: An error occurred (ValidationError) when calling the InvokeEndpoint operation: Failed to download model data(bucket: sagemaker-ap-northeast-2-344487737937, key: LouisVuiotton-cpu-2022-08-16-02-02-04-408-c6i-large/model/modal.tar.gz). Please ensure that there is an object located at the URL and that the role passed to CreateModel has permissions to download the model.

#2 With boto3, invoke_endpoint()

import boto3

client = boto3.client('sagemaker-runtime')
endpoint_name = predictor.endpoint_name
response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=requests,
    ContentType='application/x-image',
#     Accept='string',
#     CustomAttributes='string',
    TargetModel='model.tar.gz',
#     TargetVariant='string',
#     TargetContainerHostname='string',
#     InferenceId='string'
)

It returns:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "{
  "code": 500,
  "type": "InternalServerException",
  "message": "Failed to start workers"
}
". See https://ap-northeast-2.console.aws.amazon.com/cloudwatch/home?region=ap-northeast-2#logEventViewer:group=/aws/sagemaker/Endpoints/LV-multi-2022-08-16-02-11-15 in account 344487737937 for more information.

I assume that the result of solution 2 (boto3 invoke_endpoint), "Failed to start workers", comes from the same cause as the error in solution 1: the role passed to CreateModel does not have permission to download the model.

I already use the execution role ['arn:aws:iam::344487737937:role/service-role/AmazonSageMaker-ExecutionRole-20220713T151818']. How do I get the additional permissions (so that the role passed to CreateModel can download the model)?

answered 2 years ago
  • These are the IAM policies attached to my role at my company:

     IAMReadOnlyAccess
     CloudWatchLogsReadOnlyAccess
     AmazonSageMakerFullAccess
     AmazonS3FullAccess
     ServiceQuotasFullAccess
     AWSBillingReadOnlyAccess
    

    Should I attach more IAM policies?

  • I solved this! I was trying to download the image at the endpoint, but the endpoint cannot connect to the outside network, so I go through Lambda instead (a sketch of this flow follows the list):

    1. Make the request with an S3 URL
    2. Download the image from S3 in Lambda
    3. Send the image from Lambda to the endpoint
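
A minimal sketch of that Lambda flow, assuming the event carries the S3 bucket and key of the image; the endpoint name is taken from the CloudWatch link above and the handler/event shape is an assumption:

import boto3

s3 = boto3.client('s3')
runtime = boto3.client('sagemaker-runtime')

ENDPOINT_NAME = 'LV-multi-2022-08-16-02-11-15'   # assumed; use predictor.endpoint_name

def lambda_handler(event, context):
    # 1. The caller sends the S3 location of the image instead of the raw bytes.
    bucket = event['bucket']
    key = event['key']

    # 2. Download the image from S3 inside Lambda.
    image_bytes = s3.get_object(Bucket=bucket, Key=key)['Body'].read()

    # 3. Forward the image bytes to the multi-model endpoint.
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType='application/x-image',
        Body=image_bytes,
        TargetModel='model.tar.gz',
    )
    return response['Body'].read().decode('utf-8')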
