Using Sagemaker Triton with Async Endpoint and Binary Data


I've built a Triton container and I'd like to deploy it as an Async Endpoint that's invoked nightly. I have it working OK with AutoScaling, and I can invoke it fine using application/json.

It's a lot slower than using binary_data, though. With binary data I can create the request as follows:

import numpy as np
import tritonclient.http

text = tritonclient.http.InferInput('text', [len(test_data)], "BYTES")
text.set_data_from_numpy(np.array(test_data, dtype=object).reshape(text.shape()), binary_data=True)

labels = tritonclient.http.InferRequestedOutput('labels', binary_data=True)
scores = tritonclient.http.InferRequestedOutput('scores', binary_data=True)

# Need to create the body and send it with the SageMaker client,
# rather than using tritonclient directly
request_body, header_length = tritonclient.http.InferenceServerClient.generate_request_body(
    inputs=[text], outputs=[labels, scores]
)

with open("examples/request.bin", "wb") as f:
    f.write(request_body)

I can copy this to S3, invoke the endpoint, and get the response back no problem:

response = sagemaker_runtime.invoke_endpoint_async(
    EndpointName=endpoint_name, 
    InputLocation="s3://data-science.cimenviro.com/models/triton-serve/input/request.bin",
    ContentType=f'application/vnd.sagemaker-triton.binary+json;json-header-size={header_length}')

output_location = response['OutputLocation']
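Since the result only lands in S3 once the async invocation actually completes, fetching it amounts to polling the OutputLocation. A minimal sketch of how I do that — the helper names are mine, and `s3` is assumed to be a boto3 S3 client:

```python
import time


def split_s3_uri(uri: str):
    """Split an s3:// URI into (bucket, key)."""
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key


def wait_for_output(s3, output_location, timeout=600, poll=15):
    """Poll until the async result object exists, then return its bytes.

    The endpoint writes the result object to `output_location` only
    when inference finishes, so NoSuchKey just means "not done yet".
    """
    bucket, key = split_s3_uri(output_location)
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            return s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        except s3.exceptions.NoSuchKey:
            time.sleep(poll)
    raise TimeoutError(f"no result at {output_location} after {timeout}s")
```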

The issue is that in order to parse the response I need the json-header-size from the response's ContentType, but because SageMaker invokes the endpoint, it isn't available to me. The response from sagemaker_runtime.invoke_endpoint_async is not the response from the model endpoint itself, since the model hasn't been called at that stage. So I cannot reliably split the response and have to fall back to binary_data=False. The contents of the response are:

b'{"model_name":"ensemble","model_version":"1","parameters":{"sequence_id":0,"sequence_start":false,"sequence_end":false,"sequence_id":0,"sequence_start":false,"sequence_end":false},"outputs":[{"name":"scores","datatype":"FP32","shape":[1,10],"parameters":{"binary_data_size":40}},{"name":"labels","datatype":"INT64","shape":[1,10],"parameters":{"binary_data_size":80}}]}\x05\xa1v?\xc3\x13\xb6;\x15EX;X!!;\x1eE\x05;\xfa\xbc\x83:\xcbah:.\x9ba:\xd0\xdbI:\xdc\x0c0:w\x01\x00\x00\x00\x00\x00\x00\xb2\x01\x00\x00\x00\x00\x00\x00U\x00\x00\x00\x00\x00\x00\x00E\x02\x00\x00\x00\x00\x00\x00\xc7\x03\x00\x00\x00\x00\x00\x00\x8a\x01\x00\x00\x00\x00\x00\x00}\x00\x00\x00\x00\x00\x00\x00z\x01\x00\x00\x00\x00\x00\x004\x00\x00\x00\x00\x00\x00\x005\x03\x00\x00\x00\x00\x00\x00'

I need the json-header-size to read the JSON and then the tensors. Is this supported, or do I have to use JSON?
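For what it's worth, even without json-header-size the split point can often be recovered from the bytes themselves: the body starts with exactly one JSON object, and `json.JSONDecoder.raw_decode` reports the index where that object ends. A sketch of this workaround (my own idea, not an official API — decoding as latin-1 maps each byte to one character, so the index is also the byte offset):

```python
import json


def split_triton_response(body: bytes):
    """Split a Triton binary+json response into (header, header size, tensor bytes).

    raw_decode parses the leading JSON object and returns the index
    where it ends, which is exactly the json-header-size.
    """
    header, size = json.JSONDecoder().raw_decode(body.decode("latin-1"))
    return header, size, body[size:]
```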

Dave
Asked 9 months ago · 380 views
1 Answer

Hi,

Thank you for using AWS SageMaker.

For this question:

  • Firstly, yes: Async Inference does not return the actual result in the invoke response, because asynchronous processing produces the result in the background over a longer duration, and then pushes the output to the S3 location you have defined.

  • As for the Content-Type information, you can get it from the S3 output object's metadata: https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html. The S3 output object has Content-Type in its metadata, and for this case you can use it to determine the json-header-size.
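If the metadata route works, the flow might look like the sketch below. Only header_size_from_content_type is concrete; the boto3/tritonclient calls are shown commented because I cannot verify them against your setup, and the bucket/key values are placeholders:

```python
import re


def header_size_from_content_type(content_type: str) -> int:
    """Extract json-header-size from a Triton binary+json Content-Type string."""
    match = re.search(r"json-header-size=(\d+)", content_type)
    if match is None:
        raise ValueError(f"no json-header-size in {content_type!r}")
    return int(match.group(1))


# Assumed usage (untested here; bucket/key are placeholders):
# import boto3
# import tritonclient.http as httpclient
# s3 = boto3.client("s3")
# content_type = s3.head_object(Bucket=bucket, Key=key)["ContentType"]
# body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
# result = httpclient.InferenceServerClient.parse_response_body(
#     body, header_length=header_size_from_content_type(content_type))
# scores = result.as_numpy("scores")
```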

To understand the issue in more depth, as I have limited visibility into your setup, I'd recommend reaching out to AWS Support by creating a support case[+] so that an engineer can investigate further and help you overcome the issue.

Reference: [+] Open a support case with AWS using the link: https://console.aws.amazon.com/support/home?#/case/create

AWS
Answered 9 months ago
