Using Sagemaker Triton with Async Endpoint and Binary Data


I've built a Triton container and I'd like to deploy it as an Async Endpoint that's invoked nightly. I have it working OK with AutoScaling, and I can invoke it fine using application/json.

It's a lot slower than using binary_data, though. With binary data I can create the request as follows:

import numpy as np
import tritonclient.http

text = tritonclient.http.InferInput('text', [len(test_data)], "BYTES")
text.set_data_from_numpy(np.array(test_data, dtype=object).reshape(text.shape()), binary_data=True)

labels = tritonclient.http.InferRequestedOutput('labels', binary_data=True)
scores = tritonclient.http.InferRequestedOutput('scores', binary_data=True)

# Need to create the body and send it with the SageMaker client,
# rather than using tritonclient directly
request_body, header_length = tritonclient.http.InferenceServerClient.generate_request_body(
    inputs=[text], outputs=[labels, scores]
)

with open("examples/request.bin", "wb") as f:
    f.write(request_body)

I can copy this to S3, invoke the endpoint, and get the response back no problem:

response = sagemaker_runtime.invoke_endpoint_async(
    EndpointName=endpoint_name, 
    InputLocation="s3://data-science.cimenviro.com/models/triton-serve/input/request.bin",
    ContentType=f'application/vnd.sagemaker-triton.binary+json;json-header-size={header_length}')

output_location = response['OutputLocation']
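Since the result only lands in S3 once the async invocation actually completes, fetching it amounts to polling the OutputLocation. A minimal sketch of how I do that — the helper names are mine, and `s3` is assumed to be a boto3 S3 client:

```python
import time


def split_s3_uri(uri: str):
    """Split an s3:// URI into (bucket, key)."""
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key


def wait_for_output(s3, output_location, timeout=600, poll=15):
    """Poll until the async result object exists, then return its bytes.

    The endpoint writes the result object to `output_location` only
    when inference finishes, so NoSuchKey just means "not done yet".
    """
    bucket, key = split_s3_uri(output_location)
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            return s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        except s3.exceptions.NoSuchKey:
            time.sleep(poll)
    raise TimeoutError(f"no result at {output_location} after {timeout}s")
```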

The issue is that in order to parse the response I need the json-header-size from the response's ContentType, but because SageMaker invokes the endpoint, it isn't available to me. The response from sagemaker_runtime.invoke_endpoint_async is not the response from the model endpoint itself, since the model hasn't been called at that stage. So I cannot reliably split the response and have to fall back to binary_data=False. The contents of the response are:

b'{"model_name":"ensemble","model_version":"1","parameters":{"sequence_id":0,"sequence_start":false,"sequence_end":false,"sequence_id":0,"sequence_start":false,"sequence_end":false},"outputs":[{"name":"scores","datatype":"FP32","shape":[1,10],"parameters":{"binary_data_size":40}},{"name":"labels","datatype":"INT64","shape":[1,10],"parameters":{"binary_data_size":80}}]}\x05\xa1v?\xc3\x13\xb6;\x15EX;X!!;\x1eE\x05;\xfa\xbc\x83:\xcbah:.\x9ba:\xd0\xdbI:\xdc\x0c0:w\x01\x00\x00\x00\x00\x00\x00\xb2\x01\x00\x00\x00\x00\x00\x00U\x00\x00\x00\x00\x00\x00\x00E\x02\x00\x00\x00\x00\x00\x00\xc7\x03\x00\x00\x00\x00\x00\x00\x8a\x01\x00\x00\x00\x00\x00\x00}\x00\x00\x00\x00\x00\x00\x00z\x01\x00\x00\x00\x00\x00\x004\x00\x00\x00\x00\x00\x00\x005\x03\x00\x00\x00\x00\x00\x00'

I need the json-header-size to read the JSON and then the tensors. Is this supported, or do I have to use JSON?
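For what it's worth, even without json-header-size the split point can often be recovered from the bytes themselves: the body starts with exactly one JSON object, and `json.JSONDecoder.raw_decode` reports the index where that object ends. A sketch of this workaround (my own idea, not an official API — decoding as latin-1 maps each byte to one character, so the index is also the byte offset):

```python
import json


def split_triton_response(body: bytes):
    """Split a Triton binary+json response into (header, header size, tensor bytes).

    raw_decode parses the leading JSON object and returns the index
    where it ends, which is exactly the json-header-size.
    """
    header, size = json.JSONDecoder().raw_decode(body.decode("latin-1"))
    return header, size, body[size:]
```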

Dave
Asked 9 months ago · 380 views
1 Answer

Hi,

Thank you for using AWS SageMaker.

For this question:

  • Firstly, yes: Async Inference does not return the actual result in the invoke response, because asynchronous processing produces the result in the background over a longer duration, and then pushes the output to the S3 location you have defined.

  • As for the Content-Type information, you can get it from the S3 output object's metadata: https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html. The S3 output object has Content-Type in its metadata, and for this case you can use it to determine the json-header-size.
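If the metadata route works, the flow might look like the sketch below. Only header_size_from_content_type is concrete; the boto3/tritonclient calls are shown commented because I cannot verify them against your setup, and the bucket/key values are placeholders:

```python
import re


def header_size_from_content_type(content_type: str) -> int:
    """Extract json-header-size from a Triton binary+json Content-Type string."""
    match = re.search(r"json-header-size=(\d+)", content_type)
    if match is None:
        raise ValueError(f"no json-header-size in {content_type!r}")
    return int(match.group(1))


# Assumed usage (untested here; bucket/key are placeholders):
# import boto3
# import tritonclient.http as httpclient
# s3 = boto3.client("s3")
# content_type = s3.head_object(Bucket=bucket, Key=key)["ContentType"]
# body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
# result = httpclient.InferenceServerClient.parse_response_body(
#     body, header_length=header_size_from_content_type(content_type))
# scores = result.as_numpy("scores")
```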

To understand the issue in more depth, as I have limited visibility into your setup, I'd recommend reaching out to AWS Support by creating a support case[+] so that an engineer can investigate further and help you overcome the issue.

Reference: [+] Open a support case with AWS using the link: https://console.aws.amazon.com/support/home?#/case/create

AWS
Answered 9 months ago
