Using Sagemaker Triton with Async Endpoint and Binary Data


I've built a Triton container and I'd like to deploy it as an Async Endpoint that's invoked nightly. I have it working with AutoScaling and can invoke it fine using application/json.

It's a lot slower than using binary data, though. I can create the request as follows:

import numpy as np
import tritonclient.http

text = tritonclient.http.InferInput('text', [len(test_data)], "BYTES")
text.set_data_from_numpy(np.array(test_data, dtype=object).reshape(text.shape()), binary_data=True)

labels = tritonclient.http.InferRequestedOutput('labels', binary_data=True)
scores = tritonclient.http.InferRequestedOutput('scores', binary_data=True)

# Need to create the body, then send it with the SageMaker client rather than tritonclient directly
request_body, header_length = tritonclient.http.InferenceServerClient.generate_request_body(
    inputs=[text], outputs=[labels, scores]
)

with open("examples/request.bin", "wb") as f:
    f.write(request_body)

I can copy this to S3, invoke the endpoint, and get the response back no problem:

response = sagemaker_runtime.invoke_endpoint_async(
    EndpointName=endpoint_name, 
    InputLocation="s3://data-science.cimenviro.com/models/triton-serve/input/request.bin",
    ContentType=f'application/vnd.sagemaker-triton.binary+json;json-header-size={header_length}')

output_location = response['OutputLocation']

The issue is that in order to parse the response, I need the json-header-size from the response ContentType, but because SageMaker invokes the endpoint, it's not available to me. The response from sagemaker_runtime.invoke_endpoint_async is not the response from invoking the actual model endpoint, since the model hasn't been called at that stage. So I cannot reliably parse the response and have to fall back to binary_data=False. The contents of the response are:

b'{"model_name":"ensemble","model_version":"1","parameters":{"sequence_id":0,"sequence_start":false,"sequence_end":false,"sequence_id":0,"sequence_start":false,"sequence_end":false},"outputs":[{"name":"scores","datatype":"FP32","shape":[1,10],"parameters":{"binary_data_size":40}},{"name":"labels","datatype":"INT64","shape":[1,10],"parameters":{"binary_data_size":80}}]}\x05\xa1v?\xc3\x13\xb6;\x15EX;X!!;\x1eE\x05;\xfa\xbc\x83:\xcbah:.\x9ba:\xd0\xdbI:\xdc\x0c0:w\x01\x00\x00\x00\x00\x00\x00\xb2\x01\x00\x00\x00\x00\x00\x00U\x00\x00\x00\x00\x00\x00\x00E\x02\x00\x00\x00\x00\x00\x00\xc7\x03\x00\x00\x00\x00\x00\x00\x8a\x01\x00\x00\x00\x00\x00\x00}\x00\x00\x00\x00\x00\x00\x00z\x01\x00\x00\x00\x00\x00\x004\x00\x00\x00\x00\x00\x00\x005\x03\x00\x00\x00\x00\x00\x00'
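For reference, if the header size were known, splitting a body like the one above is straightforward: the first json-header-size bytes are the JSON, and the rest is the raw tensor data concatenated in output order, with lengths given by each output's binary_data_size. A minimal sketch (the helper name and dtype map are mine, and it only covers the two dtypes shown; tritonclient also exposes InferenceServerClient.parse_response_body for this, if I recall correctly):

```python
import json
import numpy as np

def parse_triton_binary_response(body: bytes, header_length: int):
    # First header_length bytes are the JSON header; the remainder is
    # the raw tensor data, concatenated in the order of "outputs".
    header = json.loads(body[:header_length])
    dtype_map = {"FP32": np.float32, "INT64": np.int64}  # extend as needed
    tensors = {}
    offset = header_length
    for out in header["outputs"]:
        size = out["parameters"]["binary_data_size"]
        raw = body[offset:offset + size]
        tensors[out["name"]] = np.frombuffer(
            raw, dtype=dtype_map[out["datatype"]]
        ).reshape(out["shape"])
        offset += size
    return header, tensors

# Synthetic body in the same layout as the response shown above
scores = np.array([[0.9, 0.1]], dtype=np.float32)
hdr = json.dumps({
    "model_name": "ensemble",
    "outputs": [{"name": "scores", "datatype": "FP32", "shape": [1, 2],
                 "parameters": {"binary_data_size": scores.nbytes}}],
}).encode()
body = hdr + scores.tobytes()
header, tensors = parse_triton_binary_response(body, len(hdr))
```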

I need the json-header-size to read the JSON and then the tensors. Is this supported, or do I have to use JSON?

Dave
Asked 9 months ago · 382 views

1 Answer

Hi,

Thank you for using AWS Sagemaker.

For this question:

  • Firstly, yes: Async Inference does not return the actual result in the invocation response. Asynchronous processing means the result is produced in the backend over a longer duration and then pushed to the S3 output location you have defined.

  • For the Content-Type information, you can get it from the S3 output object's metadata: https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html. The S3 output object will have Content-Type in its metadata, and for this case you can use it to determine the json-header-size.
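Assuming, as suggested above, that the output object's Content-Type carries the same json-header-size parameter the model returned, a sketch of reading it (the bucket/key values and helper names here are hypothetical, not from your setup):

```python
import re

def header_length_from_content_type(content_type: str) -> int:
    # e.g. "application/vnd.sagemaker-triton.binary+json;json-header-size=357"
    m = re.search(r"json-header-size=(\d+)", content_type)
    if m is None:
        raise ValueError(f"no json-header-size in {content_type!r}")
    return int(m.group(1))

def get_async_output_header_length(bucket: str, key: str) -> int:
    # HEAD the async output object; its Content-Type metadata should
    # carry the header size, per the note above.
    import boto3  # imported here so the pure helper above has no AWS dependency
    meta = boto3.client("s3").head_object(Bucket=bucket, Key=key)
    return header_length_from_content_type(meta["ContentType"])

example = "application/vnd.sagemaker-triton.binary+json;json-header-size=357"
print(header_length_from_content_type(example))  # → 357
```

With the header length in hand, the JSON header and the binary tensors can then be separated as in the question.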

To understand the issue in more depth, since I have limited visibility into your setup, I'd recommend reaching out to AWS Support by creating a support case[+] so that an engineer can investigate further and help you resolve the issue.

Reference: [+] Open a support case with AWS using the link: https://console.aws.amazon.com/support/home?#/case/create

AWS
Answered 9 months ago
