How does SageMaker batch inference process individual files?


Based on the documentation here, https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html, a large dataset can be structured as shown below in a CSV file. Is it possible to have multiple files in this format for batch inference, and is there any configuration that can be set for it to process multiple files? Also, what other formats besides CSV can batch inference handle?

Record1-Attribute1, Record1-Attribute2, Record1-Attribute3, ..., Record1-AttributeM
Record2-Attribute1, Record2-Attribute2, Record2-Attribute3, ..., Record2-AttributeM
...
...
RecordN-Attribute1, RecordN-Attribute2, RecordN-Attribute3, ..., RecordN-AttributeM  
Asked 2 years ago · Viewed 528 times
1 Answer

If you have multiple files in an S3 bucket for batch inference, a general guideline is to set the number of workers/instances to a multiple of the number of files in S3 to process. In addition, you can set the BatchStrategy to MultiRecord to speed up processing. To enable parallel processing, set MaxConcurrentTransforms to 0 to start off; Amazon SageMaker then checks the optional execution-parameters endpoint to determine the settings for your chosen algorithm.
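For reference, a minimal boto3 sketch of a transform job configured along these lines is shown below; the job name, model name, bucket, and prefixes are placeholders, and the instance count/type are illustrative assumptions:

import boto3

sm = boto3.client("sagemaker")

sm.create_transform_job(
    TransformJobName="example-batch-job",      # hypothetical job name
    ModelName="example-model",                 # hypothetical model created beforehand
    BatchStrategy="MultiRecord",               # batch several records per request
    MaxConcurrentTransforms=0,                 # 0 lets SageMaker query /execution-parameters
    MaxPayloadInMB=6,
    TransformInput={
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",      # a prefix picks up every file under it
                "S3Uri": "s3://example-bucket/batch-input/",
            }
        },
        "ContentType": "text/csv",
        "SplitType": "Line",                   # split each CSV file into line-level records
    },
    TransformOutput={
        "S3OutputPath": "s3://example-bucket/batch-output/",
        "AssembleWith": "Line",
    },
    TransformResources={
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 2,                    # e.g. one instance per input file
    },
)

Because the input uses an S3Prefix data source, every file under the prefix is picked up and distributed across the transform instances; no per-file configuration is needed.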

AWS
Answered 2 years ago
  • @AWS-Anonymous - thanks. So if I have 2 files, should I set the number of instances to 2, 4, 6, ...? Performance-wise, is it better to have everything in one file, if possible, or to split it up into multiple files? Also, you mentioned "set the MaxConcurrentTransforms to 0 to start off" - does this strategy work when we bring our own container/algorithm?
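On the bring-your-own-container point: during batch transform, SageMaker issues an optional GET request to the container's /execution-parameters endpoint, so setting MaxConcurrentTransforms to 0 on the job works as long as your container implements that route. A minimal Flask sketch is shown below; the reported values are illustrative, not recommendations:

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/ping", methods=["GET"])
def ping():
    # Health check required of SageMaker inference containers.
    return "", 200

@app.route("/execution-parameters", methods=["GET"])
def execution_parameters():
    # Called by SageMaker when the job leaves tuning to the container
    # (e.g. MaxConcurrentTransforms set to 0 on the transform job).
    return jsonify({
        "MaxConcurrentTransforms": 4,   # example value: tune to your model/instance
        "BatchStrategy": "MULTI_RECORD",
        "MaxPayloadInMB": 6,
    })

@app.route("/invocations", methods=["POST"])
def invocations():
    # Model inference logic goes here (omitted in this sketch).
    return "", 200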
