Hello Dave,
According to the AWS documentation, you can control the size of the mini-batches by using the BatchStrategy and MaxPayloadInMB parameters [1]. With BatchStrategy you specify whether a mini-batch for an inference request contains a single record or multiple records (up to the MaxPayloadInMB limit) [2]. A record is a single unit of input data that inference can be made on; for example, a single line in a CSV file is a record. If the input data is very large, you can set MaxPayloadInMB to 0 to stream the data to the algorithm. Note, however, that streaming is not supported by the Amazon SageMaker built-in algorithms.
Moreover, to split input files into mini-batches when you create a batch transform job, set the SplitType parameter value to Line [1]. If SplitType is set to None or if an input file can't be split into mini-batches, SageMaker uses the entire input file in a single request. Please note that Batch Transform doesn't support CSV-formatted input that contains embedded newline characters. If you set SplitType to Line, you can then set the AssembleWith parameter to Line to concatenate the output records with a line delimiter. Thus, you do not need to manually reassemble the output. If you don't specify the AssembleWith parameter, by default the output records are concatenated in a binary format.
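To make the parameters above concrete, here is a minimal sketch of a CreateTransformJob request that sets BatchStrategy, MaxPayloadInMB, SplitType, and AssembleWith together. The job name, model name, S3 URIs, and instance type are placeholders you would replace with your own values; the request would be submitted with boto3's `create_transform_job` call.

```python
# Sketch of a batch transform request combining the parameters discussed
# above. Job name, model name, S3 paths, and instance type are hypothetical.
request = {
    "TransformJobName": "my-batch-job",   # placeholder
    "ModelName": "my-model",              # placeholder
    "BatchStrategy": "MultiRecord",       # pack multiple records per mini-batch
    "MaxPayloadInMB": 6,                  # upper bound on each mini-batch payload
    "TransformInput": {
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-bucket/input/",  # placeholder
            }
        },
        "ContentType": "text/csv",
        "SplitType": "Line",  # split input files into records on newlines
    },
    "TransformOutput": {
        "S3OutputPath": "s3://my-bucket/output/",  # placeholder
        "AssembleWith": "Line",  # join output records with a line delimiter
    },
    "TransformResources": {
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
    },
}

# To submit the job (requires AWS credentials and the boto3 package):
# import boto3
# boto3.client("sagemaker").create_transform_job(**request)
```

With SplitType set to "Line" and AssembleWith set to "Line", each CSV line becomes one record and the output records come back newline-delimited, so no manual reassembly is needed.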
I hope that this information will be helpful.
References: