Greetings,
Please note that Batch Transform does not currently support Parquet files. This feature is on the internal team's road map, but I cannot say with certainty when it will be implemented. If you need further details or support, please open a Support case with the following details:
- batch transform job ARN
- logs showing the error and how it starts
- inference script or entry_point script for the batch transform
- Dockerfile, if you are using your own inference container
- sample data, if possible
The third-party link [1] suggests that the backend input_fn function (in the inference script) can be written to parse Parquet input; however, from what I understand, Parquet is not supported for Batch Transform in SageMaker (you can use CSV or JSON with no issue; see the input_fn sketch after the quoted line below). I quote the following line from link [2]:
> The input to batch transforms must be of a format that can be split into smaller files to process in parallel. These formats include CSV, JSON, JSON Lines, TFRecord and RecordIO.
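For reference, here is a minimal sketch of an input_fn handling a supported format such as CSV. The function name and signature follow the SageMaker inference toolkit convention; the assumption that the CSV has no header row is hypothetical and depends on your data:

```python
import io

import pandas as pd


def input_fn(request_body, request_content_type):
    """Deserialize each mini-batch that Batch Transform sends to the container."""
    if request_content_type == "text/csv":
        # With SplitType=Line, Batch Transform splits the input file by line,
        # so each request body is a chunk of CSV rows.
        return pd.read_csv(io.StringIO(request_body), header=None)
    raise ValueError(f"Unsupported content type: {request_content_type}")
```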
References:
[1] https://stackoverflow.com/questions/62415237/aws-sagemaker-using-parquet-file-for-batch-transform-job
[2] https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-batch-code.html#your-algorithms-batch-code-run-image
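If converting the data upstream is an option, a minimal sketch of that workaround follows, assuming pandas (with pyarrow and s3fs installed) and the SageMaker Python SDK; all bucket, key, and model names below are hypothetical:

```python
import pandas as pd
from sagemaker.transformer import Transformer

# Convert the Parquet input to CSV first. Reading and writing s3:// paths
# with pandas requires the s3fs package; bucket/key names are hypothetical.
df = pd.read_parquet("s3://my-bucket/input-parquet/data.parquet")
df.to_csv("s3://my-bucket/input-csv/data.csv", index=False, header=False)

# Run Batch Transform against the CSV copy, which SageMaker can split.
transformer = Transformer(
    model_name="my-model",  # hypothetical model name
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/transform-output/",
)
transformer.transform(
    data="s3://my-bucket/input-csv/",
    content_type="text/csv",
    split_type="Line",  # lets SageMaker split the CSV into records
)
```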