1 Answer
This is a rate-limiting error from S3: your query is probably making too many concurrent requests to S3. This is usually a sign that your data set consists of too many small files, tens of thousands or more. Reduce the number of files by combining the small files into larger ones.
answered 2 years ago
I know how to merge text files, but I'm not sure how to merge Parquet files.
Some ideas:
If you query the data mostly with Athena or Hive, you could use a CTAS statement to create a new table with bucketing, which limits the number of files per partition. This helps most if your table is already partitioned: by filtering on single partitions, you can avoid the error above.
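As a sketch, an Athena CTAS along these lines rewrites a table into a bucketed layout (table names, columns, and the S3 location here are hypothetical placeholders, not from the original thread):

```sql
-- Rewrite many small files into a bucketed table.
-- bucket_count caps the number of files written per partition.
CREATE TABLE my_db.events_compacted
WITH (
    format = 'PARQUET',
    external_location = 's3://my-bucket/events_compacted/',
    partitioned_by = ARRAY['dt'],
    bucketed_by = ARRAY['user_id'],
    bucket_count = 10
) AS
SELECT user_id, event_type, dt
FROM my_db.events_raw;
```

Note that bucketing fixes the file count per partition, so pick `bucket_count` so that each resulting file is reasonably large (ideally in the 128 MB+ range).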
Alternatively, you can have a look at this Knowledge Center article https://aws.amazon.com/premiumsupport/knowledge-center/emr-concatenate-parquet-files/ or this external blog post https://medium.com/bigspark/compaction-merge-of-small-parquet-files-bef60847e60b