1 Answer
This is a rate-limiting error from S3: your query is probably making too many requests to S3 at the same time. This is usually a sign that your data set consists of too many small files, tens of thousands or more. Reduce the file count by combining the small files into bigger ones.
answered 2 years ago
I know how to merge text files, but I'm not sure how to merge Parquet files.
Some ideas:

If you query your data mostly with Athena or Hive, you could use a CTAS statement to create a new table and use bucketing to limit the number of files per partition. This obviously applies if your table is already partitioned; by filtering on single partitions you can avoid the error above.
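As a sketch, an Athena CTAS with bucketing might look like the following (the database, table, column, and bucket names are hypothetical; `bucket_count` caps the number of output files per partition):

```sql
-- Rewrite a table of many small files into a bucketed copy.
-- Assumes my_db.events exists and has a dt partition column.
CREATE TABLE my_db.events_compacted
WITH (
  format            = 'PARQUET',
  external_location = 's3://my-bucket/events_compacted/',  -- placeholder bucket
  partitioned_by    = ARRAY['dt'],
  bucketed_by       = ARRAY['user_id'],
  bucket_count      = 8   -- at most 8 files per partition
) AS
SELECT * FROM my_db.events;
```

Queries that filter on a single `dt` partition then read at most `bucket_count` files from S3, which keeps the request rate down.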
Alternatively, you can have a look at this Knowledge Center article https://aws.amazon.com/premiumsupport/knowledge-center/emr-concatenate-parquet-files/ or this external blog post https://medium.com/bigspark/compaction-merge-of-small-parquet-files-bef60847e60b