1 Answer
This is a rate-limiting error from S3: your query is probably making too many requests to S3 at the same time. This is usually a sign that your data set consists of too many small files, tens of thousands or more. Reduce the number of files by combining small files into bigger ones.
answered 2 years ago
I know how to merge text files, but I'm not sure how to merge Parquet files.
Some ideas:
If you use your data mostly with Athena or Hive, you could use a CTAS statement to create a new table and use bucketing to limit the number of files per partition. If your table is already partitioned and you filter on single partitions, you can avoid the error above.
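As a rough illustration of the CTAS-with-bucketing idea, here is a sketch of an Athena query; the database, table, column, and S3 location names (`my_db.events`, `event_id`, `s3://my-bucket/...`) are placeholders, not from this thread. `bucket_count` caps the number of output files per partition.

```sql
-- Hypothetical example: rewrite a table of many small files into
-- at most 10 Parquet files (buckets) via CTAS.
CREATE TABLE my_db.events_compacted
WITH (
    format = 'PARQUET',
    external_location = 's3://my-bucket/events-compacted/',
    bucketed_by = ARRAY['event_id'],
    bucket_count = 10
) AS
SELECT * FROM my_db.events;
```

Note that a bucketed table written this way cannot later be extended with `INSERT INTO` in Athena engine v2, so this pattern fits periodic full rewrites rather than incremental loads.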
Alternatively, you can have a look at this KB article https://aws.amazon.com/premiumsupport/knowledge-center/emr-concatenate-parquet-files/ or this external blog post https://medium.com/bigspark/compaction-merge-of-small-parquet-files-bef60847e60b