1 Answer
I would strongly recommend splitting the job if that's an option; a single large job that needs a lot of DPUs is not recommended. In my case, I needed 700 DPUs to convert 14,000 gzipped CSV files of about 500 MB each to Parquet. The best approach in Glue turned out to be splitting the work into 14 instances of the same Spark job, with each instance processing 1,000 files using 50 DPUs (see the sketch below). In short, split the job if you can; if you can't and you still need a lot of DPUs, a transient EMR cluster may be a better fit.
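As a rough illustration of that fan-out idea (not from the original answer), the sketch below lists the input files, slices them into batches of about 1,000, writes each batch to a small manifest in S3, and starts one Glue job run per batch with boto3. The bucket, prefix, job name, and the `--manifest_key` argument are assumptions; your Glue job script would need to read its manifest and process only those keys.

```python
import boto3

glue = boto3.client("glue")
s3 = boto3.client("s3")

BUCKET = "my-input-bucket"      # hypothetical input bucket
PREFIX = "csv-gz/"              # hypothetical prefix holding the .csv.gz files
JOB_NAME = "csv-to-parquet"     # hypothetical Glue job that converts CSV to Parquet
BATCH_SIZE = 1000               # ~1,000 files per job instance, as in the answer
DPUS_PER_RUN = 50               # 50 DPUs per job instance, as in the answer

# List every input object under the prefix.
keys = []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    keys.extend(obj["Key"] for obj in page.get("Contents", []))

# For each batch, write a small manifest to S3 and start a separate job run
# that processes only the keys listed in its manifest.
for i in range(0, len(keys), BATCH_SIZE):
    batch = keys[i:i + BATCH_SIZE]
    manifest_key = f"manifests/batch-{i // BATCH_SIZE}.txt"
    s3.put_object(Bucket=BUCKET, Key=manifest_key, Body="\n".join(batch).encode())
    glue.start_job_run(
        JobName=JOB_NAME,
        MaxCapacity=DPUS_PER_RUN,                    # DPUs allocated to this run
        Arguments={"--manifest_key": manifest_key},  # custom argument read by the job script
    )
```

Passing a manifest key rather than the raw file list keeps the job arguments small, and each run stays well within a modest DPU allocation instead of one run demanding hundreds of DPUs at once.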
answered 6 years ago