Notice that with "repartition(1)" only one core of the cluster can do any of the work, so if you just want to produce a single file, put the repartition as late as possible (just before the write).
Also bear in mind that triggering the write does not run only the write: it executes all the work from the source up to that point (repartitioning, filtering, etc.), so even if no data comes out at the end, Spark still has to do all of that work to get there.
In that case you can't move the repartition down any further (you could move it after the DynamicFrame conversion, but I don't think it will make any difference).
I even tried writing the DataFrame directly to S3, skipping both the repartitioning and the DataFrame-to-DynamicFrame conversion, but it still took the same amount of time:
test_df.write.mode("overwrite").format('json').save(destination_path + '/testing-perf3')
Thanks for the reply!! The above 3 lines are the last 3 lines of the Glue job. Do you still have any suggestions on the ordering of these lines?