Notice that when you call `repartition(1)`, only one core of the cluster can do any work downstream of it. If you just want to generate a single file, put the `repartition` as late as possible, just before the write.
Also bear in mind that the write action triggers not only the write itself but all of the work from the source up to that point (repartitioning, filtering, etc.), so even if little data comes out at the end, Spark still has to do all of that work to get there.
Given that, you can't move the repartition down any further (you could move it after the DynamicFrame conversion, but I don't think it would make any difference).
I even tried writing the data frame directly to S3, skipping both the repartitioning and the DataFrame-to-DynamicFrame conversion, but it still took the same amount of time:
test_df.write.mode("overwrite").format('json').save(destination_path + '/testing-perf3')
Thanks for the reply! The above three lines are the last three lines of the Glue job. Do you still have any suggestions about the ordering of these lines?