Writing a Glue DynamicFrame to S3 is taking too long

0

Hi, I have a Glue job running with PySpark. It is taking too long to write the dynamic frame to S3. For around 1,200 records, the write to S3 alone took around 500 seconds. I have observed that even if the DataFrame is empty, it still takes the same amount of time to write to S3.

Below are the code snippets:

test1_df = test_df.repartition(1)

invoice_extract_final_dyf = DynamicFrame.fromDF(test1_df, glueContext, "invoice_extract_final_dyf")

glueContext.write_dynamic_frame.from_options(frame=invoice_extract_final_dyf, connection_type="s3", connection_options={"path": destination_path}, format="json")

The conversion in the 2nd line and the write to S3 together consume most of the time. Any help will be appreciated. Let me know if any further details are needed.

asked a year ago · 2319 views
2 answers
1

Notice that when you call repartition(1), only one core of the cluster does the work from that point on. If you just want to generate a single file, put the repartition as late as possible (just before the write).
Also bear in mind that running the write does not run only the write: it triggers all the work from the source up to the point of writing (e.g. repartitioning, filtering, etc.). So even if no data comes out at the end, Spark still has to do all of that work to get there.
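
To check whether the time is really spent in the write or in the upstream work, one option is to force the computation first and time the two stages separately. The following is only a rough sketch, not a definitive fix: the helper name time_stages is made up for illustration, df is assumed to be a pyspark.sql.DataFrame such as test_df, and destination_path is assumed to be an s3:// URI.

```python
import time


def time_stages(df, destination_path):
    """Time the upstream computation and the S3 write separately.

    Assumptions for this sketch: `df` is a pyspark.sql.DataFrame and
    `destination_path` is an s3:// URI. `time_stages` is a hypothetical
    helper, not part of Glue or Spark.
    """
    timings = {}

    # cache() marks the DataFrame for reuse; count() is an action, so it
    # forces every upstream transformation (reads, joins, filters) to run.
    start = time.perf_counter()
    cached = df.cache()
    row_count = cached.count()
    timings["compute_seconds"] = time.perf_counter() - start

    # The rows are now cached, so this mostly measures the write itself.
    # coalesce(1) merges partitions into a single output file without a
    # full shuffle.
    start = time.perf_counter()
    cached.coalesce(1).write.mode("overwrite").format("json").save(destination_path)
    timings["write_seconds"] = time.perf_counter() - start

    cached.unpersist()
    return row_count, timings
```

If compute_seconds dominates, the slow part is somewhere earlier in the job (source reads, joins, filters) rather than the S3 write, and that is where to look next.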

AWS EXPERT
answered a year ago
  • Thanks for the reply! The 3 lines above are the last 3 lines of the Glue job. Do you still have any suggestions about the ordering of these lines?

0

Then you can't move the repartition down any further (you could move it after the conversion, but I don't think it would make any difference).

AWS EXPERT
answered a year ago
  • I even tried writing the DataFrame directly to S3, skipping both the repartitioning and the DataFrame-to-DynamicFrame conversion, but it still took the same amount of time:

    test_df.write.mode("overwrite").format('json').save(destination_path + '/testing-perf3')
