How can I output individual records to separate S3 objects with Glue?


Hi,

I have a list of JSON records that is ingested by a daily process and then crawled. I'd like to write the records to S3 individually, i.e. create one S3 object for each unique record. I'd then use each object's creation event as a trigger for onward processing. How do I get Glue to write a separate object for each record?

I guess I need to iterate in PySpark and output each DynamicRecord (?) from the DynamicFrame, or is there another option or idea? Are there any examples of doing that? Many thanks for any advice.

  • Is there a reason you would like to create a unique S3 object for each unique record? Too many small files are not ideal for storage (S3) or for processing (Glue). Glue distributes the computation across multiple executors, and each executor processes and writes its own share of the records, so the number of output files and the number of records in each depend on how the data is split across those executors. Writing one record per file is not an ideal use of Glue.
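If one object per record really is the requirement, one possible approach is to skip the Glue writer and put each record to S3 directly from the executors with boto3. The following is only a minimal sketch under several assumptions, not a tested Glue job: `record_key`, `write_partition`, the bucket name `my-bucket`, the `records/` prefix, and the DynamicFrame variable `dyf` are all hypothetical names, boto3 is assumed to be available in the job environment, and the job role is assumed to have `s3:PutObject` on the bucket. The key is derived from a hash of the record's content, which is a design choice made here so that identical records map to the same object.

```python
import hashlib
import json


def record_key(record, prefix="records/"):
    """Build a deterministic S3 key from the record's content (hypothetical helper).

    Keys are sorted before hashing so that field order does not change the key.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), default=str)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{prefix}{digest}.json"


def write_partition(rows, bucket, prefix="records/", client=None):
    """Write every row of one partition as its own S3 object; return the count."""
    if client is None:
        # Assumption: boto3 is installed on the executors. The client is
        # created here, once per partition, rather than shipped from the driver.
        import boto3
        client = boto3.client("s3")

    written = 0
    for row in rows:
        # Spark Row objects expose asDict(); plain dicts are accepted as well.
        record = row.asDict(recursive=True) if hasattr(row, "asDict") else dict(row)
        client.put_object(
            Bucket=bucket,
            Key=record_key(record, prefix),
            # default=str is a simplification for non-JSON types such as dates.
            Body=json.dumps(record, default=str).encode("utf-8"),
            ContentType="application/json",
        )
        written += 1
    return written


# Inside a Glue job, assuming `dyf` is the DynamicFrame read from the catalog:
#
#   df = dyf.toDF().dropDuplicates()
#   df.foreachPartition(lambda rows: write_partition(rows, "my-bucket"))
```

Because the key depends only on the record's content, a duplicate record overwrites the existing object instead of creating a second one. Each record still costs one PUT request, so this is likely to be slow and comparatively expensive for large daily batches, in line with the concern raised in the comment above.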

matt
asked 2 years ago · 70 views
No answers
