Method to output individual records to S3 objects with Glue?


Hi,

I have a list of JSON records that is ingested by a daily process, and the data is crawled. I'd like to output the records individually to S3, i.e. create one S3 object for each unique record. I'd then use each object's creation event as a trigger for onward processing. How do I get Glue to output an object for each individual record?

I guess I need to iterate in PySpark and output each DynamicRecord (?) from the DynamicFrame. Or is there another option or idea? Are there any examples of doing that? Many thanks for any advice.
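One way to do the per-record write is to leave Glue's writers aside (they produce one file per partition, not per record) and call S3 directly from each Spark partition. The sketch below assumes the DynamicFrame is converted with `toDF()` and written with PySpark's `DataFrame.foreachPartition`, using boto3's `put_object`. The helper names `record_key` and `make_partition_writer`, the content-hash key scheme, and the bucket/prefix values are all hypothetical illustrations, not a Glue API.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Iterable


def record_key(record: Dict[str, Any], prefix: str) -> str:
    """Hypothetical key scheme: derive a deterministic S3 key from the record's content.

    Identical records hash to the same key, so duplicates overwrite one
    another instead of producing extra objects.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), default=str)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{prefix.rstrip('/')}/{digest}.json"


def make_partition_writer(bucket: str, prefix: str, client_factory: Callable[[], Any]):
    """Return a function suitable for DataFrame.foreachPartition.

    client_factory is called once per partition on the executor
    (e.g. lambda: boto3.client("s3")), so the S3 client is created where it
    is used instead of being shipped from the driver.
    """
    def write_partition(rows: Iterable[Any]) -> None:
        s3 = client_factory()
        for row in rows:
            # Spark Rows expose asDict(); plain dicts are copied as-is.
            record = row.asDict(recursive=True) if hasattr(row, "asDict") else dict(row)
            body = json.dumps(record, sort_keys=True, default=str).encode("utf-8")
            # One S3 object (and one PUT request) per record.
            s3.put_object(Bucket=bucket, Key=record_key(record, prefix), Body=body)

    return write_partition
```

In a Glue job this might be invoked as `dyf.toDF().foreachPartition(make_partition_writer("my-bucket", "records/", lambda: boto3.client("s3")))`, where `dyf` is the crawled DynamicFrame. Content-hash keys make reruns produce the same keys for identical records, but an overwrite still emits a new object-created event, and every record costs one PUT request, which is the scaling concern raised in the comment.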

  • Is there a reason you want to create a unique S3 object for each unique record? Too many small files are not ideal for either storage (S3) or processing (Glue). Glue distributes the work across multiple executors, and each partition processed there writes its own output file, so the number and size of the files depend on how the data is partitioned. Writing one record per file is not an ideal use of Glue.
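An alternative in the spirit of that comment is to let Glue write its normal batched output and fan out per record in whatever the S3 event triggers. A minimal sketch of an AWS Lambda handler, assuming the job writes uncompressed JSON Lines (one object per line, as Spark's `DataFrame.write.json` does) and that the bucket's event notifications invoke this function; `iter_event_records` and `process_record` are hypothetical names, and `process_record` stands in for the onward processing.

```python
import json
from typing import Any, Callable, Dict, Iterator, Tuple
from urllib.parse import unquote_plus


def iter_event_records(
    event: Dict[str, Any],
    get_body: Callable[[str, str], bytes],
) -> Iterator[Tuple[str, str, Dict[str, Any]]]:
    """Yield (bucket, key, record) for every JSON line in every object
    referenced by an S3 event notification."""
    for entry in event.get("Records", []):
        bucket = entry["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(entry["s3"]["object"]["key"])
        for line in get_body(bucket, key).decode("utf-8").splitlines():
            if line.strip():  # skip blank lines
                yield bucket, key, json.loads(line)


def process_record(record: Dict[str, Any]) -> None:
    """Hypothetical placeholder for the per-record onward processing."""
    print(json.dumps(record))


def lambda_handler(event, context):
    import boto3  # included in the AWS Lambda Python runtime

    s3 = boto3.client("s3")

    def get_body(bucket: str, key: str) -> bytes:
        return s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    count = 0
    for _bucket, _key, record in iter_event_records(event, get_body):
        process_record(record)
        count += 1
    return {"processed": count}
```

This keeps the Glue output to a handful of files per run while still giving per-record processing. For large output files the per-record work may need to be handed off to a queue to stay within Lambda's execution time limit.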

matt
asked 2 years ago, 68 views
