Method to output individual records to separate S3 objects with Glue?


Hi,

I have a list of JSON records that is ingested by a daily process, and the data is then crawled. I'd like to output the records individually to S3, i.e. create one S3 object for each unique record. I'd then use each object's creation event as a trigger for onward processing. How do I get Glue to write an object for each individual record?

I guess I need to iterate in PySpark and write out each DynamicRecord (if that's the right term) from the DynamicFrame, or is there another option or idea? Are there any examples of doing that? Many thanks for any advice.
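One possible approach, sketched under assumptions: convert the DynamicFrame to a Spark DataFrame and call `foreachPartition`, so each executor writes its own records to S3 with boto3 `put_object`. The helper names (`record_key`, `write_partition`, `make_boto3_put`), the SHA-256 content-hash key scheme, and the bucket/prefix values are hypothetical, not a Glue API. The S3 write is passed in as a callback so the logic can be exercised without AWS.

```python
import hashlib
import json
from typing import Callable, Iterable

# Hypothetical callback signature: (bucket, key, body) -> None
PutFn = Callable[[str, str, bytes], None]


def record_key(record: dict, prefix: str) -> str:
    """Build a deterministic S3 key from the record's canonical JSON.

    Identical records map to the same key, so duplicates (and reruns of the
    daily job) overwrite one object instead of creating extra ones.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), default=str)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    prefix = prefix.strip("/")
    return f"{prefix}/{digest}.json" if prefix else f"{digest}.json"


def write_partition(records: Iterable[dict], bucket: str, prefix: str, put: PutFn) -> int:
    """Write each record as its own object via `put`; return how many were written."""
    count = 0
    for record in records:
        body = json.dumps(record, sort_keys=True, default=str).encode("utf-8")
        put(bucket, record_key(record, prefix), body)
        count += 1
    return count


def make_boto3_put() -> PutFn:
    """Create one S3 client per partition (assumes boto3 is available, as in Glue)."""
    import boto3  # imported lazily so the helpers above run without AWS installed

    client = boto3.client("s3")

    def put(bucket: str, key: str, body: bytes) -> None:
        client.put_object(Bucket=bucket, Key=key, Body=body)

    return put


# Hypothetical wiring inside a Glue PySpark job (bucket and prefix are placeholders):
#
#   df = dyf.toDF().dropDuplicates()
#   df.foreachPartition(
#       lambda rows: write_partition(
#           (row.asDict(recursive=True) for row in rows),
#           "my-output-bucket", "records/", make_boto3_put()))
```

The deterministic key is a deliberate choice: if Spark retries a failed task, the retried writes overwrite the same objects rather than creating duplicates. Note that an overwrite still emits an S3 `ObjectCreated` event, so reruns will re-trigger the downstream processing.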

  • Is there a reason you want to create a separate S3 object for each unique record? A very large number of small files is not ideal for storage (S3) or for processing (Glue). Glue distributes the work across multiple executors, each of which processes and writes its own share of the records, so the number and size of the output files depend on how the data is partitioned across those executors. Writing one record per file is not a good fit for how Glue is designed to work.
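A possible middle ground, assuming the downstream consumer can loop over several records per triggering object: write records in batches as newline-delimited JSON. Inside Spark, `df.repartition(n).write.json(path)` already produces JSON Lines output with one file per non-empty partition, which is the usual way to control the file count. The pure-Python sketch below only illustrates the batch layout and the consumer-side loop; `to_ndjson_batches` and `iter_records` are hypothetical helper names.

```python
import json
from typing import Iterable, Iterator, List


def to_ndjson_batches(records: Iterable[dict], batch_size: int) -> List[bytes]:
    """Group records into newline-delimited JSON bodies of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batches: List[bytes] = []
    current: List[str] = []
    for record in records:
        current.append(json.dumps(record, sort_keys=True, default=str))
        if len(current) == batch_size:
            batches.append(("\n".join(current) + "\n").encode("utf-8"))
            current = []
    if current:  # flush the final, possibly smaller, batch
        batches.append(("\n".join(current) + "\n").encode("utf-8"))
    return batches


def iter_records(body: bytes) -> Iterator[dict]:
    """Consumer side: yield one record per non-empty line of an object body."""
    for line in body.decode("utf-8").splitlines():
        if line.strip():
            yield json.loads(line)
```

With this layout, each S3 event still starts the onward processing, but one invocation handles a batch of records instead of one.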

matt
Asked 2 years ago · 70 views