Method to output individual records to S3 objects with Glue?


Hi,

I have a list of JSON records that is ingested by a daily process, and the data has been crawled. I'd like to write the records out individually to S3, i.e. create one S3 object for each unique record. I'd then use each object's creation event as a trigger for onward processing. How do I get Glue to write one object per record?

I guess I need to iterate in PySpark and write out each DynamicRecord (?) from the DynamicFrame, or is there another option or idea? Are there any examples of doing that? Many thanks for any advice.

  • Is there a reason you would like to create a unique S3 object for each unique record? Too many small files would not be ideal for storage (S3) or for processing (Glue). The size of each S3 file, and the number of records in it, depends on the number of executors used by Glue: Glue distributes the computation across multiple executors, and each of those processes and writes its own share of the records. Writing out one record per file is not an ideal use of Glue functionality.
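
If one object per record is still what's needed, here is a minimal sketch of the "iterate in PySpark" idea from the question. It is not an official Glue pattern. It assumes the records can be turned into JSON strings (for example with `dynamic_frame.toDF().toJSON()`) and that each partition is written by a callable with the same keyword arguments as boto3's S3 `put_object`. The names `record_key`, `write_partition`, `my-output-bucket` and the `records` prefix are hypothetical, and hashing the record content to get the object key is an assumption made so that the same record always maps to the same object.

```python
import hashlib
import json


def record_key(prefix, json_line):
    """Build a deterministic object key from the record's canonical JSON."""
    canonical = json.dumps(json.loads(json_line), sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{prefix.rstrip('/')}/{digest}.json"


def write_partition(json_lines, put_object, bucket, prefix):
    """Write each unique JSON record in one partition to its own S3 object.

    put_object is any callable accepting Bucket=, Key= and Body= keyword
    arguments, such as the put_object method of a boto3 S3 client.
    Returns the number of objects written.
    """
    seen = set()
    written = 0
    for line in json_lines:
        key = record_key(prefix, line)
        if key in seen:
            # Same record already written from this partition.
            continue
        seen.add(key)
        put_object(Bucket=bucket, Key=key, Body=line.encode("utf-8"))
        written += 1
    return written


# Hypothetical wiring inside a Glue PySpark job (assumed names, not run here):
#
#   import boto3
#
#   def _write(rows):
#       s3 = boto3.client("s3")  # create the client on the executor
#       write_partition(rows, s3.put_object, "my-output-bucket", "records")
#
#   dynamic_frame.toDF().toJSON().foreachPartition(_write)
```

Duplicates are only skipped within a partition, but because the key is derived from the record content, the same record written from two partitions simply overwrites the same object. As the comment above points out, this issues one PUT request per record, so it is only reasonable for modest record counts.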

matt
Asked 2 years ago · 70 views
No answers
