AWS Glue: combining multiple inputs into a single output CSV


I have an AWS Glue customer doing a POC who found that each job creates a separate output CSV file. They want to be able to create a single CSV output file that combines multiple input jobs or files. Any guidance?

1 Answer

Hi, could you please clarify:

  1. Is it AWS Glue or Glue DataBrew?

  2. Can you describe the flow in the job in more detail?

When it comes to combining multiple files into a single output, there are a couple of possible methods:

  1. Joining the input files (if the files have a common key and they want to combine the fields of both files).
  2. Using a union (if the files have the same schema and they just want to append one file to another).
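As a rough illustration of the two methods above, here is a minimal PySpark sketch. The function names and the `key` parameter are hypothetical, and it assumes the inputs have already been loaded as Spark DataFrames (a Glue DynamicFrame can be converted with `toDF()`); the inner join is only one possible choice and may not fit the customer's data.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Needed only for the type hints; a Glue Spark job already provides PySpark.
    from pyspark.sql import DataFrame


def combine_by_key(left: DataFrame, right: DataFrame, key: str) -> DataFrame:
    """Join two inputs on a shared column, keeping the fields of both."""
    # Assumption: an inner join is wanted; use how="left" or "outer" otherwise.
    return left.join(right, on=key, how="inner")


def append_rows(first: DataFrame, second: DataFrame) -> DataFrame:
    """Append one input to another when both have the same schema."""
    # unionByName matches columns by name rather than by position.
    return first.unionByName(second)
```

Either function returns one DataFrame, which can then be written out through a single target.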

Once they have combined the files, they will be able to use a single Target node to write the result out. However, AWS Glue and Glue DataBrew both run on top of Spark, so the output of even a single target is split into multiple files. If they need one single file:

  1. In AWS Glue, they can add a step that reduces the number of Spark partitions to 1 using the function coalesce(1).
  2. In DataBrew, unless this has changed recently, you do not have this option; please check this other question/answer.
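For the AWS Glue case, the coalesce(1) step could look roughly like the sketch below. The function name and `output_path` are hypothetical, and the overwrite mode and header option are assumptions about what the customer wants rather than requirements.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Needed only for the type hints; a Glue Spark job already provides PySpark.
    from pyspark.sql import DataFrame


def write_single_csv(df: DataFrame, output_path: str) -> None:
    """Write df as CSV from a single partition so Spark emits one part file."""
    (
        df.coalesce(1)               # reduce to one partition without a full shuffle
        .write.mode("overwrite")     # assumption: replacing earlier output is fine
        .option("header", "true")    # include column names in the file
        .csv(output_path)            # hypothetical location, e.g. an S3 prefix
    )
```

Note that Spark still writes a directory at `output_path` containing a single `part-*.csv` file, not a file with that exact name. Because all rows pass through one partition, this approach is best suited to outputs that fit comfortably on a single executor.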

Hope this helps.

AWS
Expert
answered 2 years ago
