Hi, could you please clarify:
- Is it AWS Glue or Glue DataBrew?
- Can you describe the flow in the Job in more detail?
When it comes to combining multiple files into a single output, there are several possible methods:
- Joining the input files, if the files share a common key and they want to combine the fields of both files.
- Using a union, if the files have the same schema and they just want to append one file to another.
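To make the difference between the two methods concrete, here is a minimal sketch in plain Python (standard library only, so it runs without Spark). The inlined file contents, the column names, and the `join_on_key` / `union_rows` helpers are hypothetical; in a Glue script the Spark counterparts would be `DataFrame.join` and `DataFrame.unionByName`.

```python
import csv
import io

# Hypothetical input files, inlined as strings for illustration.
customers_csv = "id,name\n1,Alice\n2,Bob\n"
orders_csv = "id,total\n1,30\n2,45\n"
new_customers_csv = "id,name\n3,Carol\n"


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_key(left, right, key):
    """Inner join: combine the fields of rows sharing the same key value.

    Assumes each key appears at most once in `right` (a simplification).
    """
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


def union_rows(first, second):
    """Union: append the rows of a file with the same columns to another."""
    if first and second and first[0].keys() != second[0].keys():
        raise ValueError("union requires both inputs to have the same columns")
    return first + second


# Join: one output row per customer, with fields from both files.
joined = join_on_key(read_rows(customers_csv), read_rows(orders_csv), "id")
# Union: rows of the second file appended after the first.
appended = union_rows(read_rows(customers_csv), read_rows(new_customers_csv))

print(joined)
print(appended)
```

A join widens the rows (more columns per record), while a union lengthens the output (more records with the same columns); which one fits depends on how the input files relate to each other.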
Once they have combined the files, they can use a single Target node to write the output. However, AWS Glue and Glue DataBrew both run on top of Spark, so even a single target is typically written as multiple files (one per Spark partition). If they need exactly one file:
- In AWS Glue, they can add a step that reduces the number of Spark partitions to 1 using the coalesce(1) function.
- In DataBrew, unless this has changed recently, this option is not available; please check this other question/answer.
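Why a single target still produces several files, and why reducing to one partition fixes it, can be illustrated with a toy model in plain Python that writes one file per partition, loosely mirroring Spark's behaviour. This is a simplified illustration, not Spark itself: the `split_into_partitions`, `coalesce`, and `write_partitions` helpers and the `part-NNNNN.csv` naming are assumptions made for the sketch. In an actual Glue script the equivalent step would be along the lines of `df.coalesce(1).write.csv(output_path, header=True)` on a Spark DataFrame (a DynamicFrame can be converted with `toDF()` first), where `output_path` is a placeholder for the target location.

```python
import csv
import os
import tempfile

# Hypothetical rows, already combined by a join or union.
HEADER = ["id", "name"]
ROWS = [["1", "Alice"], ["2", "Bob"], ["3", "Carol"], ["4", "Dan"]]


def split_into_partitions(rows, num_partitions):
    """Spread rows round-robin, roughly like data spread across Spark partitions."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


def coalesce(partitions, num_partitions):
    """Merge existing partitions down to num_partitions by concatenating them."""
    merged = [[] for _ in range(num_partitions)]
    for i, part in enumerate(partitions):
        merged[i % num_partitions].extend(part)
    return merged


def write_partitions(partitions, out_dir):
    """Write one CSV file (with header) per non-empty partition; return the paths."""
    paths = []
    for i, part in enumerate(partitions):
        if not part:
            continue
        path = os.path.join(out_dir, f"part-{i:05d}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(HEADER)
            writer.writerows(part)
        paths.append(path)
    return paths


partitions = split_into_partitions(ROWS, 4)

with tempfile.TemporaryDirectory() as before_dir, tempfile.TemporaryDirectory() as after_dir:
    # Without coalescing: one output file per partition.
    files_before = write_partitions(partitions, before_dir)
    # After coalescing to a single partition: one output file.
    files_after = write_partitions(coalesce(partitions, 1), after_dir)
    num_files_before = len(files_before)
    num_files_after = len(files_after)
    with open(files_after[0], newline="") as f:
        single_file_rows = list(csv.reader(f))

print(num_files_before, "files before coalescing,", num_files_after, "after")
```

In Spark, coalesce reduces the partition count without a full shuffle (unlike repartition), but writing a single file means one task handles all the output data, so this approach is best suited to outputs that comfortably fit a single executor.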
Hope this helps.