I don't believe you have the option to output only a single file when using Data Pipeline. You are using a pre-built solution based on the emr-dynamodb-connector, which limits customization. You can, of course, provide your own code to Data Pipeline, in which case you can achieve your goal of a single output file.
You could use AWS Glue to achieve this with Spark: before writing the data to S3, call coalesce to reduce the output to a single partition. If you are familiar with Hadoop or Spark, you will know that reducing the partition count concentrates the work onto essentially a single reducer. This can cause problems if the table holds a lot of data, because one node in your cluster must then hold the entire contents of the table, which can lead to storage or out-of-memory (OOM) errors.
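As a rough sketch, a Glue job doing this could look like the following. The table name, read-throughput percentage, and S3 path are placeholder assumptions, not values from your setup, and the script only runs inside a Glue job environment.

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read the DynamoDB table through Glue's DynamoDB connector.
dyf = glue_context.create_dynamic_frame.from_options(
    connection_type="dynamodb",
    connection_options={
        "dynamodb.input.tableName": "my-table",     # placeholder
        "dynamodb.throughput.read.percent": "0.5",  # placeholder
    },
)

# coalesce(1) collapses the data to a single partition, so Spark
# writes exactly one output file. Note: every row must fit on one
# executor, which is the storage/OOM risk mentioned above.
df = dyf.toDF().coalesce(1)

df.write.mode("overwrite").csv(
    "s3://my-bucket/export/",  # placeholder path
    header=True,
)
```

Spark will still name the file something like `part-00000-....csv` inside the target prefix; if you need an exact filename you would rename the object afterwards (e.g. with an S3 copy).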