Hello Rohith,
Thank you for raising this question in re:Post.
To give you an accurate answer, I would need more details: what you are using to process this file, how you are writing it to the target, what your target is, and so on.
In general, non-text output formats (Parquet, for example) are known to handle this scenario better in application frameworks such as Spark and Hive, whereas text file formats are prone to leaving partial files at the target when the application exits abruptly or fails.
At a high level, I believe that if you write the application to overwrite the target, a rerun should replace the earlier output rather than leave duplicates. That said, there are multiple variables at play here, so I cannot confirm that this will hold for your use case.
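To illustrate the overwrite point, here is a minimal sketch in plain Python rather than Spark. It is only a local-filesystem analogy under my own assumptions: `write_output`, `read_rows`, and `demo` are hypothetical names, and the "job" simply writes a few lines to a file. It runs the same job twice, as would happen on a rerun after a failure, once in append mode and once in overwrite mode. In Spark, the comparable setting is the save mode on the writer, for example `df.write.mode("overwrite")`; whether that behaves the same way for your target is something you would need to verify.

```python
import os
import tempfile


def write_output(target_path, rows, mode="overwrite"):
    """Write rows to target_path, one per line (hypothetical helper).

    mode="append" adds to the existing file; mode="overwrite" replaces it.
    """
    if mode == "append":
        with open(target_path, "a", encoding="utf-8") as f:
            for row in rows:
                f.write(row + "\n")
        return

    # Overwrite: stage the complete output in a temp file in the same
    # directory, then swap it into place. If the job dies before
    # os.replace, the previous target is left untouched.
    target_dir = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            for row in rows:
                f.write(row + "\n")
        os.replace(tmp_path, target_path)
    except BaseException:
        # Clean up the staged file so no partial output is left behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_rows(path):
    """Read the file back as a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def demo():
    """Run the same job twice per mode and return both results."""
    rows = ["id=1", "id=2", "id=3"]
    with tempfile.TemporaryDirectory() as d:
        appended = os.path.join(d, "appended.txt")
        overwritten = os.path.join(d, "overwritten.txt")
        for _ in range(2):  # the second pass simulates a rerun
            write_output(appended, rows, mode="append")
            write_output(overwritten, rows, mode="overwrite")
        return read_rows(appended), read_rows(overwritten)
```

Calling `demo()` returns the appended file with every row duplicated and the overwritten file with each row exactly once, which is the behaviour I was describing above.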
Please elaborate on the output file format, the execution engine (Hive, Spark, etc.), and the target location (RDBMS, S3, or something else) you would be using, so that I can review this again and answer your questions. The more details you share, the easier it will be to answer.

The output format will be Parquet, and the target location can be S3 or Redshift. The execution engine is Spark.