2 Answers
0
Hi,
Some additional information about the S3 structure, the Athena DDL, and the Glue job (in particular, how it implements the insert overwrite) is needed to answer the question correctly.
The behaviour you describe seems to point to additional files or partitions being present in the Athena table location.
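One way to check for extra files is to list the object keys under the table location and group them by partition directory. The sketch below is only an illustration: the function names, the `sales/` prefix, and the `dt=` partition layout are assumptions, not details from the question. It works on a plain list of keys, which you could obtain from an S3 listing.

```python
from collections import defaultdict


def files_per_partition(keys, table_prefix):
    """Group object keys under table_prefix by their partition directory."""
    groups = defaultdict(list)
    for key in keys:
        # Skip keys outside the table location and "folder" marker objects.
        if not key.startswith(table_prefix) or key.endswith("/"):
            continue
        relative = key[len(table_prefix):]
        directory, _, filename = relative.rpartition("/")
        groups[directory].append(filename)
    return dict(groups)


def partitions_with_extra_files(keys, table_prefix):
    """Return only the partition directories that hold more than one file."""
    grouped = files_per_partition(keys, table_prefix)
    return {d: sorted(f) for d, f in grouped.items() if len(f) > 1}


# Hypothetical keys, for illustration only.
example_keys = [
    "sales/dt=2024-01-01/part-0000.csv",
    "sales/dt=2024-01-01/part-0001.csv",
    "sales/dt=2024-01-02/part-0000.csv",
    "other/unrelated.csv",
]
print(partitions_with_extra_files(example_keys, "sales/"))
```

Any partition directory that shows up in the result holds more than one file, which would explain duplicated rows if the job was expected to overwrite the data.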
0
Looks like all the new partitions are added to the table. You should drop older partitions if you don't want to have duplicates.
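As a rough sketch, the helper below builds an Athena `ALTER TABLE ... DROP PARTITION` statement for a list of older partition values. The function name, the table name `sales`, and the string-typed partition column `dt` are hypothetical; adjust them to your table. The helper only builds the query text, which you would then run in the Athena console or submit from your job.

```python
def drop_partitions_sql(table, partition_column, values):
    """Build one ALTER TABLE statement that drops the given partitions.

    Assumes a single, string-typed partition column (an assumption for
    this sketch; multi-column partitions need a longer partition spec).
    """
    if not values:
        raise ValueError("at least one partition value is required")
    specs = ", ".join(
        f"PARTITION ({partition_column} = '{value}')" for value in values
    )
    # IF EXISTS avoids an error when a partition was already removed.
    return f"ALTER TABLE {table} DROP IF EXISTS {specs}"


# Hypothetical table and partition values, for illustration only.
print(drop_partitions_sql("sales", "dt", ["2024-01-01", "2024-01-02"]))
```

Keep in mind that dropping a partition from an external table removes only the metadata. The files stay in S3, so stale objects under a partition location that remains registered still need to be deleted or overwritten separately.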
S3 structure: CSV file with | as the delimiter.
DDL: The table is created with TextInputFormat as the input format and HiveIgnoreKeyTextOutputFormat as the output format, and the table properties set the delimiter to |.
Glue job: It is a PySpark script that reads data from one S3 file, converts it into a DataFrame, adds a partition column, and stores the result in another S3 bucket. After the data is stored, the partition is added manually to the Athena table using an ALTER TABLE query. No partition contains more than one file.