Athena Iceberg does not delete orphan files

I have an Athena Iceberg table with 2 partitions.

Each hour I update it with MERGE and DELETE commands.

SELECT count(*) FROM "my_table$files"

now returns 16, while the data folder contains 158 files.

Neither

VACUUM my_table

nor

OPTIMIZE my_table REWRITE DATA USING BIN_PACK

removes the unnecessary files.

The table has the following TBLPROPERTIES:

TBLPROPERTIES (
  'table_type'='iceberg',
  'vacuum_max_snapshot_age_seconds'='60',
  'write_compression'='ZSTD',
  'format'='parquet',
  'vacuum_max_metadata_files_to_keep'='2',
  'optimize_rewrite_delete_file_threshold'='2',
  'optimize_rewrite_data_file_threshold'='2'
)

It is that aggressive because I do not need any history of changes. I'm interested in the latest state only.

The number of files keeps growing and never decreases, despite the fact that the number of rows in the table is almost constant.

What am I doing wrong, and how can I stop the file inflation?

  • BTW, when I manually delete from the data directory anything that is not listed in the files query result, I get the following error on random SELECT queries:

    ICEBERG_CANNOT_OPEN_SPLIT: Error opening Iceberg split s3://my_bucket/data_lake/my_table/data/lOlxRw/20240801_100037_00009_4atvz-e01ed1f7-ec42-4841-a13a-461c597951f4.parquet (offset=0, length=16462): io.trino.hdfs.s3.TrinoS3FileSystem$UnrecoverableS3OperationException: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist.
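
The error above suggests that the "my_table$files" result alone is not a complete list of what the table still references. Below is a minimal Python sketch of a dry-run check, assuming you have already collected the object keys under the data prefix and every path the table metadata still references (data files and delete files, for every snapshot that has not expired). The function name and the sample file names are hypothetical; the function only reports candidates and deletes nothing.

```python
from urllib.parse import urlparse


def orphan_candidates(listed_keys, referenced_paths):
    """Return listed object keys that no referenced path points to.

    listed_keys: object keys found under the table's data prefix.
    referenced_paths: s3:// URIs (or bare keys) still referenced by the
        table metadata. Assumption: this covers data files AND delete
        files for every snapshot that has not expired.
    """
    referenced_keys = set()
    for path in referenced_paths:
        parsed = urlparse(path)
        # For "s3://bucket/key" the key is the URI path minus the leading "/".
        key = parsed.path.lstrip("/") if parsed.scheme else path
        referenced_keys.add(key)
    return sorted(k for k in listed_keys if k not in referenced_keys)


# Hypothetical sample data, for illustration only.
listed = [
    "data_lake/my_table/data/a.parquet",
    "data_lake/my_table/data/b.parquet",
    "data_lake/my_table/data/c.parquet",
]
referenced = [
    "s3://my_bucket/data_lake/my_table/data/a.parquet",
    "s3://my_bucket/data_lake/my_table/data/c.parquet",
]
print(orphan_candidates(listed, referenced))
```

If the referenced set is incomplete, as it apparently was in the manual cleanup described above, the result will include files that are still in use, so it is safer to treat the output as a list to inspect rather than a list to delete.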
    
asked 10 months ago, 708 views
2 Answers
0

First of all, it seems that SELECT count(*) FROM "my_table$files" counts only DATA files, not DELETE files. Do you have Table Optimization Compaction turned on? If so, note that this optimization is not equivalent to calling OPTIMIZE: the automatic compaction skips delete files, and it seems VACUUM never deletes them afterwards.
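
One way to check this is to break the files metadata down by content type instead of taking a single count. The Iceberg table spec defines content codes 0 (data), 1 (position deletes) and 2 (equality deletes). The Python sketch below tallies rows by that code. It assumes the rows were fetched from "my_table$files" with a `content` column and converted to dicts; whether Athena returns delete-file rows there is something to verify, and the function name and row shape are hypothetical.

```python
from collections import Counter

# Content codes as defined by the Iceberg table spec.
CONTENT_NAMES = {0: "data", 1: "position_deletes", 2: "equality_deletes"}


def tally_by_content(rows):
    """Count file entries per Iceberg content type.

    rows: iterable of dicts with a "content" key (hypothetical shape;
        adapt it to whatever your query client returns).
    """
    counts = Counter()
    for row in rows:
        code = int(row["content"])  # query clients often return strings
        counts[CONTENT_NAMES.get(code, f"unknown({code})")] += 1
    return dict(counts)


# Hypothetical sample rows, for illustration only.
sample_rows = [{"content": 0}, {"content": 0}, {"content": "1"}]
print(tally_by_content(sample_rows))
```

If the tally shows only data files while the data folder keeps growing, that would be consistent with delete files accumulating outside what the count reports.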

answered 4 months ago
-1

Thanks for sharing

answered 10 months ago
