Athena Icebergs seem to be invalid


I wrote an Iceberg table using Athena, and stored it into an S3 bucket. Data files were written using Parquet file format. After downloading it and trying to select the data I wrote into it using pyarrow, it fails. It seems that Athena writes an invalid encoding of data

>>> import pyarrow.parquet as pq
>>> table = pq.read_table('1ef2a2f6-87f2-4ab9-845e-c7e85d68866c.snappy.parquet')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ec2-user/.local/lib/python3.7/site-packages/pyarrow/parquet/", line 2828, in read_table
  File "/home/ec2-user/.local/lib/python3.7/site-packages/pyarrow/parquet/", line 2475, in read
  File "pyarrow/_dataset.pyx", line 331, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 2577, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: Malformed levels. min: 24 max: 24 out of range.  Max Level: 1

You can reproduce the issue by doing the above with a parquet file written by the statements

CREATE TABLE athena_table (x int) LOCATION 's3://<your-bucket>/<dir>/'
	'table_type' = 'ICEBERG',
	'format' = 'parquet',
	'write_compression' = 'snappy'
insert into athena_table values(43),(43),(43),(43),(43),(43),(43),(43);
asked 2 years ago264 views
1 Answer


This issue is happening because the parquet file generated through Athena Iceberg is incompatible with the 'pyarrow.parquet' reader. You can consider reading the parquet file generated by Athena Iceberg using PySpark or you can query the data via Athena as well.

answered 2 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions