Parquet files written by Athena Iceberg tables seem to be invalid


I created an Iceberg table with Athena and stored it in an S3 bucket. The data files were written in Parquet format. When I download one of the data files and try to read it with pyarrow, the read fails. It seems that Athena writes the data with an invalid encoding:

>>> import pyarrow.parquet as pq
>>> table = pq.read_table('1ef2a2f6-87f2-4ab9-845e-c7e85d68866c.snappy.parquet')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ec2-user/.local/lib/python3.7/site-packages/pyarrow/parquet/__init__.py", line 2828, in read_table
    use_pandas_metadata=use_pandas_metadata)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/pyarrow/parquet/__init__.py", line 2475, in read
    use_threads=use_threads
  File "pyarrow/_dataset.pyx", line 331, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 2577, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: Malformed levels. min: 24 max: 24 out of range.  Max Level: 1

You can reproduce the issue by running the code above against a Parquet file written by the following statements:

CREATE TABLE athena_table (x int) LOCATION 's3://<your-bucket>/<dir>/'
TBLPROPERTIES (
	'table_type' = 'ICEBERG',
	'format' = 'parquet',
	'write_compression' = 'snappy'
);
INSERT INTO athena_table VALUES (43), (43), (43), (43), (43), (43), (43), (43);
Diego
asked 2 years ago, 234 views
1 Answer

Hello,

This happens because the Parquet file that Athena generates for the Iceberg table is incompatible with the 'pyarrow.parquet' reader. As a workaround, you can read the file with PySpark, or query the data through Athena instead of reading the file directly.
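For the second option, querying through Athena, a minimal Python sketch might look like the following. It is not an official recipe: the helper name `query_athena`, the `default` database, and the results location are assumptions made for illustration. The client is passed in as a parameter and is expected to be a boto3 Athena client, whose `start_query_execution`, `get_query_execution`, and `get_query_results` calls are used here. Only the first page of results is fetched.

```python
import time


def query_athena(client, sql, database, output_location, poll_seconds=1.0):
    """Run `sql` through Athena and return the first page of rows as lists of strings.

    `client` is expected to be a boto3 Athena client, e.g. boto3.client("athena").
    `query_athena` is a hypothetical helper name, not part of any library.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # For SELECT queries, the first returned row holds the column names.
    result = client.get_query_results(QueryExecutionId=query_id)
    rows = result["ResultSet"]["Rows"]
    return [[cell.get("VarCharValue") for cell in row["Data"]] for row in rows]


# Example usage (needs AWS credentials; database and S3 path are placeholders):
#
#   import boto3
#   rows = query_athena(
#       boto3.client("athena"),
#       "SELECT x FROM athena_table",
#       database="default",
#       output_location="s3://<your-bucket>/<results-dir>/",
#   )
```

Passing the client in keeps the helper free of any credential or region handling, so it can be tried against a stub before it is pointed at a real account.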

AWS
SUPPORT ENGINEER
answered 2 years ago
