Parquet files written by Athena Iceberg tables seem to be invalid


I created an Iceberg table with Athena and stored it in an S3 bucket. The data files were written in the Parquet file format. After downloading one of the data files and trying to read it with pyarrow, the read fails. It seems that Athena writes the data with an invalid encoding:

>>> import pyarrow.parquet as pq
>>> table = pq.read_table('1ef2a2f6-87f2-4ab9-845e-c7e85d68866c.snappy.parquet')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ec2-user/.local/lib/python3.7/site-packages/pyarrow/parquet/__init__.py", line 2828, in read_table
    use_pandas_metadata=use_pandas_metadata)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/pyarrow/parquet/__init__.py", line 2475, in read
    use_threads=use_threads
  File "pyarrow/_dataset.pyx", line 331, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 2577, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: Malformed levels. min: 24 max: 24 out of range.  Max Level: 1

You can reproduce the issue by running the code above against a Parquet file written by the following statements:

CREATE TABLE athena_table (x int) LOCATION 's3://<your-bucket>/<dir>/'
TBLPROPERTIES (
	'table_type' = 'ICEBERG',
	'format' = 'parquet',
	'write_compression' = 'snappy'
);
insert into athena_table values(43),(43),(43),(43),(43),(43),(43),(43);
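
For anyone who wants to script the reproduction, here is a rough sketch that downloads every Parquet file under the table location and tries to read it with pyarrow. The function names `data_file_keys` and `try_read_table_files` are made up for illustration, and the sketch assumes `boto3` and `pyarrow` are installed and that AWS credentials with read access to the bucket are configured.

```python
import os


def data_file_keys(keys, suffix=".parquet"):
    """Keep only the S3 object keys that look like Parquet data files."""
    return [key for key in keys if key.endswith(suffix)]


def try_read_table_files(bucket, prefix, workdir="."):
    """Download each Parquet file under s3://bucket/prefix and try to read it.

    Returns a dict mapping each key to "ok" or to the error message.
    """
    # Imported lazily so the helper above works without these packages.
    import boto3
    import pyarrow.parquet as pq

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    keys = [
        obj["Key"]
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix)
        for obj in page.get("Contents", [])
    ]

    results = {}
    for key in data_file_keys(keys):
        local_path = os.path.join(workdir, os.path.basename(key))
        s3.download_file(bucket, key, local_path)
        try:
            pq.read_table(local_path)
            results[key] = "ok"
        except (OSError, ValueError) as exc:
            # The traceback above surfaces as OSError; other pyarrow
            # errors derive from ValueError.
            results[key] = str(exc)
    return results
```

Calling `try_read_table_files("<your-bucket>", "<dir>/")` after the insert should report the same "Malformed levels" message for the affected file, if the setup matches the one described here.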
Diego
asked 2 years ago · 238 views
1 Answer

Hello,

This issue occurs because the Parquet file that Athena generates for the Iceberg table is incompatible with the 'pyarrow.parquet' reader. As a workaround, you can read the file with PySpark, or query the data through Athena directly.
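
As a minimal sketch of the PySpark route, the snippet below reads a single downloaded data file in local mode. The function name `read_athena_parquet` and the application name are hypothetical, and it assumes `pyspark` and a Java runtime are installed. It has not been verified against the exact file from the question.

```python
def read_athena_parquet(path, app_name="read-athena-iceberg-file"):
    """Read one Parquet data file with PySpark and return its rows as dicts."""
    # Imported lazily; requires `pip install pyspark` and a Java runtime.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")  # single local worker, no cluster needed
        .appName(app_name)
        .getOrCreate()
    )
    try:
        df = spark.read.parquet(str(path))
        # collect() pulls all rows to the driver; fine for a small test file.
        return [row.asDict() for row in df.collect()]
    finally:
        spark.stop()
```

For example, `read_athena_parquet('1ef2a2f6-87f2-4ab9-845e-c7e85d68866c.snappy.parquet')` would return the rows of that file as a list of dictionaries if Spark can decode it.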

AWS
Support Engineer
answered 2 years ago
