Parquet files written by Athena Iceberg tables seem to be invalid


I created an Iceberg table with Athena and stored it in an S3 bucket. The data files were written in Parquet format. After downloading one of the data files, I tried to read it with pyarrow, and the read fails. It looks as though Athena writes the data with an invalid encoding:

>>> import pyarrow.parquet as pq
>>> table = pq.read_table('1ef2a2f6-87f2-4ab9-845e-c7e85d68866c.snappy.parquet')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ec2-user/.local/lib/python3.7/site-packages/pyarrow/parquet/__init__.py", line 2828, in read_table
    use_pandas_metadata=use_pandas_metadata)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/pyarrow/parquet/__init__.py", line 2475, in read
    use_threads=use_threads
  File "pyarrow/_dataset.pyx", line 331, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 2577, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: Malformed levels. min: 24 max: 24 out of range.  Max Level: 1

You can reproduce the issue by running the code above against a Parquet file written by the following statements:

CREATE TABLE athena_table (x int) LOCATION 's3://<your-bucket>/<dir>/'
TBLPROPERTIES (
	'table_type' = 'ICEBERG',
	'format' = 'parquet',
	'write_compression' = 'snappy'
);
INSERT INTO athena_table VALUES (43),(43),(43),(43),(43),(43),(43),(43);
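
To narrow this down, the file's footer metadata can be inspected separately from the data pages, since reading the footer should not require decoding the levels that the error complains about. This is only a minimal sketch, assuming the same pyarrow installation as above; `format_column` and `describe_parquet` are hypothetical helper names, not part of pyarrow.

```python
def format_column(path, physical_type, encodings, compression):
    """Render one column chunk's metadata as a single line of text."""
    return (
        f"{path}: {physical_type} "
        f"encodings={','.join(encodings)} compression={compression}"
    )


def describe_parquet(path):
    """Return a summary of the Parquet footer without reading any row data."""
    # Imported lazily so the formatting helper above works without pyarrow.
    import pyarrow.parquet as pq

    metadata = pq.ParquetFile(path).metadata
    lines = [f"created_by: {metadata.created_by}"]
    for rg_index in range(metadata.num_row_groups):
        row_group = metadata.row_group(rg_index)
        for col_index in range(metadata.num_columns):
            col = row_group.column(col_index)
            lines.append(
                format_column(
                    col.path_in_schema,
                    col.physical_type,
                    col.encodings,
                    col.compression,
                )
            )
    return lines


if __name__ == "__main__":
    # File name taken from the traceback above.
    for line in describe_parquet(
        "1ef2a2f6-87f2-4ab9-845e-c7e85d68866c.snappy.parquet"
    ):
        print(line)
```

The `created_by` value and the per-column encodings show which writer produced the file and how column `x` was encoded, which may help when reporting the problem.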
Diego
asked 2 years ago, 238 views
1 Answer

Hello,

This happens because the Parquet file that Athena generates for an Iceberg table is incompatible with the 'pyarrow.parquet' reader. As a workaround, you can read the Parquet file with PySpark, or query the data through Athena itself.
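
The two workarounds could look roughly like the sketch below. It is only an illustration under assumptions: `build_select`, `read_with_pyspark`, and `query_with_athena` are hypothetical helper names, the database name and S3 output location are placeholders you would supply, and PySpark and boto3 are assumed to be installed and configured with AWS credentials.

```python
import re
import time

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_select(table, limit=10):
    """Build a simple SELECT for a table whose name is a plain identifier."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unexpected table name: {table!r}")
    return f"SELECT * FROM {table} LIMIT {int(limit)}"


def read_with_pyspark(path):
    """Workaround 1: read the downloaded data file with Spark's Parquet reader.

    Note: this reads the data file directly, without going through the
    Iceberg table metadata.
    """
    from pyspark.sql import SparkSession  # third-party, imported lazily

    spark = SparkSession.builder.appName("read-athena-parquet").getOrCreate()
    return spark.read.parquet(path)


def query_with_athena(sql, database, output_location, poll_seconds=1.0):
    """Workaround 2: let Athena read the table and fetch the result rows."""
    import boto3  # third-party, imported lazily

    client = boto3.client("athena")
    query_id = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # For SELECT queries the first row holds the column headers.
    return client.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```

For example, `query_with_athena(build_select("athena_table"), "<your-database>", "s3://<your-bucket>/<results-dir>/")` would run the query against the table from the question, and `read_with_pyspark("1ef2a2f6-87f2-4ab9-845e-c7e85d68866c.snappy.parquet").show()` would display the contents of the downloaded file.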

AWS
Support Engineer
answered 2 years ago
