Parquet files written by Athena Iceberg tables seem to be invalid

I created an Iceberg table with Athena and stored it in an S3 bucket. The data files were written in Parquet format. After downloading one of the data files and trying to read the data I had written using pyarrow, the read fails. It seems that Athena writes the data with an invalid encoding:

>>> import pyarrow.parquet as pq
>>> table = pq.read_table('1ef2a2f6-87f2-4ab9-845e-c7e85d68866c.snappy.parquet')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ec2-user/.local/lib/python3.7/site-packages/pyarrow/parquet/__init__.py", line 2828, in read_table
    use_pandas_metadata=use_pandas_metadata)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/pyarrow/parquet/__init__.py", line 2475, in read
    use_threads=use_threads
  File "pyarrow/_dataset.pyx", line 331, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 2577, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: Malformed levels. min: 24 max: 24 out of range.  Max Level: 1

You can reproduce the issue by running the code above against a Parquet file written by the following statements:

CREATE TABLE athena_table (x int) LOCATION 's3://<your-bucket>/<dir>/'
TBLPROPERTIES (
	'table_type' = 'ICEBERG',
	'format' = 'parquet',
	'write_compression' = 'snappy'
);
INSERT INTO athena_table VALUES (43), (43), (43), (43), (43), (43), (43), (43);
Diego
asked 2 years ago, 238 views
1 Answer

Hello,

This happens because the Parquet file generated by Athena for the Iceberg table is incompatible with the 'pyarrow.parquet' reader. As a workaround, you can read the Parquet file with PySpark, or query the data through Athena.
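
The workaround above could be sketched roughly as follows: try pyarrow first and fall back to PySpark if the read fails. This is only an illustration, not a confirmed fix. The helper names (`read_with_pyarrow`, `read_with_pyspark`, `read_with_fallback`) and the Spark application name are made up for this example, and it assumes pyarrow and a working local PySpark installation are available. Whether PySpark reads a given file successfully still has to be checked against the actual data.

```python
from typing import Any, Callable, Sequence


def read_with_pyarrow(path: str) -> Any:
    """Read a Parquet file with pyarrow (the reader that failed in the question)."""
    # Imported lazily so this module can be loaded without pyarrow installed.
    import pyarrow.parquet as pq

    return pq.read_table(path)


def read_with_pyspark(path: str) -> Any:
    """Read the same file with PySpark, as suggested in the answer."""
    # Imported lazily; assumes a local Spark installation with Java available.
    from pyspark.sql import SparkSession

    # The application name is an arbitrary, hypothetical label.
    spark = SparkSession.builder.appName("read-athena-parquet").getOrCreate()
    return spark.read.parquet(path)


def read_with_fallback(path: str, readers: Sequence[Callable[[str], Any]]) -> Any:
    """Try each reader in order and return the first successful result."""
    errors = []
    for reader in readers:
        try:
            return reader(path)
        except Exception as exc:  # e.g. OSError: Malformed levels ...
            errors.append(f"{reader.__name__}: {exc}")
    # Every reader failed: report all collected error messages together.
    raise OSError("all readers failed: " + "; ".join(errors))
```

With the file from the question, a call might look like `read_with_fallback('1ef2a2f6-87f2-4ab9-845e-c7e85d68866c.snappy.parquet', [read_with_pyarrow, read_with_pyspark])`. Note that the result type depends on which reader succeeds: a pyarrow Table in the first case, a Spark DataFrame in the second.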

AWS
SUPPORT ENGINEER
answered 2 years ago
