
Questions tagged with Amazon Athena


Athena Iceberg Parquet files seem to be invalid

I wrote an Iceberg table using Athena and stored it in an S3 bucket; the data files were written in Parquet format. After downloading one of the data files and trying to read the data back with pyarrow, the read fails. It seems that Athena writes an invalid encoding of the data:

```
>>> import pyarrow.parquet as pq
>>> table = pq.read_table('1ef2a2f6-87f2-4ab9-845e-c7e85d68866c.snappy.parquet')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ec2-user/.local/lib/python3.7/site-packages/pyarrow/parquet/__init__.py", line 2828, in read_table
    use_pandas_metadata=use_pandas_metadata)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/pyarrow/parquet/__init__.py", line 2475, in read
    use_threads=use_threads
  File "pyarrow/_dataset.pyx", line 331, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 2577, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: Malformed levels. min: 24 max: 24 out of range. Max Level: 1
```

You can reproduce the issue by reading, as above, a Parquet file written by these statements:

```
CREATE TABLE athena_table (x int)
LOCATION 's3://<your-bucket>/<dir>/'
TBLPROPERTIES (
  'table_type' = 'ICEBERG',
  'format' = 'parquet',
  'write_compression' = 'snappy'
);

INSERT INTO athena_table VALUES (43),(43),(43),(43),(43),(43),(43),(43);
```
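For anyone hitting the same error, a minimal inspection sketch may help narrow down whether the file's footer is readable at all or only the column decode fails (the file name below is the one from the traceback above; substitute your own):

```
import pyarrow.parquet as pq

# Open the file lazily: this parses only the footer metadata,
# not the column data pages that trigger the "Malformed levels" error.
pf = pq.ParquetFile('1ef2a2f6-87f2-4ab9-845e-c7e85d68866c.snappy.parquet')

# Footer-level information: schema, row groups, writer identification.
print(pf.schema_arrow)           # Arrow view of the Parquet schema
print(pf.metadata)               # num_rows, num_row_groups, created_by, ...
print(pf.metadata.row_group(0))  # per-column metadata for the first row group
```

If the footer parses cleanly but `read_table` still fails, the problem is in decoding the data pages; given that the traceback shows a Python 3.7-era install, it is also worth retrying on a current pyarrow release before concluding the file itself is invalid.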
1 answer · 0 votes · 43 views · asked a month ago

Data Mesh on AWS Lake Formation

Hi, I'm building a data mesh on AWS Lake Formation. The idea is to have 4 accounts:

- account 0: main account
- account 1: central data governance
- account 2: data producer
- account 3: data consumer

I have been looking for information about how to implement the mesh on AWS, and I'm following some tutorials that are very similar to what I'm doing:

- https://catalog.us-east-1.prod.workshops.aws/workshops/78572df7-d2ee-4f78-b698-7cafdb55135d/en-US/lakeformation-basics/cross-account-data-mesh
- https://aws.amazon.com/blogs/big-data/design-a-data-mesh-architecture-using-aws-lake-formation-and-aws-glue/
- https://aws.amazon.com/blogs/big-data/build-a-data-sharing-workflow-with-aws-lake-formation-for-your-data-mesh/

However, after creating the bucket and uploading some CSV data to it (in the producer account), I don't know whether I first have to register the data in the Glue catalog in the producer account, or whether I just do it in Lake Formation as described here: https://catalog.us-east-1.prod.workshops.aws/workshops/78572df7-d2ee-4f78-b698-7cafdb55135d/en-US/lakeformation-basics/databases (does this depend on whether one uses Glue permissions or Lake Formation permissions in the Lake Formation configuration?)

In fact, I created the database and the table in Glue first, and when I then go to the database and table sections in Lake Formation, the database and table created from Glue appear there without my doing anything. Even if I disable the options "Use only IAM access control for new databases" and "Use only IAM access control for new tables in new databases", both the database and the table still appear there.

Do you know if Glue and Lake Formation share the data catalog, and whether I'm doing this correctly? Thanks, John
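On the shared-catalog point: Lake Formation does not maintain a catalog of its own; it layers permissions on top of the AWS Glue Data Catalog, which is why objects created in Glue appear in the Lake Formation console immediately. A minimal boto3 sketch illustrating the two views of the same database (the database name and region are placeholders, to be run in the producer account):

```
import boto3

glue = boto3.client('glue', region_name='us-east-1')
lakeformation = boto3.client('lakeformation', region_name='us-east-1')

# Create a database through the Glue API...
glue.create_database(DatabaseInput={'Name': 'producer_sales_db'})

# ...and it is immediately visible in Lake Formation, because Lake
# Formation reads the same Glue Data Catalog rather than keeping its own.
print(glue.get_database(Name='producer_sales_db'))

# The "Use only IAM access control" checkboxes correspond to the default
# permissions in the account's data lake settings:
settings = lakeformation.get_data_lake_settings()['DataLakeSettings']
print(settings['CreateDatabaseDefaultPermissions'])
print(settings['CreateTableDefaultPermissions'])
```

Note that disabling those checkboxes changes how *newly created* databases and tables are governed (whether IAM_ALLOWED_PRINCIPALS is granted ALL by default); it does not control whether Glue-created objects show up in Lake Formation, since both services list the same catalog.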
1 answer · 0 votes · 50 views · asked a month ago