Skip to content

Creating Athena View on S3 Iceberg Table containing arrays

0

I am using an AWS S3 (Iceberg) Table, I create it using EMR Spark for example:

CREATE TABLE s3tablesbucket.my_schema.my_table(
    id STRING,
    segments ARRAY<INT>,
    dt DATE,
)
USING iceberg
PARTITIONED BY (
  dt,
  bucket(16, id)
);

I can query this query in athena into a glue database that I created using the aws cli:

aws glue create-database --region us-west-2 --catalog-id "111111111" --database-input \
'{
  "Name": "athena_db",
  "TargetDatabase": {
    "CatalogId": "111111111:s3tablescatalog/my-bucket",
    "DatabaseName": "my_schema"
  },
  "CreateTableDefaultPermissions": []
}'

The table in Athena can be successfully queried in athena with:

SELECT id, segments
FROM athena_db.my_table

But when I try to create a view on this table in Athena with:

CREATE VIEW my_view AS
SELECT id, segments
FROM athena_db.my_table

I am getting the error:

Error: type expected at the position 0 of 'list<int>' but 'list' is found. (Service: null; Status Code: 0; Error Code: null; Request ID: null; Proxy: null)

It seems that Athena (Trino) have an issue with the list column here. Is this a known bug or am I doing something wrong?

asked a year ago587 views
1 Answer
-1

The issue you're encountering appears to be related to how Athena handles complex data types like arrays when creating views on Iceberg tables. This is likely due to a mismatch in how the data type is interpreted between the Iceberg table format and Athena's view creation process.

To work around this issue, you can try explicitly casting the array column in your view definition. Instead of directly selecting the 'segments' column, you can cast it to an array of integers. Here's a modified version of your view creation statement that might resolve the error:

CREATE VIEW my_view AS
SELECT id, CAST(segments AS ARRAY(INTEGER)) AS segments
FROM athena_db.my_table

This explicit casting should help Athena correctly interpret the array type when creating the view.

If this approach doesn't work, you might need to consider alternative solutions:

  1. Instead of creating a view, you could create a new Iceberg table using a CTAS (CREATE TABLE AS SELECT) statement, which might handle the array type more accurately.

  2. You could flatten the array in your view definition if you don't need to preserve the array structure.

  3. If possible, update your table schema to use a different representation for the 'segments' column that Athena can handle more easily in views.

It's worth noting that support for complex types in Athena, especially when working with Iceberg tables and views, can sometimes be subject to limitations or ongoing improvements. If the issue persists after trying these workarounds, you may want to contact AWS support for further assistance or to report a potential bug.
Sources
Query Iceberg table data - Amazon Athena
Create Iceberg tables - Amazon Athena
Getting started with Apache Iceberg tables in Amazon Athena SQL - AWS Prescriptive Guidance

answered a year ago
  • CREATE VIEW my_view AS
    SELECT id, CAST(segments AS ARRAY(INTEGER)) AS segments
    FROM athena_db.my_table
    

    This is not working either, in fact even a view without specifying the array column is not working such as:

    CREATE VIEW my_view AS
    SELECT id
    FROM athena_db.my_table
    

    It seems there is an issue while parsing the metadata of the table itself?

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.