Athena engine v3, errors in VIEW with UNNEST

1

It seems that recent version of Athena engine (v3) has severe inconsistency with UNNEST ARRAY of structs. Moreover, it was not documented among breaking changes at https://docs.aws.amazon.com/athena/latest/ug/engine-versions-reference-0003.html. To reproduce:

WITH dataset AS (
  SELECT ARRAY[
    CAST(ROW('Bob', 38) AS ROW(name VARCHAR, age INTEGER)),
    CAST(ROW('Alice', 35) AS ROW(name VARCHAR, age INTEGER)),
    CAST(ROW('Jane', 27) AS ROW(name VARCHAR, age INTEGER))
  ] AS users
)
SELECT t.* FROM dataset CROSS JOIN UNNEST(users) t(s);
-- OK
CREATE OR REPLACE VIEW v_test AS 
WITH dataset AS (
  SELECT ARRAY[
    CAST(ROW('Bob', 38) AS ROW(name VARCHAR, age INTEGER)),
    CAST(ROW('Alice', 35) AS ROW(name VARCHAR, age INTEGER)),
    CAST(ROW('Jane', 27) AS ROW(name VARCHAR, age INTEGER))
  ] AS users
)
SELECT t.* FROM dataset CROSS JOIN UNNEST(users) t(s);
-- line 9:36: Column alias list has 1 entries but 't' has 2 columns available
CREATE OR REPLACE VIEW v_test AS 
WITH dataset AS (
  SELECT ARRAY[
    CAST(ROW('Bob', 38) AS ROW(name VARCHAR, age INTEGER)),
    CAST(ROW('Alice', 35) AS ROW(name VARCHAR, age INTEGER)),
    CAST(ROW('Jane', 27) AS ROW(name VARCHAR, age INTEGER))
  ] AS users
)
SELECT t.* FROM dataset CROSS JOIN UNNEST(users) t(name,age);
-- OK
select * from v_test
-- INVALID_VIEW: line 1:15: Failed analyzing stored view: line 9:12: Column alias list has 2 entries but 't' has 1 columns available

As opposite to that, in Athena engine v2 , the "single column syntax" t(s) works consistently in both CREATE VIEW, SELECT FROM VIEW and immediate SELECT. In current Trino (v. 402), the "multiple column syntax" t(name,age) works consistently in both CREATE VIEW, SELECT FROM VIEW and immediate SELECT.

These errors prevent us to migrate to Athena engine v3. How could it be solved? Thank you in advance

asked a year ago570 views
1 Answer
0

Hello, This is known issue in the Athena Engine version 3 with the creation of the views only, as explained in your example. For now the workaround is to Use Engine version 2.

"Multiple column syntaxes" with unnest() do work with Athena also but in your example you are only using unnest() on a single column "users". See below example which shows correct way of opening multiple columns with unnest():

WITH dataset AS (
  SELECT ARRAY[
    CAST(ROW('Bob', 38) AS ROW(name VARCHAR, age INTEGER)),
    CAST(ROW('Alice', 35) AS ROW(name VARCHAR, age INTEGER)),
    CAST(ROW('Jane', 27) AS ROW(name VARCHAR, age INTEGER))
  ] AS users ,
ARRAY[
    CAST(ROW('Bob', 38) AS ROW(name VARCHAR, age INTEGER)),
    CAST(ROW('Alice', 35) AS ROW(name VARCHAR, age INTEGER)),
    CAST(ROW('Jane', 27) AS ROW(name VARCHAR, age INTEGER))
  ] AS users1
)
SELECT t.* FROM dataset CROSS JOIN UNNEST(users,users1) t(s,t)
AWS
SUPPORT ENGINEER
Shubh
answered a year ago
  • Hello Do you have any plans on improving this behavior? Cause my team have the very same situation - we can't upgrade until there is a way to use unnest over an array of structs

    Thanks!

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions