- 최신
- 최다 투표
- 가장 많은 댓글
You were not able to see the whole value because when working with nested arrays, you often need to expand nested array elements into a single array, or expand the array into multiple rows. Amazon Athena involves using the ‘UNNEST’ function to break down arrays or structs into separate rows and columns. You need to unnest nested JSON data to transform it into a tabular format for easier querying and analysis. By unnesting, you convert the nested data into a format that Athena can work with, it will allow you to perform SQL queries and aggregations more easily and it also creates compatibility with many BI tools.
To flatten an array into multiple rows, use CROSS JOIN in conjunction with the UNNEST operator, as in this example:
WITH dataset AS (
SELECT
'engineering' as department,
ARRAY['Sharon', 'John', 'Bob', 'Sally'] as users
)
SELECT department, names FROM dataset
CROSS JOIN UNNEST(users) as t(names)
So the steps to flatten nested Jason data in Athena:
-
Create a Table: Ensure that you have a table in Athena that represents your JSON data.
-
Use ‘UNEST’ for Arrays. You can use the ‘UNNEST’ function to expand them into separate rows.
Example: SELECT id, value FROM my_table CROSS JOIN UNNEST(array_column) AS t(value) -
Use ‘UNEST’ for Structs: if you have nested structs, you can use the ‘UNEST’ function to flatten them into separate columns. Example: SELECT id, struct_column.field1, struct_column.field2 FROM my_table CROSS JOIN UNNEST(struct_column) AS t(field1, field2)
-
Filter Data: If needed, you can apply filters to select specific data or remove rows with null values.
For more information please refer the links below: https://docs.aws.amazon.com/athena/latest/ug/flattening-arrays.html https://aws.amazon.com/blogs/big-data/analyze-and-visualize-nested-json-data-with-amazon-athena-and-amazon-quicksight/
Hope it helps.
Here's how I achieved the solution for the above. Using the Glue crawler, created the Athena database. From there, created the table first
Created the Table CREATE EXTERNAL TABLE IF NOT EXISTS s3_Qualtrics_logs_db.Shiva_Q9 ( Channel VARCHAR(50), ConnectedToSystemTimestamp TIMESTAMP, ContactId VARCHAR(60), Attributes ARRAY< struct< AgentName:VARCHAR(50), Q1:VARCHAR(10), Q2:VARCHAR(10), Q3:VARCHAR(10), Q4:VARCHAR(10), Q5:VARCHAR(10), Q6:VARCHAR(255), CountryCode:VARCHAR(25)>> )
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ( 'serialization.format' = '1' ) LOCATION 's3://connect/QualtricsSample/';
Then able to query the table
SELECT channel, contactid, attributes[1].agentname AgentName, attributes[1].q1 Q1, attributes[1].q2 Q2, attributes[1].q3 Q3, attributes[1].q4 Q4, attributes[1].q5 Q5, attributes[1].q6 Q6, attributes[1].countrycode CountryCode FROM s3_qualtrics_logs_db.shiva_q9 limit 20;
관련 콘텐츠
- AWS 공식업데이트됨 일 년 전