in Athena how to declare a piece of generic json inside of known-schema json?

0

I have json-per-line files in S3 that look like this:

{ "eventTimestamp": "...", "eventType": "...", "eventPayload": { complex object of unknown schema } }

How can I define an Athena table that globs the entire eventPayload as a field, without having to define the schema for it?

The goal is to have a schema that can be used for further data manipulation, CTAS, etc, without losing data.

AlexR
asked 3 months ago438 views
1 Answer
0

Without accounting for performance or optimization, the simplest way to do this would be to run a Glue crawler on the S3 bucket where you're files are stored. Glue crawler will automatically infer the schema of these files and store the column type of "eventPayload" as a struct. You can choose to schedule subsequent crawler runs to continue inferring the schema as new events are created and new fields in your payload will be added to the struct. I have created a sample event and tested this in my AWS account.

I used the following data from two different source files in my S3 bucket with different fields in my event payload: data

The Glue crawler inferred the following schema for my table: Glue table schema

I'm able to extract single fields from event payload with the following query in Athena:

SELECT json_extract(CAST(eventpayload AS JSON), '$.type') AS type
FROM [TABLE_NAME];

Events that have a payload without the field (e.g. type) you query in their payload will return "null".

The table that the Glue crawler populates can further be used to manipulate or transform the data with ETL jobs or directly through Athena.

For more information about which Athena JSON functions are available, see the Athena documentation here. The Glue crawler documentation can be found here.

profile pictureAWS
EXPERT
answered 3 months ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions