in Athena how to declare a piece of generic json inside of known-schema json?

0

I have json-per-line files in S3 that look like this:

{ "eventTimestamp": "...", "eventType": "...", "eventPayload": { complex object of unknown schema } }

How can I define an Athena table that globs the entire eventPayload as a field, without having to define the schema for it?

The goal is to have a schema that can be used for further data manipulation, CTAS, etc, without losing data.

AlexR
demandé il y a 3 mois450 vues
1 réponse
0

Without accounting for performance or optimization, the simplest way to do this would be to run a Glue crawler on the S3 bucket where you're files are stored. Glue crawler will automatically infer the schema of these files and store the column type of "eventPayload" as a struct. You can choose to schedule subsequent crawler runs to continue inferring the schema as new events are created and new fields in your payload will be added to the struct. I have created a sample event and tested this in my AWS account.

I used the following data from two different source files in my S3 bucket with different fields in my event payload: data

The Glue crawler inferred the following schema for my table: Glue table schema

I'm able to extract single fields from event payload with the following query in Athena:

SELECT json_extract(CAST(eventpayload AS JSON), '$.type') AS type
FROM [TABLE_NAME];

Events that have a payload without the field (e.g. type) you query in their payload will return "null".

The table that the Glue crawler populates can further be used to manipulate or transform the data with ETL jobs or directly through Athena.

For more information about which Athena JSON functions are available, see the Athena documentation here. The Glue crawler documentation can be found here.

profile pictureAWS
EXPERT
répondu il y a 3 mois

Vous n'êtes pas connecté. Se connecter pour publier une réponse.

Une bonne réponse répond clairement à la question, contient des commentaires constructifs et encourage le développement professionnel de la personne qui pose la question.

Instructions pour répondre aux questions