- Newest
- Most votes
- Most comments
Without accounting for performance or optimization, the simplest way to do this would be to run a Glue crawler on the S3 bucket where you're files are stored. Glue crawler will automatically infer the schema of these files and store the column type of "eventPayload" as a struct. You can choose to schedule subsequent crawler runs to continue inferring the schema as new events are created and new fields in your payload will be added to the struct. I have created a sample event and tested this in my AWS account.
I used the following data from two different source files in my S3 bucket with different fields in my event payload:
The Glue crawler inferred the following schema for my table:
I'm able to extract single fields from event payload with the following query in Athena:
SELECT json_extract(CAST(eventpayload AS JSON), '$.type') AS type
FROM [TABLE_NAME];
Events that have a payload without the field (e.g. type) you query in their payload will return "null".
The table that the Glue crawler populates can further be used to manipulate or transform the data with ETL jobs or directly through Athena.
For more information about which Athena JSON functions are available, see the Athena documentation here. The Glue crawler documentation can be found here.
Relevant content
- asked 2 years ago
- asked a year ago
- AWS OFFICIALUpdated a year ago
- AWS OFFICIALUpdated 3 years ago
- AWS OFFICIALUpdated 2 years ago