I'm new to AWS Glue and I have what I believe to be a very simple transform, but I have not found any examples of how to do it.
I have created a crawler, which runs hourly at 5 minutes past the hour, that successfully defines a schema for transactional data I am storing in S3.
I have a table in Aurora which represents hourly aggregations of the transactional data.
In Athena, I can write SQL which returns hourly aggregate rows from the S3 transactional data. If I could access Aurora from Athena, I could insert those rows into my hourly aggregate table. The query filters by partition (to restrict results to only rows from the previous hour), groups by specific columns in order to sum or average values in other columns, and uses timezone functions to convert from UTC to local time.
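To make the logic of that query concrete, here is a rough sketch of the same steps in plain Python (not Glue code). The column names (`partition`, `device_id`, `amount`, `latency_ms`), the partition format and the fixed UTC-5 offset are all made up for illustration; the real query uses Athena's timezone functions rather than a fixed offset.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offset standing in for the local time zone.
# A real job would use a named zone so daylight saving is handled.
LOCAL_TZ = timezone(timedelta(hours=-5))


def aggregate_previous_hour(rows, now_utc):
    """Filter rows to the previous hour's partition, group, sum and average."""
    # Start of the previous full hour, in UTC.
    hour_start = now_utc.replace(minute=0, second=0, microsecond=0) - timedelta(hours=1)
    # Hypothetical partition key format: one partition per UTC hour.
    partition = hour_start.strftime("%Y-%m-%d-%H")

    # Filter by partition and group by the (assumed) device_id column.
    groups = defaultdict(list)
    for row in rows:
        if row["partition"] == partition:
            groups[row["device_id"]].append(row)

    # Convert the hour from UTC to local time for the aggregate table.
    local_hour = hour_start.astimezone(LOCAL_TZ).replace(tzinfo=None)

    result = []
    for device_id, members in sorted(groups.items()):
        latencies = [m["latency_ms"] for m in members]
        result.append({
            "local_hour": local_hour,
            "device_id": device_id,
            "total_amount": sum(m["amount"] for m in members),
            "avg_latency_ms": sum(latencies) / len(latencies),
        })
    return result
```

Each dictionary returned corresponds to one row I would want inserted into the Aurora hourly aggregate table.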
I have created a Glue transform, but only to map some S3 columns to some Aurora columns. It does not yet do the filtering by partition, the group by, the sums and averages, or the timezone conversion.
How do I now modify the transform to perform the equivalent of the Athena query and insert those rows into the Aurora table? And can I have the transform trigger off the successful completion of the hourly crawl?
Thanks for any help with this.
Edited by: EdTGuy on Apr 15, 2019 9:02 AM