How to insert S3 data into an Aurora table via a Glue transform?

Question

I'm new to Glue and I have what I believe to be a very simple transform, but I have not found any examples of how to do it.  
I have created a crawler that successfully defines a schema for transactional data I am storing in S3. The crawler runs hourly, 5 minutes after the hour.  
I have a table in Aurora that holds hourly aggregations of the transactional data.  
In Athena, I can write SQL that returns hourly aggregate rows from the S3 transactional data. If I could access Aurora from Athena, I could insert those rows into my hourly aggregate table. The query filters by partition (to restrict results to rows from the previous hour), groups by specific columns in order to sum or average values in other columns, and uses time zone functions to convert from UTC to local time.  
I have created a Glue transform, but it only maps some S3 columns to some Aurora columns. It does not do the partition filtering, GROUP BY, sum, average, or time zone conversion.  
How do I now modify the transform to perform the equivalent of the Athena query and insert those rows into the Aurora table? And can I have the transform trigger on the successful completion of the hourly crawl?  
Thanks for any help with this.  
  
Edited by: EdTGuy on Apr 15, 2019 9:02 AM

Answer

I'm going to answer my own question. At least for our purposes, Glue is a complete waste of time. The crawl alone runs into the tens of minutes. Instead, I wrote some JavaScript in Node.js using the aws-sdk package. The result is an app that I can schedule to run hourly: it retrieves the transactional data from S3, transforms it, and writes it to Aurora in seconds. Perhaps I overlooked some performance tuning when setting up the Glue crawler, or perhaps Glue is a new technology that has not yet matured in terms of performance, but for now it is not suitable for our needs.
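
The app itself is not shown here. For anyone trying the same approach, the following is only a minimal sketch of what the transform step might look like: it keeps the records from one hour, groups them by a key column, and computes a sum and an average. The record shape (`ts`, `deviceId`, `amount`) and the function names are assumptions made for illustration, not the real schema. Reading the objects from S3, writing the rows to Aurora, and converting from UTC to local time are deliberately left out.

```javascript
// Hypothetical record shape: { ts: ISO-8601 UTC string, deviceId: string, amount: number }.
// These field names are illustrative assumptions; the real schema is not given in the thread.

// Truncate a UTC timestamp to the start of its hour, e.g. "2019-04-15T13:00:00.000Z".
function hourBucket(isoTimestamp) {
  const d = new Date(isoTimestamp);
  d.setUTCMinutes(0, 0, 0); // zero out minutes, seconds, and milliseconds
  return d.toISOString();
}

// Keep only the records that fall in the given hour, group them by deviceId,
// and return one row per group with the sum and the average of `amount`.
function aggregateHour(records, hourStartIso) {
  const target = hourBucket(hourStartIso); // normalize the requested hour
  const groups = new Map();
  for (const r of records) {
    if (hourBucket(r.ts) !== target) continue; // the "previous hour" filter
    const g = groups.get(r.deviceId) || { count: 0, total: 0 };
    g.count += 1;
    g.total += r.amount;
    groups.set(r.deviceId, g);
  }
  return Array.from(groups, ([deviceId, g]) => ({
    hour: target,
    deviceId,
    total: g.total,
    average: g.total / g.count,
  }));
}

// Made-up sample data: three records in the 13:00 UTC hour and one outside it.
const sample = [
  { ts: "2019-04-15T13:05:00Z", deviceId: "a", amount: 10 },
  { ts: "2019-04-15T13:45:00Z", deviceId: "a", amount: 20 },
  { ts: "2019-04-15T13:30:00Z", deviceId: "b", amount: 5 },
  { ts: "2019-04-15T14:01:00Z", deviceId: "a", amount: 99 },
];

const rows = aggregateHour(sample, "2019-04-15T13:00:00Z");
console.log(rows);
```

Each element of `rows` corresponds to one row that would be inserted into the hourly aggregate table. Keeping the aggregation as a pure function, separate from the S3 and Aurora I/O, makes it easy to test without any AWS access.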
