How to insert S3 data into Aurora table via glue transform?

I'm new to Glue, and I have what I believe is a very simple transform, but I haven't found any examples of how to do it.
I have created a crawler that successfully defines a schema for the transactional data I store in S3. The crawler runs hourly, five minutes after the hour.
I have a table in Aurora which represents hourly aggregations of the transactional data.
In Athena, I can write SQL that returns hourly aggregate rows from the S3 transactional data. If I could access Aurora from Athena, I could insert those rows into my hourly aggregate table. The query filters by partition (to restrict results to rows from the previous hour), groups by specific columns in order to sum or average values in other columns, and uses timezone functions to convert from UTC to local time.
I have created a Glue transform, but it only maps some S3 columns to some Aurora columns. It does not do the partition filtering, grouping, sums, averages, or timezone conversion.
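As a rough illustration of what that Athena query computes, here is a minimal Python sketch of the same logic applied to in-memory records: keep only one UTC hour, group by a key column, sum and average a value column, and convert the hour to local time. The record shape `(event_time_utc, store_id, amount)`, the output column names, and the function name are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def hourly_aggregates(records, hour_start_utc, local_tz):
    """Aggregate one UTC hour of transactions per store_id (hypothetical schema).

    records: iterable of (event_time_utc, store_id, amount) with aware datetimes.
    local_tz: any tzinfo, e.g. zoneinfo.ZoneInfo("America/New_York").
    """
    hour_end_utc = hour_start_utc + timedelta(hours=1)
    groups = defaultdict(list)
    for event_time, store_id, amount in records:
        # Equivalent of the partition filter: keep only the target hour.
        if hour_start_utc <= event_time < hour_end_utc:
            groups[store_id].append(amount)

    # Equivalent of the timezone functions: label the hour in local time.
    local_hour = hour_start_utc.astimezone(local_tz).replace(tzinfo=None)

    rows = []
    for store_id in sorted(groups):  # GROUP BY store_id
        amounts = groups[store_id]
        rows.append({
            "local_hour": local_hour,
            "store_id": store_id,
            "total_amount": sum(amounts),               # SUM(amount)
            "avg_amount": sum(amounts) / len(amounts),  # AVG(amount)
            "txn_count": len(amounts),                  # COUNT(*)
        })
    return rows
```

A real job would read the records from S3 and pass a DST-aware zone such as `zoneinfo.ZoneInfo(...)` rather than a fixed offset.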
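One piece of the missing logic, restricting the Glue job to the previous hour's partition, can be expressed as a partition predicate string. The sketch below assumes string partition keys named `year`, `month`, `day` and `hour` with zero-padded values; those names and that layout are assumptions and must match what the crawler actually created. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def previous_hour_predicate(now_utc):
    """Return (hour_start, predicate) selecting the hour before now_utc.

    Assumes hypothetical partition keys year/month/day/hour stored as
    zero-padded strings, e.g. year=2019/month=04/day=15/hour=08.
    """
    start = now_utc.replace(minute=0, second=0, microsecond=0) - timedelta(hours=1)
    predicate = (
        f"year == '{start:%Y}' and month == '{start:%m}' "
        f"and day == '{start:%d}' and hour == '{start:%H}'"
    )
    return start, predicate
```

In a Glue script, a string like this could be passed as the `push_down_predicate` argument of `glueContext.create_dynamic_frame.from_catalog(...)` so that only that partition is read. The grouping, sums, averages and timezone conversion could then be done on the resulting DataFrame (for example after `toDF()`) before writing to Aurora over a JDBC connection; that part is not shown here.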
How do I modify the transform to perform the equivalent of the Athena query and insert those rows into the Aurora table? And can the transform be triggered by the successful completion of the hourly crawl?
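For the triggering part, Glue supports conditional triggers whose predicate can watch a crawler's state. The sketch below builds a `CreateTrigger` request that starts a job when a crawler reaches `SUCCEEDED`; the trigger, crawler and job names are hypothetical, and it assumes a boto3 Glue client is passed in. Whether a crawler started by its own schedule satisfies the condition is worth verifying against the Glue trigger documentation; an EventBridge rule on the crawler's state-change event is an alternative.

```python
def crawler_success_trigger(trigger_name, crawler_name, job_name):
    """Request body for Glue CreateTrigger: run job_name after crawler_name succeeds."""
    return {
        "Name": trigger_name,
        "Type": "CONDITIONAL",
        "StartOnCreation": True,  # activate the trigger immediately
        "Predicate": {
            "Logical": "ANY",
            "Conditions": [
                {
                    "LogicalOperator": "EQUALS",
                    "CrawlerName": crawler_name,
                    "CrawlState": "SUCCEEDED",
                }
            ],
        },
        "Actions": [{"JobName": job_name}],
    }


def create_trigger(glue_client, trigger_name, crawler_name, job_name):
    # glue_client is assumed to be boto3.client("glue").
    request = crawler_success_trigger(trigger_name, crawler_name, job_name)
    return glue_client.create_trigger(**request)
```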
Thanks for any help with this.

Edited by: EdTGuy on Apr 15, 2019 9:02 AM

EdTGuy
Asked 5 years ago · 429 views
1 Answer

I'm going to answer my own question. At least for our purposes, Glue is a complete waste of time. The crawl alone takes tens of minutes. Instead, I wrote a Node.js app in JavaScript, using the aws-sdk package, that I schedule to run hourly; it retrieves the transactional data from S3, transforms it, and writes it to Aurora in seconds. Perhaps I overlooked some performance tuning when setting up the Glue crawler, or perhaps it is a new technology whose performance has not yet matured, but for now it does not suit our needs.
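The answer's app is written in Node.js; the sketch below uses Python only to stay consistent with the other examples. It covers the final "write it to Aurora" step for an hourly job, assuming an Aurora MySQL-compatible cluster and a hypothetical aggregate table with a unique key on the hour and grouping columns. An upsert makes a re-run of the same hour overwrite its rows instead of duplicating them. The table, column and function names are hypothetical.

```python
def build_upsert(table, rows, key_columns):
    """Build INSERT ... ON DUPLICATE KEY UPDATE for Aurora MySQL.

    rows: non-empty list of dicts that all share the same keys.
    Returns (sql, params) for cursor.executemany() with a DB-API driver
    that uses %s placeholders (e.g. PyMySQL). Table and column names are
    interpolated directly, so they must come from trusted code, not input.
    """
    if not rows:
        raise ValueError("no rows to write")
    columns = list(rows[0])
    placeholders = ", ".join(["%s"] * len(columns))
    # Non-key columns are overwritten when the unique key already exists.
    updates = ", ".join(
        f"{c} = VALUES({c})" for c in columns if c not in key_columns
    )
    sql = (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders}) "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )
    params = [tuple(row[c] for c in columns) for row in rows]
    return sql, params
```

Usage would look like `cursor.executemany(sql, params)` followed by a commit on the connection.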

EdTGuy
Answered 5 years ago
