How to insert S3 data into an Aurora table via a Glue transform?


I'm new to Glue and I have what I believe to be a very simple transform, but I have not found any examples of how to do it.
I have created a crawler that runs hourly, 5 minutes after the hour, and successfully defines a schema for the transactional data I am storing in S3.
I have a table in Aurora which represents hourly aggregations of the transactional data.
In Athena, I can write SQL which returns hourly aggregate rows from the S3 transactional data. If I could access Aurora from Athena, I could insert those rows into my hourly aggregate table. The query filters by partition (to restrict results to only rows from the previous hour), groups by specific columns in order to sum or average values in other columns, and uses timezone functions to convert from UTC to local time.
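To make the shape of that query concrete, here is a minimal sketch in plain Python of the same logic: keep only rows from the previous hour, group by a column, sum and average another column, and convert the hour bucket from UTC to local time. The column names (`ts`, `device_id`, `amount`), the function name, and the fixed UTC-4 offset are all hypothetical stand-ins, not the real schema; a DST-aware `zoneinfo.ZoneInfo` could be passed instead of the fixed offset.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def aggregate_previous_hour(rows, now_utc, local_tz):
    """Aggregate rows from the hour before now_utc, grouped by device_id."""
    # Previous full hour in UTC: [hour_start, hour_end)
    hour_end = now_utc.replace(minute=0, second=0, microsecond=0)
    hour_start = hour_end - timedelta(hours=1)

    # Filter to the previous hour and group amounts by device_id.
    groups = defaultdict(list)
    for row in rows:
        if hour_start <= row["ts"] < hour_end:
            groups[row["device_id"]].append(row["amount"])

    # Convert the hour bucket from UTC to (naive) local time.
    local_hour = hour_start.astimezone(local_tz).replace(tzinfo=None)

    return [
        {
            "local_hour": local_hour,
            "device_id": device_id,
            "total_amount": sum(amounts),
            "avg_amount": sum(amounts) / len(amounts),
            "row_count": len(amounts),
        }
        for device_id, amounts in sorted(groups.items())
    ]


# Hypothetical sample data; timestamps are timezone-aware UTC values.
utc = timezone.utc
rows = [
    {"ts": datetime(2019, 4, 15, 13, 5, tzinfo=utc), "device_id": "a", "amount": 10.0},
    {"ts": datetime(2019, 4, 15, 13, 50, tzinfo=utc), "device_id": "a", "amount": 20.0},
    {"ts": datetime(2019, 4, 15, 13, 30, tzinfo=utc), "device_id": "b", "amount": 5.0},
    {"ts": datetime(2019, 4, 15, 14, 1, tzinfo=utc), "device_id": "a", "amount": 99.0},  # current hour, excluded
]
now = datetime(2019, 4, 15, 14, 5, tzinfo=utc)
local_tz = timezone(timedelta(hours=-4))  # assumed fixed offset, for illustration only
result = aggregate_previous_hour(rows, now, local_tz)
```

Each dictionary in `result` corresponds to one row that would be inserted into the hourly aggregate table.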
I have created a Glue transform, but it only maps some S3 columns to some Aurora columns. It does not yet do the partition filtering, grouping, sums, averages, or timezone conversion.
How do I now modify the transform to perform the equivalent of the Athena query and insert those rows into the Aurora table? And can I have the transform trigger off the successful completion of the hourly crawl?
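On the trigger part, one possible direction is a Glue conditional trigger that watches the crawler's state. The sketch below only builds the request parameters for the Glue `CreateTrigger` API; the helper name and the trigger, crawler, and job names are hypothetical. Whether such a trigger has to be part of a Glue workflow, or fires only when the crawler was itself started by a trigger rather than by its own schedule, is not confirmed here and should be checked against the Glue documentation.

```python
def build_crawler_success_trigger(trigger_name, crawler_name, job_name):
    """Build CreateTrigger parameters: start job_name when crawler_name succeeds."""
    return {
        "Name": trigger_name,
        "Type": "CONDITIONAL",
        "Predicate": {
            "Conditions": [
                {
                    "LogicalOperator": "EQUALS",
                    "CrawlerName": crawler_name,
                    "CrawlState": "SUCCEEDED",
                }
            ]
        },
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


# All three names below are made up for illustration.
params = build_crawler_success_trigger(
    "hourly-agg-after-crawl", "transactions-crawler", "hourly-aggregate-job"
)

# With boto3 installed and AWS credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("glue").create_trigger(**params)
```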
Thanks for any help with this.

Edited by: EdTGuy on Apr 15, 2019 9:02 AM

EdTGuy
asked 5 years ago, 439 views
1 Answer

I'm going to answer my own question. At least for our purposes, Glue is a complete waste of time. The crawl alone runs into the tens of minutes. Instead, I wrote some JavaScript in Node.js using the aws-sdk package. The resulting app can be scheduled to run hourly; it retrieves the transactional data from S3, transforms it, and writes it to Aurora in seconds. Perhaps I overlooked some performance tuning in setting up the Glue crawler, or perhaps it is a new technology that has not yet matured in terms of performance, but for now it is not suitable for our needs.

EdTGuy
answered 5 years ago
