Output from Athena to RDS

0

I need to build a solution that for a given Athena query, the output is inserted into RDS (Postgres). From what I have read, it makes sense to use Glue. However, I am not comprehending the exact integration. Each query run would create a respective output that is supposed to get to RDS. Would that be Athena creating a table in Glue? However, it would rather new records needed vs a new table each time.

Przemek
asked 7 months ago211 views
2 Answers
0

Hello,

To be able store the results of a query run in Amazon Athena to a RDS table you can surely make use of the AWS Glue service. In a Glue Job, you can run an Athena query to read from a catalog table using 'awswrangler' and store the results into a DataFrame like below:

import awswrangler as wr
# Retrieving the data from Amazon Athena
df = wr.athena.read_sql_query("SELECT * FROM my_table", database="my_db")

The data in this dataframe 'df' can then be written into RDS either using awswrangler or using pyspark write. Please refer the below references:

[+] awswrangler: https://pypi.org/project/awswrangler/ [+] Write records stored in a DataFrame into MySQL using awswrangler: https://aws-sdk-pandas.readthedocs.io/en/stable/stubs/awswrangler.mysql.to_sql.html#awswrangler.mysql.to_sql [+] Write PySpark DataFrame to MySQL Database Table: https://sparkbyexamples.com/pyspark/pyspark-read-and-write-mysql-database-table/

Please note that both pyspark write and awswrangler provide multiple write modes such as append, overwrite, etc. As per your use-case you can either append the records to the already existing data in the table or overwrite the entire dataset.

Would that be Athena creating a table in Glue? --> No, this would not be required while writing to a JDBC source like RDS.

AWS
Anu_C
answered 7 months ago
0

Unless you are talking about a big complex query that SparkSQL cannot run as efficiently, I would do both things in Glue. Using both tools at the same time is not ideal both in terms of integration and efficiency (while Athena runs the query, Glue is just waiting).
You don't specify what you are querying, maybe you can use Glue bookmarks to do incremental updates.

An easier way to integrate would be having a lambda calling Athena and then starting a Glue job, passing the s3 output where s3 has stored the query results.

profile pictureAWS
EXPERT
answered 7 months ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions