- Newest
- Most votes
- Most comments
Hello,
To be able store the results of a query run in Amazon Athena to a RDS table you can surely make use of the AWS Glue service. In a Glue Job, you can run an Athena query to read from a catalog table using 'awswrangler' and store the results into a DataFrame like below:
import awswrangler as wr
# Retrieving the data from Amazon Athena
df = wr.athena.read_sql_query("SELECT * FROM my_table", database="my_db")
The data in this dataframe 'df' can then be written into RDS either using awswrangler or using pyspark write. Please refer the below references:
[+] awswrangler: https://pypi.org/project/awswrangler/ [+] Write records stored in a DataFrame into MySQL using awswrangler: https://aws-sdk-pandas.readthedocs.io/en/stable/stubs/awswrangler.mysql.to_sql.html#awswrangler.mysql.to_sql [+] Write PySpark DataFrame to MySQL Database Table: https://sparkbyexamples.com/pyspark/pyspark-read-and-write-mysql-database-table/
Please note that both pyspark write and awswrangler provide multiple write modes such as append, overwrite, etc. As per your use-case you can either append the records to the already existing data in the table or overwrite the entire dataset.
Would that be Athena creating a table in Glue? --> No, this would not be required while writing to a JDBC source like RDS.
Unless you are talking about a big complex query that SparkSQL cannot run as efficiently, I would do both things in Glue. Using both tools at the same time is not ideal both in terms of integration and efficiency (while Athena runs the query, Glue is just waiting).
You don't specify what you are querying, maybe you can use Glue bookmarks to do incremental updates.
An easier way to integrate would be having a lambda calling Athena and then starting a Glue job, passing the s3 output where s3 has stored the query results.
Relevant content
- asked 10 months ago
- Accepted Answerasked 5 months ago
- asked 3 years ago
- AWS OFFICIALUpdated 2 years ago
- AWS OFFICIALUpdated 3 months ago
- AWS OFFICIALUpdated 3 years ago
- AWS OFFICIALUpdated 2 months ago