How to upload RDS table data to S3 as a Parquet file without using snapshots


I'm trying to copy table data from AWS RDS to an external cloud storage service for analysis; I want to use the storage service as a data lake. I prefer Parquet to CSV as the file format for the copied data because it is easier to transform. I think an RDS snapshot would be the best solution, but a snapshot can't be limited to specific table columns, so columns that should be masked are output as they are, which is a security concern.

So I would like to know whether there is a way to output RDS table data to S3 as a Parquet file without using a snapshot. Thank you very much for your help.


Asked 2 years ago, viewed 3577 times
2 Answers

Please take a look at https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.SaveIntoS3.html. With it you can selectively send data to a file in S3. However, the output is a text/CSV file, so you can add a Lambda function that is triggered when a file is generated and converts the text/CSV file to Parquet format.
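As a rough illustration of the conversion step, here is a minimal sketch of such a Lambda handler. It assumes the function is invoked by an S3 "object created" notification scoped to the export prefix, that the exported file starts with a header row, and that `pyarrow` is packaged with the function (for example as a layer). The `parquet/` destination prefix and the `parquet_key_for` helper are hypothetical names chosen for this example.

```python
import io
import urllib.parse


def parquet_key_for(csv_key: str, dest_prefix: str = "parquet/") -> str:
    """Map the key of an exported text/CSV object to the key of its Parquet copy.

    The "parquet/" prefix is an assumption for this sketch, not an AWS convention.
    """
    filename = csv_key.rsplit("/", 1)[-1]
    stem = filename[:-4] if filename.lower().endswith(".csv") else filename
    return f"{dest_prefix}{stem}.parquet"


def lambda_handler(event, context):
    # Imported here so the helper above can be used without the AWS dependencies.
    # pyarrow is not part of the default Lambda runtime and must be bundled.
    import boto3
    import pyarrow.csv as pv
    import pyarrow.parquet as pq

    s3 = boto3.client("s3")
    written = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Read the whole export into memory; assumes the file fits in Lambda memory
        # and that its first row holds the column names.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        table = pv.read_csv(io.BytesIO(body))

        # Serialize the table as Parquet and write it under the destination prefix.
        buffer = io.BytesIO()
        pq.write_table(table, buffer)
        dest_key = parquet_key_for(key)
        s3.put_object(Bucket=bucket, Key=dest_key, Body=buffer.getvalue())
        written.append(dest_key)
    return {"written": written}
```

Because this sketch writes the Parquet object back to the same bucket, the trigger should be limited to the export prefix so the function does not invoke itself on its own output.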

AWS
Answered 2 years ago
  • Thank you for your answer. I think it is a practical way. However, it is a pity that there is no way to export directly in Parquet format.


In addition to the Save to S3 option, you can also write a Glue job that queries the RDS data and stores the result in S3 in Parquet format. Some helpful information here.
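A minimal sketch of what such a Glue job could look like, assuming a Glue Spark job with network access to the database and a JDBC driver available for it. The query lists only the columns that are safe to export, which addresses the masking concern in the question. The function names, parameters, and the example table and columns are hypothetical; in a real job the credentials would more likely come from a Glue connection or a secrets store than from plain arguments.

```python
from typing import Sequence


def build_export_query(table: str, columns: Sequence[str]) -> str:
    """Build a SELECT that lists only the columns that may leave the database."""
    if not columns:
        raise ValueError("at least one column is required")
    return f"SELECT {', '.join(columns)} FROM {table}"


def run_job(jdbc_url: str, user: str, password: str,
            table: str, columns: Sequence[str], s3_path: str) -> None:
    # These modules are only available inside the Glue job environment.
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    spark = GlueContext(SparkContext.getOrCreate()).spark_session

    # Read just the selected columns over JDBC.
    df = (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("query", build_export_query(table, columns))
        .option("user", user)
        .option("password", password)
        .load()
    )

    # Write the result to S3 as Parquet, replacing any previous export at that path.
    df.write.mode("overwrite").parquet(s3_path)
```

For example, `build_export_query("orders", ["id", "created_at"])` produces `SELECT id, created_at FROM orders`, so sensitive columns never reach the Parquet files.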

AWS
Answered 2 years ago
